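SenseTime has open-sourced the complete SenseNova U1 training code, providing a full training stack that can be inspected, modified, and rebuilt. A smoke-test dataset is released alongside it, covering 7 task types: t2i, it2i, multi-image input, interleaved generation, multimodal understanding, video understanding, and pure language continuation. Users can fine-tune U1 on their own data using this schema, validate their data format, or test their pipeline end to end. The dataset is available on HuggingFace and the code is hosted on GitHub.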
SenseNova U1 training code is now open source: the full training stack, inspectable, modifiable, and rebuildable.
Also released: a smoke-test dataset spanning all 7 task types - t2i · it2i · it2i (multi-img) · interleave_gen · multimodal understanding · video understanding · pure language continuation
Use it to: 🔹 Bring your own data in this schema to fine-tune U1 into a specialist 🔹 Validate your data against the official schema 🔹 Smoke-test your pipeline end-to-end
🤗 https://huggingface.co/datasets/sensenova/SenseNova-U1-Training-Sample 🛠️ https://github.com/OpenSenseNova/SenseNova-U1
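For anyone who wants to look at the sample before wiring it into a pipeline, here is a minimal sketch (not an official script) that pulls the dataset repo above with `huggingface_hub.snapshot_download` and counts files by extension. The post does not describe the file layout, so the sketch assumes nothing about the schema; `count_by_extension`, `download_sample`, and the `u1_training_sample` directory name are hypothetical names chosen for illustration.

```python
from collections import Counter
from pathlib import Path

# Dataset repo id taken from the HuggingFace link in the post.
REPO_ID = "sensenova/SenseNova-U1-Training-Sample"


def count_by_extension(root):
    """Count files under `root` by lowercase suffix, skipping hidden directories."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        relative_parts = path.relative_to(root).parts
        # Skip hidden entries such as a local .cache folder.
        if any(part.startswith(".") for part in relative_parts):
            continue
        if path.is_file():
            counts[path.suffix.lower() or "<none>"] += 1
    return counts


def download_sample(local_dir="u1_training_sample"):
    """Fetch a snapshot of the dataset repo (hypothetical helper).

    Requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so count_by_extension stays usable offline.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir)


if __name__ == "__main__":
    sample_root = download_sample()
    for ext, n in sorted(count_by_extension(sample_root).items()):
        print(f"{ext}\t{n}")
```

The file inventory is only a first sanity check; the actual schema and field names should be read from the dataset card and the GitHub repo rather than inferred from this sketch.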
Sample previews demonstrating the diverse task coverage included in our open-source smoke-test dataset. 👇