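HiDream has open-sourced HiDream-O1-Image, an 8B-parameter model whose core innovation is a pixel-level unified transformer: a single architecture directly processes raw image patches, text, and task conditions, and treats text-to-image generation, editing, personalization, and other tasks as in-context generation, with no traditional VAE and text-encoder pipeline. The model has a built-in reasoning prompt agent and natively supports high-resolution synthesis up to 2048×2048. On performance, it reaches a comparable level with only about a third of the parameters of some comparable models, and it does especially well on text rendering, where its results come close to those of larger models.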
HiDream just open-sourced an 8B image model with a big message behind it: the old diffusion pipeline (VAE-plus-text-encoder) may not be the only serious path left.
HiDream-O1-Image (8B parameters) claims parity with models over 3x its size (e.g., the 27B Qwen-Image).
@HiDream_AI, @vivago_ai
Key Features
🧬 Pixel-Level Unified Transformer - One end-to-end model on raw pixels, no VAE, no disjoint text encoder.
🎨 One Model, Many Tasks - Text-to-image, long-text rendering, instruction editing, subject-driven personalization, and storyboard generation in a single architecture.
🧠 Reasoning-Driven Prompt Agent - Built-in "thinking" agent that resolves implicit knowledge, layout, and text rendering before generation.
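To make "raw pixels, no VAE" concrete, here is a minimal sketch of how an image can be cut into flat pixel patches that a transformer could consume as tokens. This is not HiDream's code: the function names `patchify` and `num_patches` and the patch size of 16 are assumptions for illustration, and the post does not state the model's actual patch size or tokenization.

```python
from typing import List

# Hypothetical types for illustration: an image is height x width x channels.
Pixel = List[int]
Image = List[List[Pixel]]


def patchify(image: Image, patch: int) -> List[List[int]]:
    """Split an image into non-overlapping patch x patch tiles.

    Each tile is flattened into one vector of raw pixel values, in
    row-major order. No VAE or other learned encoder is involved.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vector = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vector.extend(image[y][x])
            tokens.append(vector)
    return tokens


def num_patches(height: int, width: int, patch: int) -> int:
    """Number of patch tokens for an image of the given size."""
    return (height // patch) * (width // patch)


# Tiny 4x4 RGB image whose pixel at (y, x) holds [y, x, 0].
image = [[[y, x, 0] for x in range(4)] for y in range(4)]
tokens = patchify(image, patch=2)
print(len(tokens), len(tokens[0]))

# Assumed patch size of 16 at the stated 2048x2048 maximum resolution.
print(num_patches(2048, 2048, 16))
```

Under that assumed patch size, a 2048×2048 image becomes 128 × 128 patch tokens, which gives a sense of the sequence lengths a pixel-level model has to handle without a VAE compressing the image first.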