Channel-wise Vector Quantization

2026-05-25 08:00 · 39 days ago

AI Summary

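This paper proposes Channel-wise Vector Quantization (CVQ), a new image tokenization paradigm that replaces conventional patch-based tokens with channel-wise tokens. The method quantizes each channel of the feature map, representing an image as discrete levels of visual detail. Building on this, the authors propose a Channel-wise Autoregressive (CAR) model that uses a "next-channel prediction" mechanism. The model predicts channels sequentially, sketching global structure first and then refining details. Experiments show that CVQ achieves 100% utilization on a codebook of 16K+ entries and substantially improves reconstruction quality; the CAR model attains a DPG score of 86.7 and a GenEval score of 0.79 on text-to-image generation.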

Original · untranslated

We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise tokens. Unlike conventional vector quantization, which assigns a discrete token to each patch feature vector, CVQ quantizes each channel of the feature map. This formulation represents an image as discrete levels of visual details, rather than as a grid of spatial patches. Based on CVQ, we introduce a new visual autoregressive framework with "next-channel prediction". Instead of rendering images patch by patch in raster order, our Channel-wise Autoregressive (CAR) model predicts image channels sequentially, producing progressively enriched visual details. Specifically, it first sketches global structure and then refines fine-grained attributes, akin to a human artist's workflow. Empirically, we show that: (1) CVQ achieves 100% codebook utilization with a 16K+ codebook size without any bells and whistles, and substantially improves reconstruction quality over conventional VQ; and (2) CAR attains a DPG score of 86.7 and a GenEval score of 0.79, demonstrating strong effectiveness for text-to-image generation.
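The contrast between patch-wise and channel-wise tokens can be made concrete with a toy example. The sketch below is not the authors' implementation: the abstract does not say how a channel is matched to a codebook entry, so it assumes the simplest reading, in which each channel is flattened to a vector of length H*W and assigned to its nearest codebook entry under squared L2 distance. All names (`nearest_code`, `patch_wise_tokens`, `channel_wise_tokens`), the toy sizes, and the random codebooks are hypothetical.

```python
import random


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def patch_wise_tokens(feat, codebook):
    """Conventional VQ: one token per spatial position (vector length C)."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    return [
        nearest_code([feat[c][h][w] for c in range(C)], codebook)
        for h in range(H)
        for w in range(W)
    ]


def channel_wise_tokens(feat, codebook):
    """Channel-wise VQ (assumed form): one token per channel (vector length H*W)."""
    return [
        nearest_code([v for row in channel for v in row], codebook)
        for channel in feat
    ]


# Toy sizes, chosen only for illustration: C channels, H x W map, K codes.
rng = random.Random(0)
C, H, W, K = 8, 4, 4, 16

# feat[c][h][w] stands in for an encoder feature map.
feat = [[[rng.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]

# Random codebooks stand in for learned ones; note the different entry sizes.
patch_codebook = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(K)]
channel_codebook = [[rng.gauss(0, 1) for _ in range(H * W)] for _ in range(K)]

patch_tokens = patch_wise_tokens(feat, patch_codebook)
channel_tokens = channel_wise_tokens(feat, channel_codebook)

# Patch-wise yields H*W tokens; channel-wise yields C tokens.
print(len(patch_tokens), len(channel_tokens))  # prints "16 8"
```

Under this reading, the token sequence length is set by the number of channels rather than by the spatial resolution, and each token describes the whole image at once, which is consistent with the abstract's description of tokens as levels of visual detail rather than spatial patches.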

HuggingFace Daily Papers (trending community papers)

Read the original · arxiv.org
