AI 摘要
百度开源Unlimited OCR,专为一次性读取长文档设计。模型总参数量3B,仅激活500M,在OmniDocBench v1.5和v1.6上取得端到端SOTA。核心创新为参考滑动窗口注意力(R-SWA),模拟人类抄书过程,保持源、近期上下文和后续焦点,同时软遗忘无关信息。凭借恒定KV缓存大小和更低注意力成本,可在单次前向传播中转录40+页,不丢失上下文也不减速。模型已开源至GitHub和Hugging Face。
3B total parameters &; 500M activated, yet powerful enough to transcribe 40+ pages in one pass while keeping context intact. Meet Unlimited OCR!
We're open-sourcing Unlimited OCR - built to read long documents in one pass. With 3B total parameters and only 500M activated, Unlimited OCR sets new end-to-en...