# WATER-S Dataset and WATERec Model for WordArt-Oriented Scene Text Recognition

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-23 08:00
- AIHOT score: 43
- AIHOT link: https://aihot.virxact.com/items/cmqsvasrj06m1slfus5tvkprn
- Original link: https://arxiv.org/abs/2606.24484

## AI Summary

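To address the recognition difficulty caused by WordArt's highly customized fonts, textures, and layouts, the study builds WATER-S, a 2M-scale synthetic dataset with two parts: high-precision, controllable data generated by the upgraded rendering pipeline SynthWordArt, and diverse, realistic data produced by combining Qwen3-VL for prompt mining with Z-Image for image synthesis. It also proposes the WATERec model, which uses a visual encoder supporting arbitrary-shaped inputs and an autoregressive decoder, breaking the fixed-template limitation. The model reaches 90.40% accuracy on WordArt-Bench, far exceeding general-purpose vision-language models and OCR-specialized models. Code and data are open source.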
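As a rough illustration of what an autoregressive decoder does at inference time, the following minimal sketch runs a greedy decoding loop that emits one character per step until an end token appears. It is not the WATERec implementation: the `greedy_decode` and `make_toy_step` names, the `<bos>`/`<eos>` tokens, and the `max_len` limit are all assumptions made for this example, and the toy step function stands in for a real decoder conditioned on image features.

```python
from typing import Callable, List, Sequence

# Hypothetical special tokens; a real model defines its own vocabulary.
BOS, EOS = "<bos>", "<eos>"


def greedy_decode(step: Callable[[Sequence[str]], str], max_len: int = 32) -> str:
    """Generate characters one at a time until EOS or max_len is reached."""
    tokens: List[str] = [BOS]
    for _ in range(max_len):
        nxt = step(tokens)  # next token given everything emitted so far
        if nxt == EOS:
            break
        tokens.append(nxt)
    return "".join(tokens[1:])  # drop the BOS marker


def make_toy_step(target: str) -> Callable[[Sequence[str]], str]:
    """Stand-in for a real decoder: emits `target` character by character."""
    def step(prefix: Sequence[str]) -> str:
        i = len(prefix) - 1  # number of characters emitted so far
        return target[i] if i < len(target) else EOS
    return step


if __name__ == "__main__":
    print(greedy_decode(make_toy_step("WATER")))
```

Because the output length is decided by the decoder rather than by a fixed template, the same loop handles short and long text alike, up to the assumed `max_len` cap.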

## Main Text

WordArt (artistic text) features highly customized fonts, textures, and layouts, making WordArt-oriented scene TExt Recognition (WATER) substantially more challenging than general Scene Text Recognition (STR). Existing STR datasets and methods, typically built around regular scene text and fixed-template inputs, struggle to scale to WATER. We therefore aim to advance this task from both the data and the model perspective. On the data side, we construct WATER-S, a 2M-scale synthetic dataset that is hundreds of times larger than existing artistic text data. WATER-S consists of two complementary subsets. One is rendered by an upgraded rendering pipeline (SynthWordArt), which provides highly accurate and controllable synthetic WordArt data. The other is generated by combining Qwen3-VL for prompt mining with Z-Image for image synthesis, which improves the coverage of realistic and diverse data. On the model side, we propose WATERec. It adopts a visual encoder that supports arbitrary-shaped inputs and an autoregressive decoder that models complex layouts, structurally breaking the bottleneck of fixed-template STR on WordArt. Experiments show that this architecture outperforms prior STR methods, achieving state-of-the-art performance on irregular text such as WordArt. Together with WATER-R, carefully reorganized from existing real STR data, our strong baseline with the new synthetic data and model design reaches 90.40% accuracy on WordArt-Bench, surpassing both general-purpose and OCR-specialized vision-language models by a large margin. Code and data are available at https://github.com/YesianRohn/WATER.
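To make the reported accuracy figure concrete, here is a minimal sketch of word-level accuracy as it is commonly computed in STR evaluation: a prediction counts as correct only if it matches the ground truth exactly after normalization. The text above does not state which normalization WordArt-Bench uses, so the case-insensitive, alphanumeric-only rule below is an assumption, and the `normalize` and `word_accuracy` helpers are hypothetical names for this example.

```python
import re
from typing import Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (assumed convention)."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (prediction, ground_truth) pairs that match exactly
    after normalization."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no samples")
    correct = sum(normalize(p) == normalize(g) for p, g in pairs)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    samples = [("Water", "WATER"), ("w0rd", "word"), ("art!", "art"), ("x", "y")]
    print(f"{word_accuracy(samples):.2f}%")
```

Under this convention a single wrong character makes the whole word count as an error, which is why heavily stylized glyphs can lower the score sharply.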
