# Count Anything

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-29 08:00
- AIHOT score: 59
- AIHOT link: https://aihot.virxact.com/items/cmpupuvkw007tsl3ttbm2sorx
- Original link: https://arxiv.org/abs/2605.30846

## AI Summary

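To address the limited generalization of existing object counting models across categories, domains, and density distributions, this work proposes Count Anything, a text-guided generalist object counting model. The model takes an image and a natural-language query as input and outputs an instance-level set of target points, with the number of points serving as the count. To support this, the authors construct CLOC, a cross-domain large-scale object counting dataset covering six visual domains with about 220K images, 619 categories, and 15M object instances. Count Anything adopts a dual-granularity instance enumeration strategy: a Region-level Sparse Counter provides anchors for large, sparse targets, while a Pixel-level Dense Counter handles small, dense targets. A Complementary Count Fusion mechanism combines the two counters without additional parameters, and the resulting model shows better accuracy and generalization than existing open-world counting methods across multiple domains.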

## Main Text

Object counting remains fragmented across domain-specific datasets and task formulations, despite rapid progress in generalist vision models. Existing counting models are often tailored to scenarios such as crowds, vehicles, cells, crops, or remote-sensing objects, and thus struggle to generalize across categories, visual domains, object scales, and density distributions. In this paper, we study text-guided object counting across domains, where a model takes an image and a natural-language query as input and returns an instance-grounded set of target points whose cardinality gives the count. This formulation unifies category-conditioned counting with interpretable spatial localization. To support this setting, we construct CLOC, a Cross-domain Large-scale Object Counting dataset that reorganizes diverse public data sources into a unified benchmark. CLOC covers six visual domains: General Scene, Remote Sensing, Histopathology, Cellular Microscopy, Agriculture, and Microbiology, with about 220K images, 619 categories, and 15M object instances. Based on CLOC, we propose Count Anything, a generalist model for text-guided object counting. Unlike density-map-based methods, which dominate counting models, Count Anything adopts discrete instance points and performs dual-granularity instance enumeration. A Region-level Sparse Counter provides object-level anchors for large and sparse targets, while a Pixel-level Dense Counter handles small, crowded, and weakly bounded targets via dense point prediction. A point-centric supervision strategy enables learning from heterogeneous annotations, and Complementary Count Fusion combines both counters in a parameter-free manner. Extensive experiments show that Count Anything achieves strong accuracy and multi-domain generalization, outperforming existing open-world counting methods. Code is available at: https://github.com/Mengqi-Lei/count-anything.
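The point-set formulation can be illustrated with a small sketch. The abstract does not say how Complementary Count Fusion actually merges the two counters, so the rule below is purely an assumption made for illustration: region-level detections contribute one anchor point each, and dense points are kept only where no region box already covers them. The names `Box` and `fuse_counts` are hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned region, standing in for a region-level detection (hypothetical)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> Point:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, p: Point) -> bool:
        x, y = p
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def fuse_counts(region_boxes: list[Box], dense_points: list[Point]) -> list[Point]:
    """Illustrative fusion rule with no learned parameters (an assumption,
    not the authors' method): one anchor point per region box, plus every
    dense point that falls outside all region boxes."""
    anchors = [b.center() for b in region_boxes]
    extras = [
        p for p in dense_points
        if not any(b.contains(p) for b in region_boxes)
    ]
    return anchors + extras


# Two large objects found by the region-level branch, plus four dense
# points, two of which land inside those regions and are dropped.
boxes = [Box(0, 0, 10, 10), Box(20, 20, 30, 30)]
dense = [(5.0, 5.0), (25.0, 24.0), (50.0, 50.0), (52.0, 51.0)]

points = fuse_counts(boxes, dense)
count = len(points)  # the count is the cardinality of the fused point set
```

In this sketch the output is a list of instance points rather than a density map, so each counted object can be inspected at its predicted location, which is the interpretability property the abstract attributes to the instance-grounded formulation.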