# Affordance20Q: An Affordance Reasoning Benchmark Grounded in Physical Properties

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-12 08:00
- AIHOT score: 39
- AIHOT link: https://aihot.virxact.com/items/cmqfnzq8a00l1slb8i7zzwd9e
- Original paper: https://arxiv.org/abs/2606.14240

## AI Summary

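Existing affordance reasoning benchmarks often expose object identities, letting models rely on memorization rather than reasoning. The new benchmark, Affordance20Q, uses a 20-Questions game format that hides the object's identity and requires the model to infer its affordance by asking about physical properties such as shape and material. The dataset contains 1,009 games covering 454 objects and 59 affordances. Tests on 15 large language models show a gap of about 20 percentage points relative to human performance. An information-gain analysis based on KL divergence indicates that models struggle to ask discriminating questions in the later stages of a game. The proposed KARI method uses knowledge bases to generate affordance rules and improves open-source LLMs by up to 15.2 percentage points. The code and data are open source.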

## Full Text

Affordance reasoning, the inference of an object's action possibilities from its physical properties (e.g., shape and material), is fundamental to human physical understanding and increasingly critical for Large Language Models (LLMs). However, existing affordance benchmarks largely expose explicit object identities in the evaluation setup, allowing models to rely on memorized object-affordance mappings rather than reasoning over physical properties. To address this gap, we introduce Affordance20Q, a novel affordance reasoning benchmark formulated as a 20-Questions game without exposing the object's identity. In each game, the model identifies a hidden object's affordance from a candidate set by asking yes/no questions about its physical properties. Affordance20Q comprises 1,009 games over 454 objects and 59 affordances, all manually filtered, refined, and annotated. We conduct comprehensive experiments with 15 state-of-the-art LLMs and find a substantial gap (~20 points) compared to human performance. A KL-based information-gain (IG) analysis further shows that models fail to ask discriminating questions as the game progresses. To close the gap, we develop KB-Anchored Rule Induction (KARI), a pipeline based on LLMs that generates affordance rules grounded in evidence from knowledge bases (KBs). KARI improves open-source LLMs by up to 15.2 points, while the limited coverage of KBs hinders further gains. We release all our code and data at https://github.com/1171-jpg/Affordance20Q.git
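The abstract does not spell out how the KL-based information gain is computed, so the following is only a minimal sketch under one common assumption: the gain of a question is the KL divergence between the belief over candidate affordances after the answer and the belief before it. The candidate set, the prior, and the answer likelihoods below are hypothetical values chosen for illustration, not data from the benchmark.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def posterior(prior, likelihood):
    """Bayes update: prior[i] * P(answer | candidate i), renormalized."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical candidate set of four affordances with a uniform prior.
prior = [0.25, 0.25, 0.25, 0.25]

# Hypothetical P(answer = "yes" | candidate) for two yes/no questions.
discriminating = [1.0, 1.0, 0.0, 0.0]  # rules out half of the candidates
uninformative = [0.5, 0.5, 0.5, 0.5]   # equally likely under every candidate

# Assumed definition: information gain = KL(posterior || prior).
ig_good = kl_divergence(posterior(prior, discriminating), prior)
ig_bad = kl_divergence(posterior(prior, uninformative), prior)

print(f"discriminating question: {ig_good:.2f} bits")
print(f"uninformative question:  {ig_bad:.2f} bits")
```

Under this assumed definition, a question that halves the candidate set yields one bit of gain, while a question whose answer is equally likely under every candidate yields none. The latter is the kind of late-game question the analysis describes as failing to discriminate.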
