Rohan Paul @rohanpaul_ai

2026-04-22 14:50

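AI Summary

The study finds that phone agents pose serious privacy risks when carrying out everyday tasks. In the MyPhoneBench evaluation, the best model completed 82.8% of tasks, but the best privacy-qualified score was only 47.6%. The privacy risk stems from "over-helpfulness": to finish a task, models ask for personal information they do not need, re-disclose data to irrelevant widgets, or over-fill optional fields. Claude led on task success, Kimi had the best privacy protection, and Qwen had the highest combined score. The study shows that benchmarks measuring only success rate confuse capability with judgment, which is a serious safety concern on a device as private as a phone.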

This paper asks whether phone-use agents protect your data during ordinary tasks, and finds that they often do not.

The best model completed 82.8% of tasks, but the best privacy-qualified score was only 47.6%.

That gap matters because privacy failure here is not sabotage. It is ordinary over-helpfulness.

A phone agent can finish your food order, book your appointment, or fill your travel form while still asking for a phone number it did not need, re-entering it into a coupon box, or stuffing optional fields with personal details just because the boxes were there.

To measure that behavior, the authors built MyPhoneBench, which logs exactly what agents type, where they type it, and whether any of it was necessary.

The benchmark splits privacy into three checks: asking for protected data it did not need, re-disclosing data to plausible but irrelevant widgets, and filling optional personal fields just because they were there.
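To make the three checks concrete, here is a minimal Python sketch of how they could be counted from an action log. This is not the paper's implementation: the `TypedEntry` schema, its field names, and the `privacy_violations` function are all hypothetical, chosen only to illustrate the three categories described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedEntry:
    """One logged agent action (hypothetical schema, not from the paper)."""
    field: str       # widget the agent typed into, e.g. "coupon_code"
    data_type: str   # kind of personal data entered, e.g. "phone"
    required: bool   # whether the form marks this field as required
    relevant: bool   # whether this widget is a sensible place for that data


def privacy_violations(requested, needed, entries):
    """Count violations for each of the three checks.

    requested: set of protected data types the agent asked the user for
    needed:    set of protected data types the task actually requires
    entries:   list of TypedEntry taken from the action log
    """
    return {
        # Check 1: asked for protected data the task did not need.
        "over_asking": len(set(requested) - set(needed)),
        # Check 2: typed personal data into a widget where it does not belong.
        "re_disclosure": sum(1 for e in entries if not e.relevant),
        # Check 3: filled a relevant but optional personal field anyway.
        "over_filling": sum(1 for e in entries if e.relevant and not e.required),
    }


# Illustrative log: the agent asked for a phone number it did not need,
# pasted it into a coupon box, and filled an optional middle-name field.
log = [
    TypedEntry("email", "email", required=True, relevant=True),
    TypedEntry("coupon_code", "phone", required=False, relevant=False),
    TypedEntry("middle_name", "name", required=False, relevant=True),
]
result = privacy_violations({"phone", "email"}, {"email"}, log)
print(result)
```

In this toy log each check fires once, even though the task itself (submitting the required email) would count as completed.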

Here's the part most people miss. The hardest problem was not detecting obvious permission boundaries, but resisting the urge to complete forms too thoroughly.

That sounds minor until you look at the mechanism. Once a model is optimized to finish the task, every visible blank starts to look like progress, even when leaving it empty is the safer choice.

The rankings changed depending on what you measured: Claude led raw task success and later memory use, Kimi led average privacy, and Qwen narrowly led the combined score that required both completion and acceptable privacy.
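A rough sketch of why rankings can flip between raw success and a combined score, assuming the combined metric counts a task only when it is both completed and above some privacy threshold. The threshold value, the per-task privacy score, and the function names are assumptions for illustration; the paper's exact scoring rule may differ, and the numbers below are made up.

```python
def success_rate(results):
    """Fraction of tasks completed, ignoring privacy."""
    return sum(1 for done, _ in results if done) / len(results)


def privacy_qualified_rate(results, threshold=0.8):
    """Fraction of tasks that are completed AND meet the privacy threshold.

    results:   list of (completed: bool, privacy: float in [0, 1]) per task
    threshold: hypothetical cutoff for "acceptable" privacy
    """
    qualified = sum(1 for done, privacy in results if done and privacy >= threshold)
    return qualified / len(results)


# Illustrative agent: finishes three of four tasks, but one of the
# completed tasks leaks too much to count as privacy-qualified.
agent = [(True, 1.0), (True, 0.5), (True, 0.9), (False, 1.0)]
raw = success_rate(agent)
combined = privacy_qualified_rate(agent)
print(raw, combined)
```

Under this definition the combined score can never exceed raw success, so an agent that completes more tasks can still rank lower once privacy is required.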

So the real lesson is not that phone agents are useless. It is that success-only benchmarks confuse capability with judgment, and on a device as intimate as a phone, that gap is the whole story.

----

Paper Link - arxiv.org/abs/2604.00986

Paper Title: "Do Phone-Use Agents Respect Your Privacy?"

Agents · Anthropic · Safety/Alignment · Papers/Research
