# PersonaVLM: A Multimodal Large Language Model for Long-Term Personalization

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-03-20 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo6k56v904ljsl4rj4zal6zt
- Original link: https://arxiv.org/abs/2604.13074

## AI Summary

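PersonaVLM is a multimodal agent framework for long-term personalization. It combines three core capabilities (memory extraction, multi-turn reasoning, and response alignment) to turn a general-purpose MLLM into a personalized assistant that keeps learning user preferences. The research team also released Persona-MME, an evaluation benchmark of more than 2,000 cases covering 7 aspects and 14 fine-grained tasks. Experiments show that, under a 128k context, the method improves over the baseline model by 22.4% on Persona-MME and 9.8% on PERSONAMEM, and outperforms GPT-4o by 5.2% and 2.0%, respectively.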

## Main Text

Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture users' evolving preferences and personality over time (see Fig. 1). In this paper, we introduce PersonaVLM, a personalized multimodal agent framework designed for long-term personalization. It transforms a general-purpose MLLM into a personalized assistant by integrating three key capabilities: (a) Remembering: It proactively extracts and summarizes chronological multimodal memories from interactions, consolidating them into a personalized database. (b) Reasoning: It conducts multi-turn reasoning by retrieving and integrating relevant memories from the database. (c) Response Alignment: It infers the user's evolving personality throughout long-term interactions to ensure outputs remain aligned with their unique characteristics. For evaluation, we establish Persona-MME, a comprehensive benchmark comprising over 2,000 curated interaction cases, designed to assess long-term MLLM personalization across seven key aspects and 14 fine-grained tasks. Extensive experiments validate our method's effectiveness, improving the baseline by 22.4% (Persona-MME) and 9.8% (PERSONAMEM) under a 128k context, while outperforming GPT-4o by 5.2% and 2.0%, respectively. Project page: https://PersonaVLM.github.io.
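
To make the three capabilities concrete, the toy sketch below wires a chronological memory store, a retrieval step, and a slowly updated user trait into one prompt-building loop. It is not the authors' implementation: the class `PersonalAgent`, its methods, the word-overlap retrieval, and the single `brevity` trait are all hypothetical stand-ins chosen for illustration, and a real system would use an MLLM for summarization, retrieval, and personality inference.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    turn: int   # chronological index of the interaction
    text: str   # summary of the interaction (hypothetical plain-text format)


@dataclass
class PersonalAgent:
    """Hypothetical toy agent; all names here are illustrative assumptions."""
    memories: list = field(default_factory=list)
    brevity: float = 0.5  # assumed single trait in [0, 1]; 1.0 means "prefers brief answers"

    def remember(self, turn, summary):
        # (a) Remembering: append a summarized interaction in chronological order.
        self.memories.append(Memory(turn, summary.strip()))

    def retrieve(self, query, k=2):
        # (b) Reasoning: rank memories by word overlap with the query,
        # breaking ties in favor of the most recent turn.
        q = set(query.lower().split())
        scored = []
        for m in self.memories:
            overlap = len(q & set(m.text.lower().split()))
            if overlap > 0:
                scored.append((overlap, m.turn, m))
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [m for _, _, m in scored[:k]]

    def observe_style(self, prefers_brief, rate=0.5):
        # (c) Response alignment: exponential moving average, so the trait
        # follows the user's evolving preference instead of staying fixed.
        target = 1.0 if prefers_brief else 0.0
        self.brevity = (1 - rate) * self.brevity + rate * target

    def build_prompt(self, query):
        # Combine the inferred style and recalled memories into one prompt string.
        style = "brief" if self.brevity >= 0.5 else "detailed"
        lines = [f"[style: {style}]"]
        lines += [f"[memory t={m.turn}] {m.text}" for m in self.retrieve(query)]
        lines.append(f"[user] {query}")
        return "\n".join(lines)


agent = PersonalAgent()
agent.remember(1, "User likes hiking in the Alps")
agent.remember(2, "User adopted a cat named Miso")
agent.remember(3, "User now prefers cycling over hiking")
agent.observe_style(prefers_brief=True)
print(agent.build_prompt("What hiking trips should I plan?"))
```

In this sketch the later memory (turn 3) outranks the earlier one on a tie, which is one simple way to let newer preferences take precedence; the moving average plays the same role for the style trait.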