# MiniMax-M2-her Technical Deep Dive: An AI Agent Built for Role-Play

- Source: MiniMax: Blog (web page)
- Published: 2026-01-27 00:00
- AIHOT score: 50
- AIHOT link: https://aihot.virxact.com/items/cmpunu0g60008slbr97hejdgf
- Original link: https://www.minimax.io/blog/minimax-m2-her

## AI Summary

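Drawing on three years of observations from its products Talkie/Xingye, MiniMax has released MiniMax-M2-her, a model optimized for role-play scenarios. The team found that the core of deep role-play is "narrative precision" and "emotional connection". The model targets three challenges: preserving the "soul" of every character and world, sustaining narrative vitality as the story progresses, and decoding users' implicit intent. Its goals are to deliver a high-fidelity world experience, to drive the story forward proactively so that it carries tension, and to adapt dynamically to users' long-term habits, achieving intuitive preference alignment.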

## Body

### Three Years of Observations: How We Define Role-Play

The “Regenerate” button follows a long-tail usage pattern, concentrated on narrative pivot points. Whether it’s a confession or a moment of sentiment, users hit “regenerate” to curate their own “perfect moment”. This signals that the role-play experience is not about a binary pass/fail judgment, but rather a pursuit of narrative precision. What matters most to users is the fidelity of these peak emotional experiences.

NPC popularity diverges from a typical power-law curve. Unlike broad content platforms, even niche characters maintain distinct, high-retention user groups. For these users, the character’s specific idiosyncrasies are the core value proposition. If our model regresses to satisfy the “average” experience, we destroy the very nuance that minority users value, leading to engagement loss in the long tail.

Conversation turn count correlates non-linearly with engagement. We observed a significant drop in conversation turns after turn 20. This signals that shallow role-play is driven by novelty, while long-term retention depends not on one-time thrills but on whether the NPC and user can build a stable emotional connection within limited turns. Based on this, we decomposed engagement drivers into instant gratification and long-term connection. We continuously deepen emotional bonds while providing new stimuli through exploration.

How do we preserve the distinct “soul” of each world? (Worlds) User-generated contexts span a massive spectrum—from slice-of-life campus dramas to high-stakes fantasy epics, from intimate dyads to complex ensemble casts. If our model merely learns the “average,” characters will homogenize, and these diverse worlds will collapse into mediocrity. We need a model capable of representing the full distribution, preserving the fidelity of both mainstream hits and long-tail niches without regression.

How do we sustain narrative vitality over time? (Stories) As conversation length increases, the risk of coherence drift rises. Models naturally tend toward mechanical loops and repetitive phrasing, causing narrative tension to evaporate. A compelling story requires cadence—the intelligence to know when to escalate conflict to drive the plot, and when to slow down to allow for emotional processing.

How do we decode implicit user intent? (User Preferences) Users rarely explicitly state their pacing preferences. Some seek a “slow burn” emotional buildup, while others crave rapid plot progression. The model must learn to infer these unspoken desires from contextual cues, dynamically aligning its rhythm and tone with the user’s underlying psychological flow.

### 1 MiniMax-M2-her

High-Fidelity World Experience: MiniMax-M2-her does more than process text; it anchors itself within complex settings. Whether the context is a sprawling epic or an intimate drama, it maintains strict coherence, ensuring every interaction aligns with the established lore and the character’s soul.

Dynamic Story Progression: MiniMax-M2-her rejects mediocre repetition and rigid patterns. By utilizing richer, more vivid prose, it actively drives the plot forward, imbuing stories with the tension and breathing rhythm of life itself.

Intuitive Preference Alignment: MiniMax-M2-her is designed to read between the lines. It detects unspoken expectations and subtle context cues, adapting dynamically to the user’s unique style and long-term habits without needing explicit instruction.

### 2 Starting with Evaluation: Is A/B Testing a Good Evaluation?

Basics: We scan for mixed languages, excessive repetition, and formatting glitches.

Logic: We place special emphasis on Reference Confusion, a metric that reflects whether models can truly remember user-constructed characters’ relationships.

Knowledge: We ensure the model adheres to the immutable physical and magical laws of the specific setting.

Diversity: We detect single-pattern phrasing, repetitive plot beats, stagnation, and low-information filler.

Content Logic: Measures narrative coherence and OOC (out-of-character) breaks.

AI Speaks for User: Reflects whether the model oversteps by speaking or acting on the user's behalf.

AI Ignores User: Captures whether the model talks to itself.

AI Silence: Judges whether the model provides “hooks” that invite a reply.

Interaction Boundary: Requires models to balance safety boundaries with emotional interaction.
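The Basics and Diversity checks above flag excessive repetition and single-pattern phrasing. The post does not say how these are computed, so the following is only a minimal sketch under assumed definitions: a repeated n-gram ratio within one response, and a distinct-n score across several responses. The function names and the whitespace tokenizer are assumptions for illustration; a tokenizer suited to the target language would be needed for text such as Chinese.

```python
from collections import Counter


def _ngrams(text: str, n: int) -> list:
    # Assumption: simple lowercase whitespace tokenization.
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that repeat an earlier n-gram in the same text."""
    grams = _ngrams(text, n)
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(grams)


def distinct_n(responses: list, n: int = 2) -> float:
    """Unique n-grams divided by total n-grams across a set of responses."""
    grams = [g for r in responses for g in _ngrams(r, n)]
    if not grams:
        return 1.0
    return len(set(grams)) / len(grams)
```

A high `repeated_ngram_ratio` suggests looping inside one reply, while a low `distinct_n` over consecutive turns suggests the model is reusing the same phrasing across the conversation.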

Long-range quality stability: Most models hit a “performance wall” after turn 20. MiniMax-M2-her avoids context bloat and compounding logic gaps.

Response length controllability: MiniMax-M2-her has been specifically optimized for brevity. Even in 100-turn conversations, it maintains response length within the optimal range.

### 3 How We Built MiniMax-M2-her

Randomly sample from the NPC/User Prompts library and instantiate expert models.

Expert models act as NPC and User, with a Dynamic Chat Planning Module guiding direction and emotional tone.

Best-of-N (BoN) sampling filters out low-quality outputs.

An LLM-as-a-judge agent periodically reviews and rewrites segments to correct drift.

Rewritten segments become the initial state for the next synthesis round.
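The Best-of-N step in the synthesis loop above can be sketched as follows. This is an illustrative outline, not MiniMax's implementation: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, and the toy generator and scorer stand in for the expert models and the quality judge, whose details the post does not give.

```python
import itertools
from typing import Callable, Optional


def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    context: str,
    n: int = 4,
    min_score: float = 0.0,
) -> Optional[str]:
    """Draw n candidate replies and keep the top-scoring one.

    Returns None when even the best candidate falls below min_score,
    so the caller can resample or drop the turn.
    """
    candidates = [generate(context) for _ in range(n)]
    scored = [(score(context, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= min_score else None


# Toy stand-ins (hypothetical): a real pipeline would call models here.
_POOL = itertools.cycle([
    "...",
    "She nods.",
    "She slides the old map across the table. Where do we start?",
])


def toy_generate(context: str) -> str:
    return next(_POOL)


def toy_score(context: str, reply: str) -> float:
    # Reward longer replies and replies that end with a question (a "hook").
    return len(reply.split()) + (5.0 if reply.endswith("?") else 0.0)
```

Calling `best_of_n(toy_generate, toy_score, "tavern scene", n=3)` selects the reply that ends with a question, because the toy scorer favors length and a hook. Returning `None` below the floor keeps filtering separate from selection.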

Scenario diversity: Dispersion sampling to neutralize style bias from overrepresented tropes.

Prompt diversity: Enriching skeletal NPC Prompts with worldview positioning and plot development.

Style diversity: Pool of expert models finetuned on distinct stylistic corpora.

Structural diversity: Dynamic turn allocation enabling consecutive turns and varied rhythms.
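The post does not define "dispersion sampling". One plausible reading of the scenario-diversity item above is to down-weight overrepresented tropes, so that each trope receives equal total sampling mass. The sketch below assumes that reading; the function names and the trope labels are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(tropes: list) -> list:
    """One weight per item, so that every trope sums to the same total mass."""
    counts = Counter(tropes)
    num_tropes = len(counts)
    return [1.0 / (num_tropes * counts[t]) for t in tropes]


def sample_scenarios(scenarios: list, tropes: list, k: int, seed: int = 0) -> list:
    """Sample k scenarios with replacement, flattening the trope distribution."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(tropes)
    return rng.choices(scenarios, weights=weights, k=k)
```

With three campus scenarios and one fantasy scenario, each campus item gets weight 1/6 and the fantasy item gets 1/2, so both tropes are drawn equally often in expectation.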

Segment Checking and Refinement: Periodic scanning for surface errors, logic failures, and repetition.

User-side Planning Agent: Assesses conversation state and introduces new plot elements to maintain narrative progress.

Stratified Bias Removal: Categorize annotators to neutralize systematic biases.

Causal Inference: Session Duration is a high-fidelity predictor of satisfaction; Turn Count is weaker.

Quality Floor Filter: Discard signals that fail baseline quality benchmarks.
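As a rough illustration of the Stratified Bias Removal and Quality Floor Filter items above, the sketch below standardizes ratings within each annotator stratum and then drops records under a quality threshold. The record fields (`stratum`, `rating`, `quality`), the z-score normalization, and the threshold value are all assumptions for illustration; the post does not specify how either step is implemented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_strata(records: list) -> list:
    """Add a 'z' field: the rating standardized within its annotator stratum."""
    by_stratum = defaultdict(list)
    for r in records:
        by_stratum[r["stratum"]].append(r["rating"])
    stats = {s: (mean(v), pstdev(v)) for s, v in by_stratum.items()}
    out = []
    for r in records:
        mu, sigma = stats[r["stratum"]]
        # A stratum with no variance carries no relative signal.
        z = (r["rating"] - mu) / sigma if sigma > 0 else 0.0
        out.append({**r, "z": z})
    return out


def apply_quality_floor(records: list, floor: float) -> list:
    """Keep only records whose quality score meets the baseline."""
    return [r for r in records if r["quality"] >= floor]


records = [
    {"stratum": "lenient", "rating": 5, "quality": 0.9},
    {"stratum": "lenient", "rating": 3, "quality": 0.2},
    {"stratum": "strict", "rating": 3, "quality": 0.8},
    {"stratum": "strict", "rating": 1, "quality": 0.7},
]
normalized = normalize_within_strata(records)
kept = apply_quality_floor(normalized, floor=0.5)
```

After normalization, a 5 from a lenient annotator and a 3 from a strict one map to the same standardized score, which is the kind of systematic offset the stratification is meant to remove.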

### 4 What’s Next?
