# MiniMax Speech 2.8 Speech Model

- Source: MiniMax Blog (web page)
- Published: 2026-01-23 00:00
- AIHOT tag: Featured
- AIHOT link: https://aihot.virxact.com/items/cmo4bnpds01noslk7v6054gq8
- Original link: https://www.minimax.io/news/minimax-speech-28

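## Why It Was Featured

Clones a real person's voice from a 10-second sample, and the AI speaks with "um" and "ah" fillers and breath sounds: MiniMax releases a new speech model.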

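## AI Summary

MiniMax has released its new-generation speech model, MiniMax Speech 2.8. Using native sound tags, it reproduces the filler words of spoken language ("um," "ah") along with breathing pauses, markedly improving how natural conversation sounds. The model supports high-fidelity voice cloning from a 10-second sample, accurately reproducing timbre and speaking pace, while eliminating background noise and digital artifacts to deliver studio-grade clean audio. It also improves cross-lingual performance: starting with the Mandarin-Japanese pair, it addresses accent bleed to achieve pronunciation closer to that of a native speaker.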

## Full Text

2026.1.23

MiniMax Speech 2.8: Breathing life into AI voice

Today, we are excited to introduce MiniMax Speech 2.8.

This isn't just a technical upgrade; it's a breakthrough in vocal authenticity. By introducing native sound tag support, high-fidelity cloning, and studio-grade clarity, we are closing the gap between AI and the human voice.

Our mission remains clear: to make synthetic speech feel truly human, indistinguishable from a real voice.

1. Reclaiming the "Nuance": Teaching AI to Hesitate and Breathe

In the past, AI voices often felt cold because they were "too perfect". Real human speech is filled with imperfect breaths, pauses, and hesitations—subtle signals that convey emotion and emphasize key points.

Speech 2.8 introduces Native Sound Tags. By modeling colloquial fillers like "um," "uh," and "ah," we preserve the natural rhythm, pitch, and pauses of human dialogue.

No more robotic, flattened speech; the warmth is in the details.

Speech-2.8 Sound Tags Demo

Text: "Hey, it's me. How are ya? (chuckle) I hope you're having an awesome day! We actually had a bit of a crazy launch day yesterday, you know, but (breath) I'm just recovered and ready to roll. You're listening to this and probably thinking I'm just chatting into a microphone, right? But here's the twist: (clear-throat) I'm actually not human. I am the new Speech 2.8 model from MiniMax. Crazy, right? (laughs) If you listen closely, you can hear how I handle the pacing, the little breaths, and even that casual vibe. Have a great day!"
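The demo text above embeds sound tags inline as parenthesized words such as `(chuckle)` and `(breath)`. As a minimal sketch of how a script could be checked before it is sent for synthesis, the Python below splits a script into spoken text and tags. It assumes a tag is a lowercase, optionally hyphenated word in parentheses; the post does not list the full set of supported tags, and the names `DEMO_TAGS`, `split_sound_tags`, and `unknown_tags` are hypothetical helpers, not part of any MiniMax API.

```python
import re

# Tags that appear in the demo text; the full supported set is not given in the post.
DEMO_TAGS = {"chuckle", "breath", "clear-throat", "laughs"}

# Assumption: a sound tag is a lowercase, optionally hyphenated word in parentheses.
TAG_PATTERN = re.compile(r"\(([a-z]+(?:-[a-z]+)*)\)")


def split_sound_tags(script):
    """Split a script into ordered ("text", ...) and ("tag", ...) segments."""
    segments = []
    cursor = 0
    for match in TAG_PATTERN.finditer(script):
        spoken = script[cursor:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("tag", match.group(1)))
        cursor = match.end()
    tail = script[cursor:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


def unknown_tags(script):
    """Return, sorted, any tags in the script that are not in DEMO_TAGS."""
    found = {match.group(1) for match in TAG_PATTERN.finditer(script)}
    return sorted(found - DEMO_TAGS)


sample = "How are ya? (chuckle) I hope you're having an awesome day! (breath) Ready to roll."
for kind, value in split_sound_tags(sample):
    print(kind, value)
```

Because the pattern is only an assumption about tag syntax, an ordinary one-word parenthetical such as `(yesterday)` would also be reported as a tag; `unknown_tags` is one way to surface such cases for review.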

2. Voice Cloning: Replicate Your "Vocal Fingerprint" in 10 Seconds

We have optimized our feature extraction process to achieve a new level of similarity in voice cloning. With just a 10-second sample, Speech 2.8 precisely captures your unique texture, breathiness, and even your specific speaking pace.

The result isn't just a voice that sounds "like" you—it is you.
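Since cloning is described as working from a 10-second sample, a quick local check of a reference recording's length can be useful before uploading it. The sketch below uses Python's standard `wave` module and assumes the sample is an uncompressed PCM WAV file. Treating 10 seconds as a minimum is an assumption made here for illustration; the post does not state the service's actual length or format requirements, and `MIN_SECONDS`, `wav_duration_seconds`, and `is_long_enough` are hypothetical names.

```python
import wave

# Assumption: treat the post's "10-second sample" as a minimum length.
MIN_SECONDS = 10.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path, min_seconds=MIN_SECONDS):
    """Return True if the recording lasts at least min_seconds."""
    return wav_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("reference.wav")` would report whether a local recording (the file name is a placeholder) meets the assumed threshold.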

Original Audio

Speech-2.8 Cloned Result

This English demo showcases how Speech 2.8 captures the "soul" of a professional narrator:

- Authentic Conversationality: The voice has a "lived-in" quality. It doesn't sound like a stiff announcer; instead, it sounds like a trusted friend sharing a story over coffee.
- Dynamic Cadence: This speaker uses natural fillers and rhythmic pauses (like "but anyways," "you know") that create a sense of spontaneity and presence.
- Warm, Mid-Range Resonance: The timbre is grounded and steady, providing a sense of comfort and reliability that builds immediate rapport with the listener.

3. Pure Audio: Eliminating Background Noise and Digital Artifacts

Audio purity is the foundation of a premium experience.

We've re-engineered our processing engine to eliminate background noise and synthetic distortion. The result is a crystal-clear, transparent output that delivers the presence of a professional narrator recording in a studio.

Speech-2.8 Noiseless Demo

Deep in the forest, there lies a silence that remains untouched. As the first light of dawn filters through the dense canopy, the world seems to hold its breath. Listen closely—that is the soft whisper of the wind through the pines, a sound so delicate it is barely more than a secret.

Let us linger in this peace for a moment, rediscovering the essential gentleness that the noisy world so often hides.

4. Smarter Cross-Lingual Performance: A Global Voice for Every Market

We're breaking down language barriers by eliminating the "accent bleed" that often occurs in AI speech.

Starting with our Mandarin-Japanese pair, we've fixed unnatural tones and pronunciation shifts to ensure every voice sounds like a true native speaker. Stay tuned as we bring this seamless experience to even more languages soon.

Speech-2.8 Cross-Lingual Demo

MiniMax Speech 2.8 is now live. Experience the next generation of intelligence.

- MiniMax Open Platform: platform.minimax.io/docs/guides/models-intro
- MiniMax Audio: minimax.io/audio

Intelligence with Everyone.