Meta releases non-invasive brain-computer interface Brain2Qwerty v2, word error rate drops to 39%
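Source: the-decoder.com
Meta's FAIR team has released Brain2Qwerty v2, which reconstructs full sentences from non-invasive MEG brain signals. Nine volunteers typed 22,000 sentences; the average word error rate is 39%, and 22% for the best participant. v2 works asynchronously on a continuous signal window and needs no keystroke timestamps. The model uses three AI building blocks, including a language model fine-tuned from Qwen3 that turns noisy signals into coherent sentences. The character error rate is 31%, higher than the 26% of the v1 N-gram model, but word error rate and semantic accuracy are better. A gap to implanted systems (word error rate <2%) remains, but accuracy keeps improving as the amount of data grows.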
Meta's non-invasive brain-to-text AI is closing the gap with surgical implants
Meta's FAIR research team has released Brain2Qwerty v2, a model that reconstructs full sentences from non-invasive brain recordings. The average word error rate drops to 39 percent, and the best participant hits 22 percent.
People who lose the ability to speak or move after a brain injury need a way to communicate. Brain implants already do this reliably, but they require risky surgery. Meta's AI division FAIR has been working on a surgery-free alternative for some time and now shows a major improvement with Brain2Qwerty v2.
For the study, researchers recorded brain activity from nine healthy volunteers using magnetoencephalography (MEG), a technique that measures magnetic fields outside the skull. Each person was recorded for ten hours, and together they typed 22,000 sentences. The setup worked like this: participants heard a sentence, paused briefly, then typed it on a keyboard without seeing the text on screen. The model reconstructs the sentence from brain signals captured during that typing phase. According to the paper, the measurable activity comes mainly from the motor cortex, which controls finger movements.
Ten times more data lets the model ditch keystroke timing
The direct predecessor, Brain2Qwerty v1, still needed the exact timestamp of every single keystroke to align the signals. Version 2 works with a continuous signal window instead and assigns characters on its own, with no timing information. This asynchronous approach removes a key barrier on the path to real-time use, even though the system hasn't crossed that threshold yet. The harder task only works, the researchers say, because the new dataset contains ten times more recordings per person and far more varied sentences than the original.
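The difference between the two alignment schemes can be illustrated with a toy example. This is only a sketch, not Meta's code: the function names, the window sizes, and the single-channel "recording" are hypothetical, and the article does not say how v2 actually segments or decodes its windows.

```python
from typing import List, Sequence


def epochs_around_keystrokes(
    signal: Sequence[float],
    keystroke_samples: Sequence[int],
    half_width: int,
) -> List[List[float]]:
    """v1-style segmentation: one window centered on each known keystroke.

    Requires the sample index of every keystroke, which is exactly the
    timing information v2 no longer needs.
    """
    windows = []
    for k in keystroke_samples:
        start = max(0, k - half_width)
        stop = min(len(signal), k + half_width)
        windows.append(list(signal[start:stop]))
    return windows


def sliding_windows(
    signal: Sequence[float],
    window: int,
    stride: int,
) -> List[List[float]]:
    """v2-style segmentation: fixed windows over the continuous signal.

    No keystroke timestamps are used; a decoder would have to work out on
    its own which characters fall inside each window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        list(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, stride)
    ]


# Toy single-channel "recording" of 10 samples (hypothetical data).
toy_signal = [float(i) for i in range(10)]
keyed = epochs_around_keystrokes(toy_signal, keystroke_samples=[2, 7], half_width=2)
continuous = sliding_windows(toy_signal, window=4, stride=2)
```

The first function cannot run without `keystroke_samples`, while the second only needs the raw signal. That is the practical appeal of the asynchronous setup for any future real-time use.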
The model relies on three AI building blocks, according to the team. First, deep learning replaced the hand-built recognition steps used before. Second, the system processes signals at three levels: characters, words, and full sentences. Third, the team had AI agents write the optimization code autonomously. For the sentence level, a language model (Qwen3) is fine-tuned to shape noisy brain signals into coherent sentences.
Brain2Qwerty v2 reaches an average word error rate of 39 percent, compared to 55 percent for the raw encoder without a language model. For the best participant, 28 percent of sentences are decoded perfectly, and 47 percent contain at most one wrong word.
Better words, but more wrong characters
The team compares Brain2Qwerty v2 against two simpler methods. The first is the raw encoder, which reads characters directly from the brain signal with no language model smoothing the output. The second is the approach from Brain2Qwerty v1, where an N-gram model corrects the encoder output. That kind of model knows the statistical likelihood of letter sequences from large text collections and patches individual character strings locally, but it doesn't form whole sentences.
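To make the N-gram idea concrete, here is a minimal sketch of a character bigram model that scores letter sequences. It is an illustration under assumptions: the tiny corpus, the bigram order, the add-one smoothing, and the class name `CharBigramModel` are all hypothetical, and the article does not specify how the v1 N-gram model was built.

```python
import math
from collections import Counter
from typing import Iterable


class CharBigramModel:
    """Toy character bigram model with add-one smoothing."""

    def __init__(self, corpus: Iterable[str]) -> None:
        self.pair_counts: Counter = Counter()
        self.context_counts: Counter = Counter()
        self.vocab = set()
        for text in corpus:
            padded = "^" + text  # "^" marks the start of a string
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, text: str) -> float:
        """Sum of smoothed log P(cur | prev) over the string."""
        padded = "^" + text
        # +1 reserves one extra slot for characters never seen in training.
        v = len(self.vocab) + 1
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.pair_counts[(prev, cur)] + 1
            den = self.context_counts[prev] + v
            total += math.log(num / den)
        return total


# Hypothetical training text standing in for "large text collections".
corpus = ["the road", "the cars", "on this road", "not allowed"]
model = CharBigramModel(corpus)

# Two hypothetical candidates for the same noisy encoder output.
candidates = ["the", "thq"]
best = max(candidates, key=model.log_prob)
```

The model prefers "the" because the pair "he" occurs in the corpus and "hq" does not. It only ever looks at neighboring letters, which is why this kind of correction stays local and cannot assemble a whole sentence.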
Performance is measured at three levels. Character error rate (CER) counts wrong letters. Word error rate (WER) counts wrong words. And semantic error rate captures how far the meaning drifts from the target sentence. On words and meaning, Brain2Qwerty v2 wins. The word error rate drops to 39 percent, compared to 55 percent for the raw encoder and 43 percent for the N-gram model from v1.
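Character and word error rates are conventionally computed as an edit distance divided by the length of the reference. The sketch below uses that standard definition; the article does not spell out the paper's exact normalization, so treat the details as an assumption. The misdecoded sentence is a made-up example, and the semantic error rate is left out because the article does not say how it is computed.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Character-level edit distance relative to the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def word_error_rate(ref: str, hyp: str) -> float:
    """Word-level edit distance relative to the number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


reference = "cars are not allowed on this road"
hypothesis = "cars are not allowed on this rode"  # hypothetical decoder output

cer = char_error_rate(reference, hypothesis)
wer = word_error_rate(reference, hypothesis)
```

Here one wrong word out of seven gives a word error rate of 1/7, while the two wrong letters out of 33 characters give a much lower character error rate. The two metrics can therefore move in different directions for the same output.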
At the character level, the picture flips. Here v2 hits 31 percent errors, worse than the raw encoder (28 percent) and the N-gram model (26 percent). The reason is the language model: It's trained to produce fluent sentences, even when the brain signal doesn't really support them. When in doubt, it invents a grammatically clean but completely wrong sentence.
For the worst-performing participant, the model decoded "had she not fallen down the stairs" instead of the target sentence "cars are not allowed on this road," a total miss that drives the character error rate up. The N-gram model only corrects locally and stays closer to individual letters, but rarely produces a real word. Since successful communication depends on meaning, not exact character matches, the team considers the better word and semantic scores the more relevant progress. An earlier fMRI-based study, for comparison, hit 92 to 94 percent word errors.
When AI optimizes AI research
The work also has an auto-research component: three independent agents based on Claude Opus 4.6 were tasked with lowering the error rate on their own by modifying code and running experiments. They found techniques like label smoothing, modality dropout, and shorter prompts that held up across all participants, beating a standard optimization method by a clear margin. But when given an open-ended task, the same agents failed. Their extensive code changes crashed most compute jobs. Human research remains a critical part of the process for now, the team concludes.
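Of the techniques named, label smoothing is the easiest to show in isolation. The sketch below uses the standard textbook formulation; the smoothing strength `epsilon`, the four-class setup, and the predicted probabilities are hypothetical, and the article does not say where in the pipeline the agents applied it.

```python
import math
from typing import List, Sequence


def smoothed_targets(true_index: int, num_classes: int, epsilon: float) -> List[float]:
    """Mix a one-hot target with a uniform distribution.

    The true class gets 1 - epsilon + epsilon / K, every other class epsilon / K,
    where K is the number of classes.
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    uniform = epsilon / num_classes
    targets = [uniform] * num_classes
    targets[true_index] += 1.0 - epsilon
    return targets


def cross_entropy(targets: Sequence[float], probs: Sequence[float]) -> float:
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0.0)


# Hypothetical 4-way character prediction where class 1 is correct.
hard = smoothed_targets(true_index=1, num_classes=4, epsilon=0.0)
soft = smoothed_targets(true_index=1, num_classes=4, epsilon=0.1)

confident_probs = [0.01, 0.97, 0.01, 0.01]
hard_loss = cross_entropy(hard, confident_probs)
soft_loss = cross_entropy(soft, confident_probs)
```

With smoothing, a very confident prediction is penalized more than it would be against a one-hot target. This discourages overconfident outputs, a plausible benefit when the input signal is as noisy as MEG.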
The gap to implanted systems remains large, however. Invasive interfaces achieve below two percent word error rate for typing. But Brain2Qwerty v2's accuracy keeps climbing with more data, and no ceiling is in sight yet, so the researchers see collecting more recordings as a straightforward lever. Still, open questions remain: There are significant differences between participants, the study is limited to healthy volunteers making real typing movements, and real-time capability is still missing. As a path to clinical use, the team points to portable MEG sensors that work at room temperature. Tests showed that even half the sensors deliver nearly full performance.
A window into the brain, not just a medical tool
The work builds on a longer research track at FAIR led by neuroscientist Jean-Rémi King. His team already decoded perceived speech from MEG and EEG data in 2022 and generated images from brain activity in milliseconds in 2023. Most recently, the team showed TRIBE v2, a model that predicts brain activity instead of measuring it. The direct predecessor Brain2Qwerty v1, which reconstructed typed sentences with up to 80 percent character-level accuracy and achieved a character error rate of 29 percent on MEG and 65 percent on EEG across 35 participants, has since been published in Nature Neuroscience.
Behind Brain2Qwerty sits a broader research program that King sees as more than an engineering challenge. Neuroscience and AI have been tightly linked from the start, he said in an interview with The Decoder: "AI today also makes it clear that some of the concepts we take for granted - like reasoning or thinking - may need to be re-evaluated in light of what deep learning algorithms are now capable of." For King, models that translate brain activity into text aren't just a medical tool. They're a window into how the brain itself works.