# Sakana AI 推出 Fugu：动态协调多 LLM 的系统，匹配 Anthropic 顶级模型性能

- 来源：The Decoder：AI News（RSS）
- 作者：Matthias Bastian
- 发布时间：2026-06-22 16:18
- AIHOT 分数：66
- AIHOT 链接：https://aihot.virxact.com/items/cmqoyx7og05k4slx6qs17zaid
- 原文链接：https://the-decoder.com/sakana-ais-fugu-orchestrates-multiple-llms-to-match-anthropics-fable-and-mythos-benchmarks

## AI 摘要

日本 AI 初创公司 Sakana AI 发布 Fugu，一个能动态协调多个大语言模型的系统。Fugu 本身也是一个语言模型，可从可替换的智能体池中调用其他 LLM（含自身副本），通过单一 OpenAI 兼容 API 提供服务。Fugu 有基础版和 Fugu Ultra 变体。Sakana 公布的基准测试显示，Fugu Ultra 在编码、推理、科学和智能体评测中与 Anthropic Fable 5 和 Mythos Preview 表现相当。Fugu 旨在降低对单一 AI 供应商的依赖，模型池可完全替换。约 500 名 Beta 用户在长流程任务中测试，Fugu Ultra 的 bug 捕获量远超 GPT 5.5。两个变体现在已通过 API 上线。

## 正文

Sakana AI's Fugu orchestrates multiple LLMs to match Anthropic's Fable and Mythos benchmarks

Matthias Bastian View the LinkedIn Profile of Matthias Bastian

Jun 22, 2026

Nano Banana Pro prompted by THE DECODER

Key Points

Japanese AI startup Sakana AI is launching Fugu, a system that dynamically coordinates multiple language models from a swappable pool while behaving like a single model through one API.

Sakana says Fugu outperforms Anthropic's best models, Fable and Mythos, in benchmarks, even though neither model is part of its LLM pool.

Fugu comes in a base version for everyday tasks and a more powerful Fugu Ultra variant. The swappable pool design also aims to reduce dependence on any single AI provider.

Tokyo-based AI startup Sakana AI is launching Fugu, a system that dynamically coordinates multiple AI models to compete with leading systems like Anthropic's Fable 5. The approach also aims to reduce dependence on any single AI provider.

Tokyo-based startup Sakana AI has unveiled Fugu, a multi-LLM orchestrator that looks and feels like a single model to the user. Sakana already had strong results with orchestrator setups for coding. Its ALE-Agent placed 21st out of 1,000 human experts in a coding competition.

Fugu is itself a language model, trained to call other LLMs from an agent pool, including copies of itself. Depending on the request, it either handles a task on its own or pulls together a team of specialized models. Selection, delegation, checks, and synthesis all run internally. Users access everything through a single OpenAI-compatible API.

Sakana Fugu dynamically orchestrates multiple language models from a swappable agent pool to tackle complex tasks. To the user, it behaves like a single model with one API. | Image: Sakana AI

Fugu Ultra aims to match top-tier models

Sakana AI is launching two variants. The base Fugu model targets low latency and solid everyday performance across coding, code review, and chatbot use cases. Teams with privacy or compliance needs can exclude specific agents from the pool.

Fugu Ultra is built for maximum answer quality on complex, multi-step problems. Early users have put it to work on AI research, reproducing scientific papers, cybersecurity analysis, and patent and literature searches.

According to benchmark results Sakana AI published, Fugu Ultra performs on par with Anthropic's Fable 5 and Mythos Preview across a range of coding, reasoning, science, and agent benchmarks.

According to Sakana, its LLM orchestrator Fugu sets new benchmark highs, beating Anthropic's Fable 5 and Mythos 5. | Image: Sakana AI

Neither Anthropic model is in Fugu's agent pool, though, since they aren't publicly available. With those models included, Fugu would likely score even higher. Sakana AI says the baseline comparison numbers come from the model providers themselves. The table below shows how Fugu stacks up against the underlying base models.

Benchmark Fugu Fugu Ultra Opus 4.8 Gemini 3.1 Pro GPT 5.5

SWE Bench Pro 59.0 73.7 69.2 54.2 58.6

TerminalBench 2.1 80.2 82.1 74.6 70.3 78.2

LiveCodeBench 92.9 93.2 87.8 88.5 85.3

LiveCodeBench Pro 87.8 90.8 84.8 82.9 88.4

Humanity's Last Exam 47.2 50.0 49.8 44.4 41.4

CharXiv Reasoning 85.1 86.6 84.2 83.3 84.1

GPQA-D 95.5 95.5 92.0 94.3 93.6

SciCode 60.1 58.7 53.5 58.9 56.1

τ³ Banking 21.7 20.6 20.6 8.4 20.6

Long-Context Reasoning 74.7 73.3 67.7 72.7 74.3

MRCRv2 86.6 93.6 87.9 84.9 94.8

Orchestration as a hedge against vendor lock-in

Sakana AI is pitching Fugu as a safeguard against single-provider dependence. The company points to the recent export controls on Anthropic's Fable and Mythos models as a concrete example. Access to top AI systems can vanish overnight due to regulatory shifts or foreign policy decisions.

"For an organization or a nation, relying on a single company’s APIs for critical infrastructure, finance, or governance is a material vulnerability. This risk is no longer a hypothetical possibility, but a reality," Sakana AI writes in its announcement. Fugu's model pool is fully swappable, so the system can reroute to other models if one provider goes dark.

The system's real-world performance depends entirely on which models are in the pool, though. If several top providers restrict access at the same time, Fugu's options shrink too. An orchestrator like Fugu may boost resilience, but it's not the same as true sovereignty. Still, Fugu could be worth watching on performance alone.

Early testers report gains on complex workflows

About 500 beta users have already tested the system in real-world settings, according to Sakana AI. Fugu proved strongest on long, multi-step workflows like automated data research, security analysis, and code reviews.

One software developer says Fugu Ultra catches far more bugs during code review than GPT-5.5. "Where other tools flag about three issues, Fugu surfaced more than twenty." Sakana AI also claims Fugu beat Gemini 3.1 Pro, Opus 4.8, and GPT 5.5 in its own tests on automated research, mechanical design, and financial forecasting.

Video: According to Sakana, Fugu solves and visualizes a Rubik's Cube faster than the individual models.

"The beta made clear that multi-agent orchestration matters most when the task is messy, long-running, and difficult to solve with a single model call," writes Sakana AI.

Both variants are live now through a single API on the product page and console. Sakana offers subscription plans for daily use and usage-based billing for bigger workloads.

Sakana's bet is an AI ecosystem rather than a single model

Fugu's technical approach builds on Sakana AI's own research into learned model orchestration, specifically two papers presented at ICLR 2026 called Trinity and Conductor.

The idea fits Sakana AI's broader vision of applying natural principles like swarm behavior, evolution, and collective intelligence to AI systems. The company sees powerful AI not as a single-model problem but as a collaborative ecosystem that goes beyond what any one model can do alone.

Sakana AI was founded by former Google AI researchers Llion Jones and David Ha. Jones co-authored the 2017 "Attention Is All You Need" paper that gave us the Transformer.