Chubby♨️@kimmonismus

2026-05-14 19:33·49天前

AI 摘要

传闻即将发布的Gemini 3.2 Flash模型在编码和推理任务上达到了GPT-5.5约92%的性能水平，同时推理成本降低了15至20倍。其延迟表现也极为出色，多数查询响应时间低于200毫秒。这主要得益于DeepMind的蒸馏和稀疏化技术，成功将前沿模型压缩为“Flash”变体，而避免了通常伴随的质量大幅下降。

Rumors about the new Gemini Flash coming in. And holy， if true then big：

92% of GPT-5.5's coding and reasoning performance， reportedly at 15-20x lower inference cost. And the latency？ Sub-200ms for most queries.

That would be nuts. no joke.

Bindu ReddyGemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques... Rumors are that benchmarks show it's hitting 92% of GPT 5.5's performance on cod...

Google 推理模型发布编码

在 X 查看原推导出 Markdown

Chubby♨️@kimmonismus · X

58导出 Markdown