传闻即将发布的Gemini 3.2 Flash模型在编码和推理任务上达到了GPT-5.5约92%的性能水平,同时推理成本降低了15至20倍。其延迟表现也极为出色,多数查询响应时间低于200毫秒。这主要得益于DeepMind的蒸馏和稀疏化技术,成功将前沿模型压缩为“Flash”变体,而避免了通常伴随的质量大幅下降。
Rumors about the new Gemini Flash coming in. And holy, if true then big:
92% of GPT-5.5's coding and reasoning performance, reportedly at 15-20x lower inference cost. And the latency? Sub-200ms for most queries.
That would be nuts. no joke.