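Kimi Vendor Verifier: Verifying the Accuracy of Inference Providers
Source: kimi.com. Kimi has released the Vendor Verifier, a tool for independently verifying the output accuracy of third-party AI inference providers. Using standardized test methods, it checks how consistent and reliable inference quality is across API vendors, addressing the output deviations and performance fluctuations that can arise in large-model services. Users can rely on it to assess how each inference provider actually performs and to confirm that they are getting the expected AI capabilities. The detailed implementation documentation is publicly available on the Kimi website.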
Rebuilding the "Chain of Trust": Kimi Vendor Verifier
Alongside the release of the Kimi K2.6 model, we are open-sourcing the Kimi Vendor Verifier (KVV) project, designed to help users of open-source models verify the accuracy of their inference implementations.
This is not an afterthought: we learned the hard way that open-sourcing a model is only half the battle. The other half is ensuring it runs correctly everywhere else.
Official Evaluation Results
The Kimi API K2VV evaluation results, used for calculating the F1 score, are available from the link in the original post.
Why We Built KVV
From Isolated Incidents to Systemic Issues
Since the release of K2 Thinking, the community has frequently reported anomalous benchmark scores. Our investigation confirmed that a significant portion of these cases stemmed from misused decoding parameters. To mitigate this immediately, we built our first line of defense at the API level: in Thinking mode, the API enforces Temperature=1.0 and TopP=0.95 and validates that thinking content is correctly passed back.
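To make that API-level check concrete, here is a minimal Python sketch of what such a guard might look like. It is an illustration under stated assumptions, not the actual Kimi API implementation: the function name `validate_thinking_request`, the request field names `temperature`, `top_p`, and `messages`, and the `reasoning_content` key for passed-back thinking content are all assumed, modeled on common OpenAI-compatible chat request conventions. The sketch reports violations; a real service might instead silently override the parameters.

```python
import math

# Values stated in the post for Thinking mode.
REQUIRED_PARAMS = {"temperature": 1.0, "top_p": 0.95}


def validate_thinking_request(request: dict) -> list[str]:
    """Return a list of problems found in a Thinking-mode request.

    An empty list means the request passes. All field names here are
    assumptions for illustration, not a documented schema.
    """
    problems = []

    # 1. Decoding parameters must match the enforced values.
    for name, expected in REQUIRED_PARAMS.items():
        value = request.get(name)
        if not isinstance(value, (int, float)) or not math.isclose(value, expected):
            problems.append(f"{name} must be {expected}, got {value!r}")

    # 2. Every earlier assistant turn must carry its thinking content back.
    for index, message in enumerate(request.get("messages", [])):
        if message.get("role") == "assistant" and not message.get("reasoning_content"):
            problems.append(f"messages[{index}] is missing reasoning_content")

    return problems
```

A caller could run the check before forwarding a request and reject it, or log a warning, whenever the returned list is non-empty.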