Chatbot Arena Launches a Multimodal Leaderboard
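Source: lmsys.org
Chatbot Arena has added image battles and released a multimodal leaderboard. Based on 17,429 votes across more than 60 languages collected over two weeks, GPT-4o leads with a score of 1226, followed closely by Claude 3.5 Sonnet at 1209; both show a more pronounced advantage on vision than in the language-only setting. Gemini 1.5 Pro and GPT-4 Turbo are tied for third, and the open-source model Llava 1.6 34B ranks eighth. The platform has also renamed "Elo rating" to "Arena Score" and plans to extend support to other modalities such as PDF, video, and audio.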
The Multimodal Arena is Here!
Multimodal Chatbot Arena
We added image support to Chatbot Arena! You can now chat with your favorite vision-language models from OpenAI, Anthropic, Google, and most other major LLM providers, and help discover how these models stack up against each other.
In just two weeks, we have collected over 17,000 user preference votes in more than 60 languages. In this post we present the initial leaderboard and statistics, share some interesting conversations submitted to the arena, and briefly discuss the future of the multimodal arena.
Leaderboard results
Table 1. Multimodal Arena Leaderboard (Timeframe: June 10th - June 25th, 2024). Total votes = 17,429. The latest and detailed version here.