Law professors wrote down questions they had been asked during office hours. Gemini 2.5 & humans answered them, then other law professors blindly judged the results:
- Gemini had a 75% win rate vs. professors
- Gemini's answers were rated LESS harmful than the humans' answers
- Newer models do even better