Ethan Mollick@emollick

2026-06-03 04:28 · 30 days ago

AI summary

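It is hard to judge how good MAI-Thinking-1 is from the scores alone (for example, the GPQA and Terminal Bench 2.0 scores are oddly low), but Microsoft makes its models hard to try at release (a common problem across many Microsoft AI products), so I am not sure. The numbers are below Meta Spark's, though.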

It is difficult to know how good MAI-Thinking-1 is from the scores alone (like weirdly low GPQA & Terminal Bench 2.0)

But Microsoft makes it really hard to try its models upon release (a general issue with many Microsoft AI products), so I dunno. Stats below Meta Spark, though.

Microsoft · Expert opinion
