[Quoting @Nature]: AI underwent its most rigorous mathematical test yet, but it did not live up to the task https://go.nature.com/4oqlNk6
Weird headline - I am not sure solving 7 out of 10 novel, very hard problems means AI "did not live up to the task," when 15 months ago LLMs couldn't do math.
But the actual study is interesting and illuminates the flaws & successes of AIs in math. https://1stproof.org/assets/docs/report.pdf