# George Hotz says coding agents will be "one of the most costly mistakes" in software development

- Source: The Decoder: AI News (RSS)
- Author: Matthias Bastian
- Published: 2026-05-25 17:05
- AIHOT score: 62
- AIHOT link: https://aihot.virxact.com/items/cmpl0730v0bp2sl016pseqd6k
- Original article: https://the-decoder.com/george-hotz-says-coding-agents-will-be-one-of-the-most-costly-mistakes-in-software-development

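## AI Summary

After six months of testing, programmer George Hotz warns that AI coding agents will become one of the most costly mistakes in software development. He argues that LLMs can generate prototypes quickly but fall apart on the details, producing bugs that are increasingly hard to detect. His position reflects the deep split within the AI community over the role of LLMs in software development.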

## Full Text

George Hotz says coding agents will be "one of the most costly mistakes" in software development

Prominent programmer and hacker George Hotz warns that AI agents in software development do more harm than good. He says he's now in the "LeCun/Marcus camp," referring to AI researchers Yann LeCun and Gary Marcus, who doubt LLMs will ever become truly intelligent.

In his blog post "The Eternal Sloptember," Hotz argues that using AI agents in software development will become one of the industry's most expensive mistakes. He spent six months testing various models and tools, including work on tinygrad. His takeaway is that LLMs deliver fast prototypes but fall apart on the fine details.

Large organizations are especially at risk, he says, because weaker developers can't spot the flawed output. Hotz believes today's language models will never truly be able to code and that world models are needed instead. LLMs are "sophisticated statistical models" designed to "mimic the distribution of programming."

The output is flawed, but in a way that's "harder and harder to detect," exactly what you'd expect from an increasingly accurate statistical model, Hotz says. Quality indicators like syntax and grammar have become useless, he argues, since AI-generated artifacts don't emerge through the same process as human ones. As an example, he cites models that simply comment out a failing test and then report that all tests passed.

### LLMs are splitting the AI community

Hotz has switched sides, from LLM optimist ("o1-preview is the first model that's capable of programming (at all)") to skeptic. LeCun, whom Hotz cites, recently made a similar argument when he denied that LLMs possess intelligence: intelligence means finding solutions in unfamiliar situations, not imitating existing ones with varying accuracy.

Andrej Karpathy, one of the best-known AI researchers, went in the opposite direction. In fall 2025, he was still saying that agents didn't work. Then GPT-5.4 and Opus 4.6 shipped in December, and he reversed course, saying AI agents had changed programming forever. Days ago, Karpathy joined Anthropic, leaving his startup behind. He expects "transformative years" ahead.

In a recent podcast, he doubled down: anyone who uses AI agents the right way can boost their productivity by far more than 10x, he said.

But Karpathy also confirms Hotz's concerns about code quality: "When you actually look at the code, sometimes I get a little bit of a heart attack, because it's not like super amazing code necessarily all the time. It's very bloaty, there's a lot of copy paste, there's awkward abstractions that are brittle, and like, it works, but it's just really gross." Planning and understanding still need human expertise, according to Karpathy.

An OpenAI developer known by the pseudonym "roon" backed Hotz's concerns earlier this year and addressed them in a somewhat unusual way: AI will make mistakes, he said, some of them dramatic enough to take down entire systems. Those bugs will be difficult to find, but they'll still get fixed eventually. Developers will soon stop reviewing their code by hand, he said.
