# US government benchmark says China is falling behind in the AI race, but independent data does not back that up

- Source: The Decoder: AI News (RSS)
- Author: Matthias Bastian
- Published: 2026-05-03 16:12
- AIHOT score: 54
- AIHOT link: https://aihot.virxact.com/items/cmopi471b0wncsll9tz6fj700
- Original link: https://the-decoder.com/china-is-falling-behind-in-the-ai-race-according-to-a-us-government-benchmark

## AI Summary

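A US government agency assesses that China is eight months behind in the AI race, but independent data does not confirm that gap. US labs continue to pursue smarter models, while the price advantage offered by Chinese players such as Deepseek may become the more important competitive asset. The yardstick for the race is broadening from purely technical metrics to a composite view that includes cost-effectiveness.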

## Article

China is falling behind in the AI race, according to a US government benchmark

A new report from the Center for AI Standards and Innovation (CAISI) claims Chinese AI models are losing ground to their US counterparts.

The agency recently put the new Chinese open-weight model Deepseek V4 Pro through its paces. The verdict: it's roughly eight months behind the leading US models. CAISI tested performance across cybersecurity, software development, math, natural sciences, and abstract reasoning.

CAISI calls Deepseek V4 the most capable Chinese AI model to date. But in CAISI's private testing, it reportedly performs worse than Deepseek's own technical report suggests. Deepseek pitches the model as roughly on par with current US models like Opus 4.6 and GPT-5.4. CAISI says it's actually closer to the older GPT-5, especially on abstract reasoning, cybersecurity, and software development. Math is the one area where Deepseek V4 nearly matches the top US models.

The center, which likely has its own political agenda, sits within the National Institute of Standards and Technology (NIST). Its report paints a picture of a widening gap between US and Chinese models. Independent measurements tell a different story, showing the gap has stayed roughly constant.

Price might start to matter more than raw capability

On price, Deepseek V4 has a clear edge. It came in cheaper than the comparable GPT-5.4 mini in five of seven tests. And price is becoming a bigger factor as AI models are expected to run longer and handle more complex tasks. Meanwhile, top-tier US models keep getting pricier.

That matters because no one really knows yet how much these models actually boost productivity. Businesses don't have reliable ways to measure return on investment, especially once you factor in downstream effects like training, upskilling, and error checking.

Past a certain capability threshold, "good enough" performance at a low price could end up more attractive than top-tier performance at premium rates. Cursor, the Claude Code competitor reportedly being acquired by SpaceX, built its custom fine-tuned coding model on top of a Chinese open-weight model, making it significantly cheaper than what OpenAI and Anthropic offer.

OpenAI CEO Sam Altman seems torn on this. In a recent post on X, he wrote: "I keep thinking I want the models to be cheaper/faster more than I want them to be smarter, but it seems that just being smarter is still the most important thing."

Altman's view may also rest on the bet that smarter AI could help improve itself, speeding up progress across the board. OpenAI, Anthropic, and Chinese developers have all said recently that their own models are already accelerating their R&D work.

