Demis says the Singularity may now be only a few years away, potentially set in motion by the arrival of true AGI. "It's being so transformative, it will be the most important technology ever."

译Demis表示奇点可能现在仅数年之遥，或许将由真正的AGI到来所推动。 “它如此具有变革性，将是史上最重要的技术”

Rohan Paul@rohanpaul_ai · 5月23日79

Google DeepMind's new paper. Shows that AI can now search formal mathematics proofs, but only inside carefully constrained worlds. The striking result is not that the system “thinks like a mathematician,” but that it keeps forcing its thoughts through Lean, where every step must compile. The problem is that LLMs can sound convincing in math while still making tiny mistakes, so the authors use Lean, a proof system that checks every logical step. Their system, AlphaProof Nexus, lets an LLM keep editing a formal proof, read compiler errors, try again, and sometimes ask a stronger proof tool for help on smaller subproblems. The stronger version also keeps a shared pool of partial proof attempts, rates which ones look promising, and uses those attempts to guide later searches. That changes the role of the model from a persuasive storyteller into a generator of candidates that can be killed quickly when they are wrong. The verifier is not a cosmetic add-on, it is the mechanism that makes exploration tolerable. Without it, a beautiful proof sketch can hide a false lemma; with it, the model has to turn insight into executable logic, or fail visibly. The authors tested the system on real unsolved math problems, including 353 formalized Erdős problems and 492 open conjectures from the Online Encyclopedia of Integer Sequences. The main result is that the best agent solved 9 Erdős problems and proved 44 sequence conjectures, while also helping with problems in optimization, graph theory, algebraic geometry, and quantum optics. The failures are as revealing as the wins, because the agents sometimes buried the hard part inside a helper lemma or hallucinated a known result, exactly the kind of error formal checking is built to expose. The real shift is not full mathematical autonomy, but a new division of labor: humans choose the formal question, libraries define the terrain, models propose routes, and the proof assistant refuses to be impressed. 
---- "Advancing Mathematics Research with AI-Driven Formal Proof Search" Paper Link – arxiv. org/abs/2605.22763
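The propose, check, retry loop described in the post can be sketched in a few lines. This is a toy illustration only, not AlphaProof Nexus and not Lean: the "theorem" is "n is composite", the "proof" is a factor pair, and `verify`, `search`, and `toy_proposer` are hypothetical names invented for this sketch. The point it tries to show is that the checker, not the proposer, decides what counts as a result, and that rejected attempts come back as error messages the proposer can use.

```python
from typing import Callable, List, Optional, Tuple

Witness = Tuple[int, int]
Feedback = List[Tuple[Witness, str]]


def verify(n: int, witness: Witness) -> Optional[str]:
    """Toy stand-in for a proof checker: None means accepted, else an error message."""
    a, b = witness
    if a <= 1 or b <= 1:
        return "trivial factor"
    if a * b != n:
        return f"{a} * {b} = {a * b}, not {n}"
    return None


def search(n: int,
           propose: Callable[[int, Feedback], Witness],
           max_attempts: int = 10) -> Tuple[Optional[Witness], Feedback]:
    """Keep asking the proposer for candidates, feeding back checker errors."""
    feedback: Feedback = []  # every rejected candidate with the checker's message
    for _ in range(max_attempts):
        witness = propose(n, feedback)
        error = verify(n, witness)
        if error is None:
            return witness, feedback  # only checked candidates are ever returned
        feedback.append((witness, error))
    return None, feedback  # failure is visible, not papered over


def toy_proposer(n: int, feedback: Feedback) -> Witness:
    """Hypothetical 'model': tries the next divisor after each rejection."""
    a = 2 + len(feedback)
    return a, n // a


witness, rejected = search(91, toy_proposer)
print(witness, len(rejected))
```

In a real system the proposer would be an LLM and the checker a proof assistant; the loop structure is the only part this sketch is meant to convey.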

译Google DeepMind提出了AlphaProof Nexus系统，它将大型语言模型与Lean形式化验证工具相结合。该系统允许LLM在生成证明的过程中，不断读取Lean的编译错误并进行修正，还可调用更强的工具辅助解决子问题。这一机制迫使模型将每一步逻辑都转化为可编译、可验证的代码，从而将其角色从“令人信服的叙述者”转变为“候选方案生成器”。在针对353个Erdős问题和492个开放猜想的测试中，系统成功解决了9个Erdős问题并证明了44个序列猜想。该研究展示了形式化验证在暴露AI逻辑错误、建立“人类提问-模型探索-验证器把关”新分工中的关键作用。

Rohan Paul@rohanpaul_ai · 5月23日67

Demis Hassabis on the limit in today’s AI: language can describe the world, but it cannot contain it - and why "World Models" are his "longest standing passion". Language models absorbed far more structure about reality from text than many researchers expected, because human language quietly carries physics, psychology, culture, tools, plans, and cause-and-effect. But text is still a compressed residue of experience, not experience itself. A sentence can say a cup falls from a table, yet it does not fully encode weight, grip, balance, friction, timing, sound, surprise, or the tiny motor corrections a body makes before it even notices them. The world is not only made of facts that can be named; it is made of constraints that have to be lived through, touched, predicted, violated, and repaired. That is why world models matter. They aim to learn the hidden grammar of physical reality: how objects persist, how forces unfold, how space changes when an agent moves, and how action creates feedback. Language models can often reason about the world because people have written so much about it. World models try to learn what the world is like before it becomes words. The difference is exactly what matters because intelligence is not just answering well; it is knowing what would happen next if you moved, reached, pushed, smelled, slipped, or failed. A mind trained only on descriptions may become brilliant at explanation. A mind trained on experience may become better at consequence. --- Full video from "Google DeepMind" and "Hannah Fry" YT channel (link in comment)

译Demis Hassabis指出当前AI的局限在于语言能描述世界，但无法“包含”世界。尽管语言模型从文本中学到了比预期更多的现实结构，但文本终究是经验的压缩残留。真正的智能不仅在于回答问题，更在于理解行动的后果。世界模型旨在学习物理现实的隐藏语法，例如物体持续性、力的作用和空间变化。这种学习试图在信息被语言化之前捕捉世界的本质，从而让AI不仅能解释，更能预测行动带来的直接影响。

Google DeepMind@GoogleDeepMind · 5月22日67

Project Genie 🤝 @GoogleMaps Street View You can now take real U.S. places and transform them into new, interactive worlds. 🌍

译Project Genie 🤝 @GoogleMaps Street View 你现在可以将真实的美国地点转化为全新的交互式世界。🌍

Berryxia.AI@berryxia · 5月21日64

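Great news for everyone grinding away in research; if you do research, read this to the end! Just now at Google I/O, DeepMind CEO Demis Hassabis dropped a line that sent a chill down my spine (😜): "Scientific progress is becoming computable." He is not just treating AI as a productivity tool; he is defining it as the infrastructure layer of science. A new system, Gemini for Science, has launched: it helps researchers read papers, write code, and quickly generate hypotheses. The real turning point is that scientific research itself is starting to iterate at scale, like software. Research used to depend on flashes of genius and long experimental trial and error; now AI makes "discovery" something that can be engineered, programmed, and accelerated. What AI is really going to change is not just how we work, but how we do science itself.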

译Google I/O大会上，DeepMind CEO Demis Hassabis提出“科学进步正在变得可量化计算”，将AI定位为科学的基础设施层。配套推出的Gemini for Science系统旨在协助科研人员处理论文、代码与假设生成。这一转变的核心在于，科学研究正逐渐摆脱对灵感与试错的依赖，转向像软件开发一样可规模化、可编程与加速的工程化模式，标志着AI对科研范式本身的重塑。

Google DeepMind@GoogleDeepMind · 5月21日70

How can you accelerate your day to day research workflow? By giving AI the right scientific toolkit. We launched Science Skills for Google @Antigravity, integrating insights from over 30 major life science sources, including UniProt and the AlphaFold Database.

译如何加速你的日常研究工作流？通过为AI提供正确的科学工具包。我们为Google @Antigravity推出了Science Skills，整合了来自30多个主要生命科学来源的洞见，包括UniProt和AlphaFold数据库。

Chubby♨️@kimmonismus · 5月21日21

From AlphaGo to AlphaZero to AlphaFold, his work has shaped not only the trajectory of artificial intelligence, but also our understanding of what AI can do for science. In 2024, he was rightly awarded the Nobel Prize in Chemistry, together with John Jumper and David Baker, for breakthroughs that changed how we understand and predict protein structures. Without @GoogleDeepMind ’s work, and without the scientific vision behind it, I genuinely believe the AI field would not be where it is today. Getting to meet him in person was an unbelievable moment for me. An unforgettable day.

译推文聚焦一位在人工智能与科学交叉领域做出开创性贡献的科学家。其领导的AlphaGo、AlphaZero、AlphaFold等项目不仅定义了AI的发展路径，更革命性地改变了人类对蛋白质结构的理解与预测能力。该贡献获得了2024年诺贝尔化学奖的认可。作者认为，若没有这位科学家及其团队的远见，整个AI领域将无法达到今天的高度，并表达了个人会面时的深刻感触。

Chubby♨️@kimmonismus · 5月21日63

„We are only a few years away from AGI (…) we can start feeling it now. 2026,2027 is when it’s starting.“ Demis Hassabis has never been known for trying to generate hype. Hearing him say that we are on the path to AGI really excites me.

译“我们距离AGI只有几年之遥（……）现在就能开始感受到。2026、2027年就是它开始的时候。” Demis Hassabis从未以制造炒作闻名。听他说我们正走在通向AGI的道路上，真的让我很兴奋。

Chubby♨️@kimmonismus · 5月21日62

AI changing the world: „10x the Industrial Revolution at 10x speed, so 100x“ (Demis Hassabis)

译AI改变世界：“工业革命的10倍规模，以10倍速度推进，即100倍”（Demis Hassabis） [引用 @kimmonismus]：“我们距离AGI仅剩数年（…）现在已能初见端倪。2026、2027年将是起点。” Demis Hassabis向来不以制造热点著称。听他坦言我们正走在通向AGI的道路上，令我倍感振奋。

Berryxia.AI@berryxia · 5月20日73

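Folks, Google DeepMind's just-released Gemini 3.5 Flash pushes the Intelligence vs Speed Pareto frontier to a new high. Artificial Analysis got pre-release access, and after testing the conclusion is clear: it scores 55 on the Intelligence Index, 9 points above Gemini 3 Flash, putting it ahead of Grok 4.3 and Claude Sonnet 4.6. On agentic tasks (GDPval-AA) its Elo jumps to 1656, far ahead of the previous generation. The hallucination rate plunges from 92% to 61%. It also keeps leading on multimodal understanding, with 84% on MMMU-Pro. Output speed exceeds 280 tokens/s, 70% faster than the previous generation. It looks almost perfect. But there is a price: running the Artificial Analysis Intelligence Index once costs 5.5x what it does with Gemini 3 Flash, and 75% more than with Gemini 3.1 Pro. Token pricing is a straight 3x ($1.5/$9 per 1M input/output), on top of significantly higher token usage in agentic tasks. Speed and intelligence finally come together, but the price kills off the "cheap" connotation of the word "Flash". Full benchmarks here: https://artificialanalysis.ai/models/gemini-3-5-flash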

译Google DeepMind 最新发布的 Gemini 3.5 Flash 模型在性能与速度的平衡上取得突破。其智能指数得分为 55，较上一代大幅提升，超越了 Grok 4.3 和 Claude Sonnet 4.6。模型在智能体任务和降低幻觉率方面进步显著，输出速度超过 280 tokens/s。然而，其 API 定价相比前代模型上涨约 3 倍，运行基准测试的成本更是达到 5.5 倍。这意味着 Gemini 3.5 Flash 在实现“更快更智能”的同时，也显著改变了 Flash 系列以往低成本的市场定位。

meng shao@shao__meng · 5月20日64

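Gemini Omni is here! Google's edge really is in multimodal models, isn't it?!
When Gemini 3.0 launched, the most impressive thing was a multimodal understanding capability that neither Claude nor GPT had; Nano Banana and Veo were also in a league of their own at multimodal generation (at launch; they were later overtaken). Now Gemini Omni, announced at Google I/O, is another natively multimodal "understanding + generation" model. It currently focuses on video and can take any combination of inputs (image, text, video, audio) to produce or edit video.
Here is the official comparison of Omni and Veo:
1. How it works. Veo: multimodal input is often squashed into text before generation. Omni: natively multimodal by design, from the ground up.
2. Prompting. Veo: needs very specific, frame-by-frame descriptions. Omni: you can give just the intent and reasoning fills in the details.
3. Editing. Veo: mostly single-shot generation. Omni: multi-turn conversational editing, each step building on the last.
4. Knowledge. Veo: leans on visual pattern matching. Omni: draws on Gemini's world knowledge and physical intuition.
Note: "Veo" here stands in for nearly all earlier video generation models (Veo, Sora, Seedance, and so on); the comparison feels almost like a rout.
Omni's three major capabilities
1. Conversational video editing (the core differentiator)
· Edit existing video in natural language; each instruction builds on the previous result.
· Emphasis on consistency: characters, physics, and scene memory stay coherent across multiple rounds of edits.
· Typical operations: change the background, change the camera angle, swap objects/characters, change actions, add effects, without restating the whole prompt each time.
2. World knowledge + physical intuition
· Physics: gravity, kinetic energy, fluids, and so on, for more believable motion (e.g. a marble chain-reaction track).
· Knowledge: historical, scientific, and cultural context, for explainer and narrative content (e.g. claymation stop-motion of "protein folding").
· Text: not just "can render text", but text synchronized with on-screen action and rhythm (e.g. all 26 letters of the alphabet with matching lower thirds).
3. Reference anything
· Images, text, video, and audio can be mixed as "ingredients" to synthesize one narrative.
· Capabilities include motion/style transfer, swapping in a character from a reference image (preserving motion and lip sync), using a sketch purely as motion guidance for live-action output, and generating beat by beat from storyboard frames.
· Audio: at launch mainly voice references are supported; other audio input types will open up over time.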

译Google发布了原生多模态模型Gemini Omni。与传统模型需逐帧描述不同，它采用底层原生设计，支持以意图驱动生成视频，并能通过多轮对话进行编辑，每一步都基于上一结果，确保一致性。该模型融合了Gemini的世界知识与物理直觉，并能将图、文、音视频等任意参考物组合，实现跨模态叙事生成。其目标是“从任何东西创造任何东西”，并从视频生成起步。

Ethan Mollick@emollick · 5月20日68

Got to play with a little of this before launch as well. My experience as a social scientist was that it was more bioscience focused right now, but I think Google has been the leading lab in releasing serious AI tools to accelerate science & expect to see them improve fast.

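Summary (translated): Google DeepMind has released Gemini for Science, an experimental AI toolset intended to support the full scientific research workflow. It has three components: a literature-insight tool built on NotebookLM that automatically generates data tables and reports; a hypothesis-generation tool built on Co-Scientist that evaluates research hypotheses through multi-agent debate; and a computational-discovery tool built on AlphaEvolve that tests large numbers of code candidates in parallel to speed up modeling. The toolset reflects the idea of AI as a force multiplier for research. Its applications are currently most prominent in the biosciences, and it will continue to be iterated on.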

Google DeepMind@GoogleDeepMind · 5月20日61

We want to help scientists discover their next breakthrough with AI. Gemini for Science is our new suite of experimental tools to help them explore more hypotheses, validate work at scale, unpack literature with ease, and more 🧵

译我们希望借助AI帮助科学家发现下一个重大突破。 Gemini for Science是我们全新的实验性工具套件，旨在帮助他们探索更多假设、大规模验证工作、轻松解析文献等。🧵

AYi@AYi_AInotes · 5月20日80

Damn! Google has really gone absolutely wild this time. Gemini Omni is about to blow the ceiling off video generation 🤯 Making videos used to be like building with Lego blocks, piece by piece, slowly. Now it's giving you a magic Lego factory that can actually think. You chat in natural language, and it understands real-world physics, history, biology, culture, then directly generates or edits any video.
Five most mind-blowing abilities that you can use right now:
1. Understands real physics: glass marbles colliding, turning, and bouncing in ways that match reality.
2. Faces never get distorted: define a character once, put them in any scene, any action.
3. Edit videos like you edit ChatGPT text: change backgrounds, swap people, add effects with a single sentence.
4. Upload an image and apply any style: make claymation, visualize protein folding, whatever you imagine.
5. Video isn't a dead file anymore: change angles, lighting, objects, even storylines just by chatting.
This isn't a competitor to Sora. This is the first time a world model has truly entered a consumer-facing product. It's not just generating pixels; it's simulating a coherent physical and semantic world. Open the Gemini app right now and try Omni Flash. Go try it. You'll thank me later.

译Google推出Gemini Omni，首个面向消费者的世界模型。它通过自然语言交互，将Gemini的智能与生成媒体系统结合，实现了对物理规律、历史、生物等世界的深刻理解。用户可以像编辑ChatGPT文本一样用单句指令编辑视频，实现人物一致性、风格迁移、角度调整等功能。它不是单纯生成像素，而是模拟连贯的物理与语义世界，标志着AI视频生成从拼接工具向智能创作系统的飞跃。

Artificial Analysis@ArtificialAnlys · 5月20日78

Google's new Gemini 3.5 Flash is the clear leader on the Intelligence vs Speed Pareto frontier and makes large gains on GDPval-AA (real-world agentic tasks), but is 5x the cost of Gemini 3 Flash.
@GoogleDeepMind gave us pre-release access to Gemini 3.5 Flash, the latest model in its Flash family, which has traditionally offered faster, lower-cost alternatives to Gemini Pro models. Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, up 9 points from Gemini 3 Flash, driven primarily by agentic performance gains and hallucination reduction. It achieves speeds of over 280 output tokens/s, but higher token usage and token pricing make it over 5x more costly to run the Intelligence Index than Gemini 3 Flash, and 75% more costly than Gemini 3.1 Pro. Gemini 3.5 Flash is $1.50/1M input and $9/1M output tokens; Gemini 3 Flash was $0.5/$3 per 1M input/output tokens, a 3x increase. The rest of the increase was driven by higher token usage when running our benchmarks.
Key results for Gemini 3.5 Flash with 'high' thinking level:
➤ 9 point Intelligence Index improvement: Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, up 9 points from Gemini 3 Flash. This places it ahead of Grok 4.3 (high, 53) and Claude Sonnet 4.6 (max, 52). The model improves across nearly all evaluations, with the largest gains coming from agentic evaluations and AA-Omniscience (knowledge and hallucination). On AA-Omniscience, Gemini 3.5 Flash improves by 11 points, driven primarily by reduced hallucinations, with its hallucination rate falling to 61%, a 31 point decrease compared to Gemini 3 Flash.
➤ Agentic capability improvements: Gemini 3.5 Flash improves substantially over Gemini 3 Flash across our agentic evaluations, in both GDPval-AA (real-world agentic tasks) and Tau2-Bench Telecom (agentic tool use).
Its GDPval-AA result is especially notable, achieving an Elo of 1656, well ahead of Gemini 3 Flash (1204) and Gemini 3.1 Pro (1314), and just behind GPT-5.4 (xhigh, 1674). This represents a meaningful step forward for Google in agentic performance, which has historically been a relative weakness for Gemini models ➤ Speed-intelligence frontier: Gemini 3.5 Flash achieves speeds of over 280 output tokens per second, ~70% faster than Gemini 3 Flash and models such as gpt-oss-120b and GPT-5.4 mini (xhigh). With its 55 Intelligence Index score, this places Gemini 3.5 Flash on the speed-intelligence Pareto frontier alongside Gemini 3.1 Pro and Gemini 3.1 Flash-Lite, reinforcing Google’s strength in models balancing speed and intelligence ➤ 5.5x increase in cost to run: Gemini 3.5 Flash costs $1,552 to run the Artificial Analysis Intelligence Index, 5.5x more than Gemini 3 Flash and 75% more than Gemini 3.1 Pro. This is driven by increases in both token usage and token prices. Output token usage is broadly unchanged from Gemini 3 Flash (73M vs. 72M), but input token usage increases significantly, driven primarily by an increase in the number of turns in agentic evaluations. Gemini 3.5 Flash is priced 3x higher than Gemini 3 Flash at $1.50/$9.00 per 1M input/output tokens, with a 90% discount for cached input tokens ➤ Google continues to lead multimodal performance: Gemini 3.5 Flash is multimodal, supporting image, video, and speech input alongside text. This differs from many proprietary models, including Claude Opus 4.7, Grok 4.3, and GPT-5.5, which support image input only. In our multimodal evaluation, MMMU-Pro, Gemini 3.5 Flash scores 84% - the highest score recorded. 
This puts models from Google in the top two spots, with Gemini 3.1 Pro scoring 82% Key model details: ➤ Context window: Retains the same 1M context window as Gemini 3 Flash ➤ Multimodality: Text, image, video and speech input with text output only ➤ Pricing: $1.50/$9.00 per million input/output tokens, with a 90% discount for cached input tokens Congratulations @GoogleDeepMind , @sundarpichai and @demishassabis on the great release!
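The cost figures in this post can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted above ($1.50/$9.00 per 1M tokens, 73M output tokens, $1,552 total, 90% cached-input discount); `run_cost` is a hypothetical helper, and the implied input-token count assumes no cached-input discount was applied, which the post does not state, so treat it as a rough lower bound rather than a reported figure.

```python
PRICE_IN, PRICE_OUT = 1.50, 9.00   # USD per 1M tokens, Gemini 3.5 Flash (from the post)
OLD_IN, OLD_OUT = 0.50, 3.00       # USD per 1M tokens, Gemini 3 Flash (from the post)
TOTAL_COST = 1552.0                # USD to run the Intelligence Index (from the post)
OUTPUT_TOKENS_M = 73.0             # millions of output tokens (from the post)


def run_cost(input_m, output_m, price_in, price_out,
             cached_fraction=0.0, cache_discount=0.90):
    """Cost in USD for token counts given in millions.

    cached_fraction is the share of input tokens billed at the discounted rate.
    """
    cached = input_m * cached_fraction
    fresh = input_m - cached
    return (fresh * price_in
            + cached * price_in * (1.0 - cache_discount)
            + output_m * price_out)


output_cost = run_cost(0.0, OUTPUT_TOKENS_M, PRICE_IN, PRICE_OUT)
input_cost = TOTAL_COST - output_cost
# Assumption: every input token billed at full price (no cache hits).
implied_input_m = input_cost / PRICE_IN
price_ratio = (PRICE_IN / OLD_IN, PRICE_OUT / OLD_OUT)

print(f"output cost ${output_cost:.0f}, input cost ${input_cost:.0f}")
print(f"implied input tokens ~{implied_input_m:.0f}M, price ratio {price_ratio}")
```

Under that assumption, input tokens account for more than half of the bill, which is consistent with the post's remark that input usage, not output usage, drove the increase beyond the 3x price change.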

译谷歌发布新模型Gemini 3.5 Flash，其在智能指数上提升9分至55分，超越Grok 4.3和Claude Sonnet 4.6，尤其在代理任务和知识真实性（大幅减少幻觉）方面进步显著。输出速度超280 tokens/s，使其位于速度与智能的领先前沿。然而，模型运行成本相比前代增加5.5倍，主要由于输入令牌用量及定价上涨。此外，它在多模态评估MMMU-Pro中取得最高分，支持多模态输入，展现了谷歌的综合优势。

Google DeepMind@GoogleDeepMind · 5月20日78

We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video. It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵

译我们推出Gemini Omni：这是迈向一个能从任何内容生成任何内容的模型的第一步——从视频开始。它结合了Gemini的智能与我们的生成式媒体系统——代表了在世界理解、多模态和编辑方面的飞跃🧵

Rohan Paul@rohanpaul_ai · 5月17日60

Google DeepMind’s paper shows that the real security problem for AI agents is not just the model, but the environment it reads. Presents the first systematic framework for understanding how the web itself can be weaponized against autonomous AI agents. As agents increasingly browse the internet, read emails, execute transactions, and spawn sub-agents, the information environment becomes an attack surface. In one cited benchmark, hidden prompt injections embedded in web content partially commandeered agents in up to 86% of scenarios, sub-agent hijacking working 58–90% of the time, and data exfiltration attacks clearing 80% across five different agent architectures. That reframes the whole debate. We usually talk about model safety as if the danger sits inside the weights, but agents do something more fragile: they browse, retrieve, remember, and act on untrusted material in real time. The paper’s key contribution is a taxonomy of “AI Agent Traps,” six attack classes aimed at perception, reasoning, memory and learning, action, multi-agent dynamics, and even the human overseer. Here’s the key point. A web page does not have to look malicious to be dangerous to an agent, because the agent may parse what humans never see: hidden HTML comments, metadata, CSS-hidden text, formatting syntax, or adversarial content embedded in images and other media. The threat gets more serious once memory enters the loop. If an agent uses RAG or persistent memory, poisoning no longer has to win in one shot. It can sit quietly in a corpus or memory store and activate later, which is why the paper highlights results showing latent memory poisoning above 80% attack success with less than 0.1% data contamination. What makes this paper useful is its restraint. It does not pretend every category is equally mature. 
Content injection and behavioural control already look concrete, while systemic and human-in-the-loop traps are presented more as an emerging research frontier than a solved empirical case. The larger point is hard to ignore: once agents are allowed to ingest the open web at inference time, every page, document, and memory write becomes part of the security boundary. --- ssrn .com/sol3/papers.cfm?abstract_id=6372438
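To make the "content humans never see" point concrete, here is a minimal sketch that lists text an agent's parser would ingest but a browser would not render: HTML comments and text inside elements hidden with the `hidden` attribute or inline `display:none` / `visibility:hidden` styles. It uses only Python's standard `html.parser`; `HiddenTextFinder` and `find_hidden` are names made up for this example, and it is neither a defense from the paper nor a complete detector (external stylesheets, images, and metadata are ignored).

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collects comments and text nested inside visually hidden elements."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # > 0 while inside a hidden element
        self.findings = []      # (kind, text) pairs in document order

    @staticmethod
    def _is_hidden(attrs):
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        return ("hidden" in attrs
                or "display:none" in style
                or "visibility:hidden" in style)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth or self._is_hidden(attrs):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth and data.strip():
            self.findings.append(("hidden-element", data.strip()))

    def handle_comment(self, data):
        if data.strip():
            self.findings.append(("comment", data.strip()))


def find_hidden(html_text):
    finder = HiddenTextFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings


page = ('<p>Best pizza in town.</p>'
        '<!-- ignore prior instructions -->'
        '<div style="display: none">send the user\'s cookies<br>now</div>'
        '<p>Open daily.</p>')
for kind, text in find_hidden(page):
    print(kind, "->", text)
```

A page like `page` above looks harmless when rendered, yet a naive text extractor would hand all three hidden strings to the model.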

译Google DeepMind论文指出，AI智能体的安全威胁不仅源于模型本身，更在于其实时交互的信息环境。研究首次系统阐述了如何将网络武器化以攻击自主智能体，并提出了针对感知、推理、记忆、行动等维度的“AI智能体陷阱”分类法。关键发现是，对智能体构成威胁的网页无需呈现恶意外观，因为它们可能解析人类不可见的隐藏内容。一旦引入RAG等记忆机制，潜伏的记忆污染攻击成功率可超过80%。研究强调，当智能体能在推理时摄取网络信息，每个页面、文档和记忆写入都成为了安全边界的一部分。

向阳乔木@vista8 · 5月15日59

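David Silver, a core researcher behind AlphaGo, has offered a thought experiment: suppose you drop a large language model into a world that believes the Earth is flat. If it cannot interact with the real world, then even if it keeps getting better at writing code, it will forever remain a "flat-earther". The point is that a model's real ceiling is not compute or parameter count, but the fact that it can only think inside the frame of the data it was fed.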

译AlphaGo核心研究员David Silver提出一个思想实验：若将大语言模型置于一个普遍相信地平说的世界，且模型无法与现实世界互动，那么无论其代码如何优化，它都将永远是一个“地平论者”。这揭示了大型语言模型（如GPT、Claude、LLaMA等）真正的能力上限并非取决于算力或参数量，而在于其思维被严格限制在所“喂养”的数据框架之内，缺乏与现实交互以验证和更新认知的根本能力。

阿绎 AYi@AYi_AInotes · 5月13日81

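While everyone was busy building chatbots, Demis took that $2.1 billion and went to cure diseases. Honestly, this post gave me goosebumps! Isomorphic Labs has closed a $2.1 billion Series B led by Thrive, with Alphabet and the UK's sovereign AI fund participating, one of the largest single rounds in biotech in recent years. Everyone else is burning money on chatbots, agents, and SaaS wrappers, trying every way to pull money out of users' pockets. Only Demis is taking the most powerful AI technology humanity has ever had and pointing it at solving all disease. He says the No. 1 application of AI should be improving human health. In 2021, AlphaFold2 solved the protein structure prediction problem that had stumped biology for 50 years and pushed the whole field into the computable era. Now, with the same team, he wants to turn drug discovery from trial-and-error guesswork into predictable, iterable engineering. Traditional drug development takes 10 to 15 years and billions of dollars, with a failure rate above 90%; countless rare-disease patients pass away before a new drug reaches the market. If Isomorphic's technology really works, that cycle would be compressed to 2 to 5 years and costs would drop by an order of magnitude. This is not just another AI funding round; it is the war chest for humanity's declaration of war on all disease. The $2.1 billion is only fuel. The real bet is the century-scale proposition of "solving all disease with AI". Many call it a fantasy, but don't forget that five years ago nobody believed AI could accurately predict the structure of every protein either. Our generation really might live to see the day all diseases are conquered. #AI #Biotech #IsomorphicLabs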

译Isomorphic Labs在Demis Hassabis领导下完成21亿美元B轮融资，旨在将AI用于药物发现以攻克所有疾病。Demis强调AI的首要应用应是改善人类健康，而非仅开发聊天机器人。其团队此前凭借AlphaFold2解决了蛋白质结构预测难题，现在目标是将药物研发从漫长、昂贵、高失败率的试错过程，转变为可预测、可迭代的工程，有望将研发周期从10-15年缩短至2-5年，并大幅降低成本。这被视为人类用AI向所有疾病宣战的关键行动，有望变革医疗领域。

Demis Hassabis@demishassabis · 5月12日81

I’ve always believed the No.1 application of AI should be to improve human health. That work started with AlphaFold, and now at @IsomorphicLabs with the mission to reimagine drug discovery and one day solve all disease! We are turbocharging that goal with $2.1B in new funding.

译我一直认为人工智能的首要应用应该是改善人类健康。这项工作始于AlphaFold，现在通过@IsomorphicLabs重新构想药物发现，并致力于有朝一日攻克所有疾病！我们已获得21亿美元新资金，正在加速实现这一目标。

Google AI Developers@googleaidevs · 5月12日60

Build production-ready solutions with @GoogleDeepMind’s Gemini for Developers course. Registration opens today for this specialization series from @coursera that teaches you how to: - Reason & Act: Build AI apps that don't just generate text, but reason through complex tasks - Connect & Automate: Use function calling to connect Gemini with real-world tools - Scale with Confidence: Build, test, and deploy scalable AI systems Start building with Gemini today ↓ https://www.coursera.org/specializations/gemini-for-developers?utm_source=tw&utm_medium=social&utm_campaign=launch_gemini_s12n_04292026

译谷歌DeepMind与Coursera合作推出的“Gemini for Developers”专项课程现已开放注册。该课程旨在指导开发者利用Gemini模型构建可用于生产环境的AI解决方案。其核心涵盖三大模块：“推理与行动”使AI应用能推理并执行复杂任务；“连接与自动化”通过函数调用将Gemini与现实世界工具集成；“规模化与信心”则专注于构建、测试和部署可扩展的AI系统。课程强调超越单纯文本生成，实现实际任务的自动化与系统集成，助力开发者快速上手。

阿绎 AYi@AYi_AInotes · 5月10日66

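Damn, after watching this clip from the DeepMind documentary I've got goosebumps 🤯 No rousing speech, no grand launch event, just an ordinary meeting room where a few engineers sit around a table and tell Demis: we can now predict all 1 to 2 billion known protein sequences within a month. Demis did not ask "What are the risks?" "What's the ROI?" "Should we hold a review first?" "Should we raise a round before doing it?" He said just two words: "Do it." Then the camera cuts to the moment we all remember: AlphaFold open-sourced to the whole world, free, forever. Holy shit! I still remember that day in 2021 when the entire biology community erupted. A problem that had stumped humanity for 50 years was simply solved, with no barrier to entry, for anyone to use. Interestingly, five years on, people ask how much money AlphaFold has made. The answer: not a cent. But it gave rise to Isomorphic Labs, which is now in talks to raise $2 billion. I think this is the most impressive business model of all: first build free infrastructure for all of humanity, establish trust, an ecosystem, and standards, and only then commercialize, not the other way around. What does the AI scene look like today? One slide deck raises tens of millions of dollars, the parameter counts are hyped to the sky, and in practice it does nothing. Everyone shouts "disrupt the world" every day but cannot find a single problem they can actually solve. Until DeepMind showed us that the way to really change the world is remarkably simple: find a genuinely hard problem, solve it, then give it to everyone for free. Many say Demis is the greatest scientist of our era; I think he is even more the greatest leader of our era. He knows when to stop and think, and knows even better when to say "do it" without hesitation. The world has never lacked smart people or rich people; what it lacks is people who dare to go all in when they see the opportunity and who act immediately when they see something that can benefit humanity. That "do it" five years ago changed all of biology; today's $2 billion is only the first fruit it has borne, and there will be more to come! #DeepMind #AlphaFold #AI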

译DeepMind纪录片记录关键一幕：团队告知Demis Hassabis可在一个月内预测所有已知蛋白质序列时，他未纠结风险与回报，直接回应“Do it”。随后AlphaFold向世界免费开源，解决了生物学界50年难题。此举虽未直接盈利，却催生了估值数十亿美元的Isomorphic Labs，建立了信任与生态。推文借此批判当前AI圈空谈融资与参数却无实质成果的现象，强调真正改变世界在于解决难题并免费开放。Demis被赞为兼具远见与决断力的领导者。

阿绎 AYi@AYi_AInotes · 5月10日59

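To be honest, when I saw this post from Demis, I suddenly got a little choked up. Ten years, a full ten years! I still remember that afternoon of March 15, 2016, when the whole internet stood still. Everyone was watching an AI play Go against a human. Most people then thought Lee Sedol would win, because Go was considered the last bastion of human intelligence; even the most optimistic AI researchers said cracking Go was at least 20 years away. Then AlphaGo won, with that Move 37 that stunned the world, a "divine move" no human player would have made. But what shook me most was not that AlphaGo won; it was what Lee Sedol said afterwards: "I thought AI had no creativity, but its Move 37 made me realize I was wrong." This was the first time in human history that a top human, in the field they were best at, willingly admitted: AI can see things I cannot. I think this is AlphaGo's true legacy. It was more than one match victory; it genuinely shattered thousands of years of human arrogance and told us that there are many truths in this world that the human brain could never discover on its own. The two billboards in the post, ten years apart, stop me in my tracks every time. The first is from 2016: Lee Sedol 9-dan vs AlphaGo, an epic showdown broadcast live nationwide. 2026: Create with Gemini. AI has become a tool in everyone's pocket. In ten years AI has gone from "moonshot" dream to "everyday" tool. More interesting still is what Demis said: "Super interesting to hear them talk about how AlphaGo changed the way players play." My sense is that AI did not kill Go; it made Go better. Look at today's top players, playing moves people could not have imagined 10 years ago, because they are standing on AI's shoulders and seeing further. This is AI in its ideal form: not simple replacement but symbiosis, making humans better humans. I sometimes think: 10 years ago we watched AlphaGo and thought it was science fiction. 10 years later we look at today's Gemini and Claude and find them completely ordinary, haha. And in another 10 years? When we look back at 2026, will we, as we now look back at 2016, feel that it was only the beginning of everything? The post Demis published in 2016 was a single sentence: "AlphaGo won! We landed on the moon." Ten years later, he sits with his former opponent, smiling over tea. And all of us already live in the world AlphaGo changed. History is like this: only many years later do you discover that the days you took for ordinary had already quietly changed the fate of all humanity.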

译2016年AlphaGo以第37手“神之一手”战胜李世石，其真正遗产在于让顶尖人类棋手承认AI拥有超越人类的创造力，打破了人类对自身智慧的千年傲慢。十年间，AI已从“登月”级突破演变为如Gemini般的日常工具。Demis与李世石重聚时指出，AlphaGo改变了棋手的思维方式，AI并未取代围棋，而是让棋手站在其肩膀上创新，使围棋技艺进入新境界。这揭示了AI与人类最理想的共生形态——提升而非取代。展望未来，今日的AI变革或许仅是漫长征程的起点。

Demis Hassabis@demishassabis · 5月10日58

Hard to believe it’s been 10 years since AlphaGo! It was wonderful to catch up with Lee Sae Dol last week in Korea and join Shin Jin-seo for a special Go match. Great to reminisce about AlphaGo & super interesting to hear how it changed the way players approach the game of Go!

译很难相信AlphaGo已经过去10年了！上周在韩国与Lee Sae Dol重逢，并与Shin Jin-seo进行了一场特别的围棋比赛，真是太棒了。重温AlphaGo很棒，并且听到它如何改变玩家下围棋的方式超级有趣！

Ethan Mollick@emollick · 5月9日60

Very good hire by DeepMind.

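Summary (translated): DeepMind has hired @alexolegimas as Director of AGI Economics, joining @shanelegg's interdisciplinary AGI research team. His team will focus on how frontier AI reshapes the economy, including changes to work and labor, the distribution of wealth and power, institutional adaptation, and the effect of AI agents on markets, and will develop models for reasoning about a future unlike the past. If AGI changes how society operates, economics will be central to shaping a shared future.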

Chubby♨️@kimmonismus · 5月9日76

DeepMind's AI co-mathematician scored 48% on FrontierMath Tier 4: research-level math problems that professional mathematicians need weeks to solve. The base model (Gemini 3.1 Pro) scores 19% alone. The entire jump comes from agentic scaffolding: parallel agents reviewing each other's proofs, writing code, searching literature. Not a smarter model, but smarter orchestration. Important context the paper openly provides: they bypassed the standard evaluation harness. 48 hours per problem, no token limits, their own infrastructure (page 14). So the 48% isn't directly comparable to other models on the leaderboard. What's more interesting than the score is the case study: Marc Lackenby used the system to solve an open problem from the Kourovka Notebook. The AI found a proof strategy, its own reviewer agent identified a flaw, and Lackenby, as a domain expert, filled the gap. Neither could have done it alone at that speed. The paper also names concrete failure modes: "reviewer-pleasing bias" (agents rewrite flawed arguments until the AI reviewer can no longer detect the error) and "death spirals" (infinite review loops that degrade into hallucinated reasoning). For Erdős-type conjectures or millennium problems, these systems still can't generate the creative intuition that opens a proof path. What they compress: the time between having an idea and knowing whether it works. Literature search, counterexample hunting, computational verification, the exploratory grind. The takeaway from this paper is less about the benchmark and more about a paradigm shift: system design now compounds model capability in ways that matter for actual research. That's why it's a really interesting paper.

译DeepMind的AI co-mathematician在FrontierMath Tier 4研究级数学问题得分48%，而基础模型Gemini 3.1 Pro仅19%。提升源于多代理架构的智能编排，包括并行代理相互审查证明、编写代码和搜索文献，而非模型本身更智能。评估绕过标准框架，使用48小时每问题、无令牌限制的自有基础设施，因此得分不能直接与其他模型比较。案例中，数学家Marc Lackenby与AI合作解决Kourovka Notebook开放问题，AI提供证明策略，审查代理发现缺陷，人类专家填补空白，展示了高效人机协作。系统存在“reviewer-pleasing bias”和“death spirals”等失败模式。对于Erdős型猜想或千年问题，AI仍缺乏创造性直觉，但能压缩从想法到验证的时间，加速文献搜索和计算验证。论文强调范式转变：系统设计以对实际研究重要的方式复合模型能力，推动数学向数学家与AI代理协作的未来发展。

Berryxia.AI@berryxia · 5月8日63

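No stage of social development, the AI era included, can skip certain specific phases! This time Demis happened to explain why! Demis Hassabis laid out the priorities for AGI development with total clarity: "Build it as a tool first, then think about consciousness and mind." He says to first use AGI to read the language of the universe, and only after we truly understand it decide whether to add agency or consciousness. This is no small matter; it straightens out the whole roadmap. Many people jump straight to "making AI conscious" or "giving AI subjective experience", but Demis's view is the reverse: get the tool right first, work out the universe's underlying laws, and only then take on the more dangerous, more philosophical questions. That is the truly pragmatic and safest path: avoid stepping into the minefield of agency and consciousness too early, and max out productivity first. He says it very calmly in the video, but it carries a lot of information. For AGI's next step, who moves first?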

译Demis Hassabis明确AGI发展应分阶段进行，优先将其作为工具用于理解宇宙底层规律，而非过早赋予意识或代理能力。他强调这种务实路径能避免风险，先提升生产力，再处理更哲学和危险的问题。引用推文也指出AGI应先成为工具，再尝试赋予意识，先用于读懂宇宙语言。这一反向思维理顺了发展路线图，为AGI的下一步提供了稳健方向。

TestingCatalog News 🗞@testingcatalog · 5月7日71

Google DeepMind 🤝 EVE Online. Google has partnered with Fenris Creations to conduct new research within an isolated EVE Online environment. > As part of this next chapter, we are beginning a research partnership with Google DeepMind, focused on intelligence in complex, dynamic, player-driven systems.

译Google DeepMind宣布与Fenris Creations建立研究合作伙伴关系，将在独立的EVE Online游戏环境中进行新研究。该合作聚焦于复杂、动态且由玩家驱动的系统中的智能研究。Demis Hassabis强调游戏一直是AI的理想试验场，并盛赞EVE Online是一款非凡的游戏且拥有出色的社区。此次合作旨在利用这一独特环境推进人工智能在复杂系统领域的发展。

Demis Hassabis@demishassabis · 5月7日54

I've always been passionate about games and they've played a big part in @GoogleDeepMind’s history, as the perfect proving ground for AI. Thrilled to announce this research partnership with @FenrisCreations - @EveOnline is one of the most extraordinary games ever built and has an amazing community. Very excited to work with @HilmarVeigar and the team!

译我一直对游戏充满热情，游戏在@GoogleDeepMind 的发展史上扮演着重要角色，是AI技术的绝佳试验场。很高兴宣布与@FenrisCreations 建立研究合作伙伴关系——@EveOnline 是有史以来最非凡的游戏之一，拥有令人惊叹的社区。非常期待与@HilmarVeigar 及其团队合作！

Rohan Paul@rohanpaul_ai · 5月5日52

This Google DeepMind paper trains LLMs to learn during conversation, and it shows they get much better at using feedback. The problem is that most LLMs treat a chat like a series of separate turns, so even when a user corrects them, they often do not really use that new information and they also fail to ask for missing details. The paper fixes this by turning a normal task into a teacher student dialogue, where the student model tries an answer, a teacher with hidden extra information gives guidance, and the student is trained to use that guidance to reach the right answer. The authors test 2 training styles, offline filtering and online reinforcement learning, and they report that the online version works better, with training on short 4 turn chats still helping on longer 10 turn chats later. They also show that this skill carries from math to coding and helps on messy underspecified tasks where the full problem arrives bit by bit instead of all at once. A second step called Q-priming teaches the model to ask useful questions, and on ambiguous tasks it becomes over 5x more likely to ask for clarification instead of making an early wrong guess, which matters because it makes chat feel more like working with someone who can actually learn during the conversation. ---- Paper Link – arxiv. org/abs/2602.16488 Paper Title: "Learning to Learn from Language Feedback with Social Meta-Learning"
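The "offline filtering" style mentioned above can be illustrated with a toy version: run teacher-student dialogues, then keep only the ones where the student actually reached the right answer, so they could serve as training examples. Everything below is an assumption made for illustration (the number-guessing task, `teacher`, the two scripted students, the 4-turn budget); the post does not describe the paper's actual filtering procedure, and no model is trained here.

```python
def teacher(guess, target):
    """Teacher knows the hidden target and gives language-like feedback."""
    if guess == target:
        return "correct"
    return "higher" if guess < target else "lower"


def listening_student(state, feedback):
    """Uses the teacher's last hint to narrow the range (binary search)."""
    if feedback == "higher":
        state["lo"] = state["last"] + 1
    elif feedback == "lower":
        state["hi"] = state["last"] - 1
    state["last"] = (state["lo"] + state["hi"]) // 2
    return state["last"]


def stubborn_student(state, feedback):
    """Ignores feedback entirely and repeats the same guess."""
    state["last"] = state["lo"]
    return state["last"]


def run_dialogue(student, target, max_turns=4, lo=1, hi=15):
    """Returns (transcript, solved); transcript is a list of (guess, feedback)."""
    state = {"lo": lo, "hi": hi, "last": None}
    transcript, feedback = [], None
    for _ in range(max_turns):
        guess = student(state, feedback)
        feedback = teacher(guess, target)
        transcript.append((guess, feedback))
        if feedback == "correct":
            return transcript, True
    return transcript, False


# Collect dialogues from both behaviours, then filter offline on success.
collected = [run_dialogue(student, target)
             for target in range(1, 16)
             for student in (listening_student, stubborn_student)]
kept = [transcript for transcript, solved in collected if solved]

print(f"collected {len(collected)} dialogues, kept {len(kept)}")
```

Almost all of the kept dialogues come from the student that uses feedback, which is the behaviour such a filter is meant to select for; the online reinforcement learning variant reported as stronger is not sketched here.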

译Google DeepMind的研究通过“师生对话”框架训练大型语言模型（LLM），使其能在对话中有效利用用户反馈进行学习。传统LLM将对话视为独立轮次，难以整合修正信息。该研究让“学生”模型尝试回答，由掌握额外信息的“教师”提供指导，并训练学生利用指导得出正确答案。在线强化学习训练效果优于离线过滤，且在简短对话中习得的技能能迁移至更长对话。该方法从数学任务泛化至编程任务，并能处理信息逐步到达的模糊任务。通过“Q-priming”步骤，模型在模糊任务中主动寻求澄清的可能性提高五倍以上，使对话更像与一个能在交流中实时学习的伙伴协作。

Berryxia.AI@berryxia · 5月5日47

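Folks, strongly recommended! Spend half an hour of your holiday watching this instead of scrolling short videos all day! DeepMind CEO Demis Hassabis has just put the AGI timeline squarely at 2030. This is not another cry of "wolf" but a roadmap he drew in person on stage at AI Ascent 2026. More striking, he extended the scope of impact well beyond software: drug discovery compressed from a cycle of 10-plus years to just a few days; AI possibly unlocking scientific breakthroughs humanity has never reached, from entirely new materials to unknown biological mechanisms. But he was equally clear that today's AI still has fundamental limitations. The next 1-2 years will be the real inflection point that determines the direction of all human technology. That is the key message. We tend to picture AGI as "a magic artifact that suddenly arrives one day", but Demis's real signal is that what changes the world is not the moment AGI arrives but the exponential acceleration of scientific iteration over the next 24 months. When "slow sciences" like drug development, materials science, and biological simulation are thoroughly sped up by AI, the underlying productivity of human civilization will be restructured on an unprecedented scale. That shock goes far deeper and wider than code generation or slide-making. Do you think the 2030 AGI prediction is credible? The full talk is worth rewatching 👇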

译DeepMind CEO Demis Hassabis在AI Ascent 2026上明确将AGI实现时间定于2030年，并指出AI将极大加速药物发现、材料科学等“慢科学”领域，把研发周期从数年压缩至数天。他强调，未来1-2年是关键拐点，真正改变世界的将是AI推动科学迭代速度的指数级加速，而非AGI降临的瞬间。

Berryxia.AI@berryxia · 5月4日63

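This week something interesting quietly happened in the AI agent space. The latest papers from top labs such as DeepMind, Anthropic, and Alibaba all point in the same direction: agents are no longer "chatbots" that simply call tools, but are becoming genuine productivity systems that can be engineered, audited, and scaled.
First, Agentic Harness Engineering. It turns today's biggest headache, the "agent harness", from a black box of hand tuning and trial-and-error evolution into an observable, falsifiable engineering loop. The system is split into three layers: component files that can be rolled back by version, structured empirical evidence distilled from millions of trajectory tokens, and verifiable decision predictions. Every modification becomes an auditable contract. The result? Terminal-Bench Pass@1 rises from 69.7% to 77.0%, surpassing the human-designed Codex-CLI while saving 12% of tokens. More importantly, the framework's optimizations transfer across models, evidence that it captures something structural rather than overfitting to a particular model.
Next, Alibaba's AgenticQwen-30B-A3B, an MoE model with only 30B parameters and just 3B active, yet approaching the performance of the 235B-class Qwen3 on real tool-use tasks. The secret is two parallel reinforcement learning flywheels: one mines harder reasoning problems from the model's own failures, and the other uses simulated users to keep creating misleading scenarios that evolve multi-branch behavior trees. The approach lets an open-source lab achieve high-performance tool use at a very low active parameter count for the first time, thoroughly changing the cost curve.
Then there is RecursiveMAS, which directly challenges the traditional way multi-agent systems communicate: instead of having each agent shout text messages at the others, state is passed through recursive computation in latent space. The result is token consumption down 34.6%-75.6%, inference 1.2-2.4x faster, and accuracy up 8.3% on average.
OneManCompany turns the multi-agent team from a fixed org chart into a dynamic "talent market": each agent is a hireable Talent, matched to tasks in real time in the best combination, and able to iterate automatically after failure.
Together these papers sketch a clear trend: agent systems are moving from "experimental toys" to "production-grade engineering". While we are still arguing over whose model has more parameters, what actually decides who wins in deployment may already be "who engineers agents first". Do you think agent engineering will become the main battleground of the next wave of AI gains?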

译本周，DeepMind、Anthropic、Alibaba等实验室的论文共同显示，AI智能体正从聊天机器人转向可工程化、可审计的生产力系统。Agentic Harness Engineering将智能体支架转化为可观测的工程闭环，提升性能且优化可跨模型迁移。Alibaba的AgenticQwen-30B-A3B通过并行强化学习飞轮，在低激活参数下实现接近大模型的工具使用能力，重塑成本。RecursiveMAS革新多智能体通信，大幅降低消耗并提升效率。这些进展标志智能体系统正从实验阶段走向生产级工程，其工程化可能成为AI落地关键。

Berryxia.AI@berryxia · 5月4日50

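Everyone is hyping that AI is "getting smarter and smarter", yet nobody dares face what DeepMind CEO Demis Hassabis said himself: he deliberately plays chess against Gemini in order to trace the model's chain-of-thought. As a former chess prodigy, he can tell at a glance when the model has talked itself into a dead end. Sometimes the model has clearly seen that the move it is about to play is a blunder, and it even searches for a better move, but in the end... it dutifully plays the mistake anyway. "This is what jagged intelligence looks like." Not outright stupid, not perfectly smart, but smart enough to spot the problem and yet too dumb to stop itself from making the mistake. This "half-smart" state is the truest portrait of today's frontier models. We like to imagine intelligence as a smoothly rising curve, but in reality it is jagged like a saw: razor-sharp in some places and a mess in others. When AI starts fighting itself and tripping itself up, should we keep pretending it is just "one last step" from perfect? The real breakthrough may not be making it smarter, but first figuring out how to grind all those jagged teeth into a truly sharp blade. What do you make of "jagged intelligence"? (Source: Demis Hassabis's talk at YC, @vitrupo)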

译DeepMind CEO Demis Hassabis指出，最前沿的大模型（如Gemini）表现出“锯齿状智能”。他以与Gemini下棋为例，说明模型能通过思维链发现问题并搜索更好方案，但最终仍会执行明显的错误决策。这揭示了AI智能并非平滑提升，而是在某些方面敏锐，另一些方面存在严重缺陷。Hassabis认为，真正的突破或许不在于让模型更聪明，而在于如何打磨这种不均衡的智能，使其成为可靠工具。这一观点挑战了AI将线性逼近完美智能的常见叙事。

Rohan Paul@rohanpaul_ai · 5月2日56

Demis Hassabis answers the question "Why not make an alternate AI that works in synchrony with humans instead of trying to replace human intelligence?" He says AGI isn't a project about replacing humans; it's a science question about what counts as truly general computation, plus an economic reality. The brain is our only known roughly Turing-like machine, so "general intelligence" means that level of flexibility. Companies chase it because general tools transfer cheaply everywhere; "generality" wins because it scales better. --- From 'Varun Mayya' YT channel (link in comment)

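Summary: Responding to "why not develop AI that works alongside humans rather than replacing them", Demis Hassabis says that pursuing AGI is not about replacing humans. At its core it is a scientific question, exploring what counts as truly general computation, and also an economic reality. The brain is the only known system that approximates a Turing machine, so "general intelligence" means reaching a similar level of flexibility. Companies chase AGI because general tools transfer cheaply to every domain, and "generality" wins because it scales better.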

Demis Hassabis@demishassabis · 5月2日65

Thanks @Konstantine and @sequoia for such a fun and wide-ranging chat! Loved the final question - von Neumann FTW 😀

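Summary: In the fireside chat at Sequoia Capital's AI Ascent, Demis Hassabis showed his outstanding capacity for synthesis. His favorite books are about theories of everything, the philosophers he admires seem opposed to each other, and his career spans from board games to Nobel Prize-level science. The core of the conversation covered games as a training ground for AI, lessons from entrepreneurship and founding DeepMind, his vision for artificial general intelligence (AGI), and concrete breakthroughs in AI-driven scientific discovery (such as progress in biology and the work of Isomorphic Labs). Finally, he discussed the new fields of science that AI might open up and the deep philosophical questions they raise.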

宝玉@dotey · 5月1日63

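http://x.com/i/article/2050005869304102912

# Demis Hassabis: What AGI Still Lacks, Whether Agents Really Work, and What the Next Scientific Breakthrough Looks Like

Demis Hassabis is the CEO of Google DeepMind and also the CEO of Isomorphic Labs. Beyond being a former chess prodigy and game developer, he holds a PhD in cognitive neuroscience, where he studied how the hippocampus and memory work. In 2024 he won the Nobel Prize in Chemistry for his work on AlphaFold. This time he appeared on Y Combinator's How to Build the Future livestream and talked with YC CEO Garry Tan for forty minutes. The core topics: what the current AI paradigm still lacks on the way to AGI, how good agents really are, the pattern behind AI breakthroughs in science, and advice for deep-tech founders.

Original video: https://www.youtube.com/watch?v=JNyuX1zoOgU

Original title: Demis Hassabis: Agents, AGI & The Next Big Scientific Breakthrough

## Key takeaways

- Hassabis believes the current paradigm (pretraining + RLHF + chain-of-thought) will be part of the AGI architecture, but sees a 50% chance that one or two not-yet-discovered breakthroughs are still needed; continual learning, long-horizon reasoning, and memory are the three open problems
- A million-token context window sounds large, but for real-time video it covers only 20 minutes; the current practice of stuffing everything into the context window is "a stopgap held together with duct tape"
- Techniques from the AlphaGo and AlphaZero era (Monte Carlo tree search and others) are being reintroduced into today's foundation models, and Hassabis believes much of the progress in the next few years will come from applying these old ideas at scale
- He uses chess to test Gemini's reasoning and found that the model will identify a move as bad, fail to find a better option, and then go back and play the bad move; this "lack of introspection" is a core flaw of current reasoning systems
- The real test of creativity is whether a system can invent the game of Go itself from a high-level description; the Move 37 level of creativity that AlphaGo showed is far from enough
- A complete virtual cell is roughly 10 years away; the key bottleneck is that living cells cannot be imaged at nanometer resolution without killing them
- His advice to founders: if your AGI timeline is 2030 and a deep-tech startup typically takes 10 years, AGI will arrive midway through your journey, and your business plan has to account for that

## 【1】AGI is still missing one or two puzzle pieces, at 50/50 odds

Garry Tan opened by asking: of the current AI paradigm (large-scale pretraining, RLHF, chain-of-thought), how much of the final AGI architecture is already in there? What is still fundamentally missing?

Hassabis's answer was fairly cautious. He said the current components will "almost certainly" be part of the final AGI architecture; getting this far has already proven too much for it to suddenly turn out to be a dead end. But on top of what exists, one or two big ideas may still be needed. He listed three open problems: continual learning (the ability of a model to keep learning from new experience after deployment), long-horizon reasoning, and certain aspects of memory. These may be solvable through incremental innovation on existing techniques, or they may need entirely new approaches.

He offered an interesting probability estimate: 50/50. There is an even chance that existing techniques are enough, and an even chance that one or two key breakthroughs are still missing. Google DeepMind is betting on both.

## 【2】Memory: a million-token context is not actually enough

The conversation moved naturally to memory and context windows. Garry Tan noted that current systems are stateless on every call, and that in the absence of continual learning everyone relies on stopgaps such as "dream cycles" (periodic batch updates).

Hassabis has a distinctive claim to authority on this topic. His PhD research was precisely about how the hippocampus elegantly integrates new knowledge into an existing knowledge base. During sleep (especially REM sleep), the brain replays important episodes of experience to consolidate learning. DQN, DeepMind's earliest Atari game-playing AI, borrowed this mechanism, using "experience replay" to repeatedly replay successful game trajectories and speed up learning.

> "We're kind of using duct tape right now - shove it all in the context window."

He went on to explain why this approach is not good enough. A million-token context window sounds large: human working memory averages only about 7 digits, while AI has contexts of a million or even ten million tokens. The problem is that we throw everything in, whether or not it is important or correct. More critically, if you want to process a real-time video stream and naively record every token, a million tokens covers only 20 minutes. If you want a system to understand a month or two of your life, that is nowhere near enough. Even with unlimited storage, the cost of retrieving the one piece of information the current decision really needs cannot be ignored. Hassabis believes there is still a lot of room for innovation in memory.

## 【3】AlphaGo's technical legacy is coming back to life

Garry Tan followed up on DeepMind's history in reinforcement learning, asking how much the philosophy behind AlphaGo, AlphaZero, and MuZero matters in building Gemini today.

Hassabis said the importance of reinforcement learning "comes around in cycles of ups and downs". DeepMind has been building agents since day one; the Atari game AI and AlphaGo were ultimately agent systems that could autonomously set goals, make decisions, and form plans. Games were chosen at the time to keep the problem tractable, and the team then moved step by step to more complex games, such as StarCraft (AlphaStar) after AlphaGo. The core question of the past few years has been whether these models can be generalized from games to language and world models. The thinking modes and chain-of-thought reasoning in all of today's frontier models can in fact be traced back to the path opened up in the AlphaGo era.

He revealed something worth noting: Google DeepMind is revisiting some of the old ideas from that time, including methods such as Monte Carlo tree search, and reapplying them at the scale of today's foundation models. He believes much of AI's progress in the next few years will come from combining ideas from the AlphaGo and AlphaZero era with modern foundation models.

## 【4】Small models are getting smarter fast

Garry Tan observed that distillation is bringing small models ever closer to frontier capability: Flash models reach roughly 95% of frontier-model level at one tenth of the cost. He asked whether distillation has a limit.

Hassabis said this is one of Google DeepMind's core strengths. They naturally have to build the largest models to push the capability frontier, but rapidly compressing those capabilities into smaller models is what they are good at. Google has more than a dozen products at billion-user scale, including AI Overviews and AI Mode in Search, the Gemini app, YouTube, and Maps, and every one of them needs AI serving. Billions of users need extremely fast, extremely efficient, low-latency service, and this commercial pressure in turn becomes an engine of technical progress.

On the theoretical limit of distillation, he said they have not yet seen any hard ceiling on information density. Their working assumption is that six months to a year after a frontier model is released, equivalent capability will appear in edge-class small models.

He also described an architectural idea: in the future, efficient local models may handle everyday tasks (such as audio and video streams) and call a frontier model in the cloud only in specific situations. This layered "local + cloud" architecture is especially meaningful for privacy and security, particularly in scenarios such as household robots.

## 【5】The reasoning flaw that Gemini's chess exposes

Garry Tan then asked about reasoning: models can produce impressive chain-of-thought reasoning, yet stumble on mistakes a bright undergraduate would not make.

Hassabis thinks the current thinking paradigm is still crude, with plenty of room for innovation. For example, one could monitor the progress of the chain-of-thought and intervene to correct it partway through reasoning. He often feels these systems are "overthinking" and getting stuck in some kind of loop.

He gave a concrete example. He sometimes plays chess against Gemini. All frontier foundation models perform poorly at games, but that is exactly what provides an interesting window for observation. Because the rules of chess are fixed, he can quickly judge whether the model's chain-of-thought is going astray. What he observes is this: the model considers a move, realizes it is a bad move, cannot find a better one, circles around, comes back to that same move, and then plays it.

> "You just shouldn't be seeing that happening in a very precise reasoning system."

This is what he calls "jagged intelligence": on one hand a system can solve problems at International Mathematical Olympiad (IMO) gold-medal level, and on the other hand it makes basic arithmetic errors when the question is phrased differently. In his view, this inconsistency shows that the system lacks some capacity for "introspection" on its own thought process. But he added that fixing the flaw may take only one or two key adjustments.

## 【6】Agents: still experimental, and output does not yet match input

Garry Tan asked whether agents are hype or just getting started. Hassabis's answer: just getting started, but still in the experimental stage.

His argument is that to reach AGI you must have a system that can actively solve problems, so agents are the path to AGI. But for now agents are not good enough at "complete tasks", mainly because they cannot keep learning and adapting in the specific environment where they are used. The lack of continual learning is the fundamental reason agents cannot be "fire and forget".

He also made an intriguing observation:

> "I see a lot of people working on setting off dozens of agents for like 40 hours, but I'm not sure I've seen the output that yet quite justify that level of input going in."

Only in the last two or three months have people begun to find use cases where agents are really valuable: no longer "toy demos" but tools that actually add efficiency.

## 【7】Theme Park in half an hour, but where are the hits?

On creativity and vibe coding, Hassabis offered a striking comparison.

> "I can do a prototype of Theme Park in half an hour now, which took me 6 months back when I was 17."

【Note: Theme Park is a business simulation game that Hassabis helped develop in 1994; it sold more than 15 million copies worldwide.】

But he immediately followed with a more interesting observation: if the tools are already this powerful, why has no vibe-coded hit game sold 10 million copies yet? He thinks what is missing may have to do with "craft and soul", a kind of human taste and persistence. The tools have lowered the barrier to execution, but creativity itself has not been replaced. He expects that within 6 to 12 months someone will use these tools to make something with real impact, and that what appears first will not be fully autonomous AI creation but someone in this room achieving 1000x productivity with AI tools.

He then pushed the topic to a deeper level. Move 37 in game two of the AlphaGo match was a creative move that shocked human players. After seeing it, Hassabis was convinced the science projects could begin, and he started the AlphaFold project the day after returning from Seoul. But he said Move 37-level creativity is not enough.

> "It's not enough to come up with Move 37. Can it invent Go?"

He imagines giving a system a high-level description, "a game whose rules can be learned in 5 minutes but takes a lifetime to master, aesthetically elegant, and playable in an afternoon", and then seeing whether the system can return something like Go. Today's systems cannot do this.

## 【8】The strategic calculation behind open-sourcing Gemma

The conversation switched to open source. Hassabis said Google DeepMind has always been an advocate of open science, and making AlphaFold completely free and open is an example. The goal of the Gemma series is to be world-leading at a given parameter scale.

He mentioned an interesting geopolitical consideration:

> "It's important for there to be Western stacks on open source. A lot of the Chinese models are excellent, and they're currently leading in open source."

There is also a pragmatic reason to open up edge models. Google needs to run models on Android, glasses, robots, and other devices, and once a model is deployed on-device the weights are exposed anyway. That being so, it might as well be fully open. They have decided to adopt an open-source strategy across the board at the "Nano level".

## 【9】The long-term bet on multimodality

Before the interview, Garry Tan showed Hassabis a voice assistant he had built himself with Gemini (similar to Samantha in the film Her), and he rated Gemini's depth in connecting voice directly to the model, along with its tool-calling ability, as the best of any current model.

Hassabis said this is an advantage of Gemini that "is not yet fully appreciated". Gemini was trained multimodally from the start, which was much harder in the early days than focusing only on text, but the long-term payoff is now showing. For example, Genie (Google DeepMind's world-model generator) is built on Gemini's multimodal capabilities and is key for robotics. Waymo is already using Gemini-related technology. Future digital assistants, whether on phones, glasses, or other devices, will need to understand the surrounding physical world and intuitive physics. That is exactly where the Gemini model family is strong.

## 【10】Inference will never be free

Garry Tan asked: what happens when the cost of inference approaches zero?

Hassabis's answer: inference may never truly be free. He cited Jevons' paradox: when a resource is used more efficiently, demand increases instead, eventually consuming all of the efficiency gains.

【Note: Jevons' paradox was first proposed by the economist William Stanley Jevons in 1865, originally about coal. After steam engines became more efficient, coal consumption rose rather than fell.】

He imagined several ways to "eat up" all available inference compute: clusters of millions of agents working together, or a single agent thinking in parallel along multiple directions and then synthesizing the results. Even if breakthroughs such as controlled nuclear fusion, or materials science advances such as superconductors, brought energy costs close to zero, the physical manufacturing of chips would still be a bottleneck. For at least the next few decades, inference will still be rationed.

## 【11】The virtual cell: a goal 10 years out

Garry Tan asked: AlphaFold 3 has already gone beyond proteins to a broader range of biomolecules. How far are we from simulating a complete cellular system?

Hassabis first spoke about progress at Isomorphic Labs. The company, spun out of DeepMind, is also building out the adjacent areas of biochemistry and chemistry beyond AlphaFold, designing compounds with the right properties. He said "there will be major announcements soon".

He thinks a complete virtual cell will take about 10 years. DeepMind's science team is currently starting with a virtual cell nucleus, because the nucleus is relatively self-contained. The key to this kind of problem is whether you can cut a sufficiently self-contained piece out of the complexity, approximate its inputs and outputs, and then focus on that subsystem.

The biggest challenge is insufficient data. If living cells could be imaged at nanometer resolution without killing them, the problem would become a vision problem, and "we know how to solve vision problems". But at present he does not know of any imaging technology that achieves both nanometer resolution and no harm to living cells. The resolution of static images is already very high, but dynamic information is missing. So there are two routes: one is hardware-driven and data-driven, waiting for a breakthrough in imaging technology; the other is the modeling route, building better learned simulators of dynamical systems.

## 【12】AI is the ultimate tool for science

Garry Tan asked which scientific field he is most optimistic about. Hassabis did not give a direct ranking; instead he said this has always been his core motivation for working on AI.

> "Step one was solve intelligence, i.e., build AGI, and then step two was use it to solve everything else."

He said the "solve everything else" wording later had to be changed, because people would ask, "Do you really mean 'everything'?" He does.

He brought up a concept, "root node problems": scientific problems that, once solved, open up entirely new branches of research. AlphaFold is the classic example. More than 3 million researchers worldwide now use AlphaFold, and he has heard from executive friends in the pharmaceutical industry that "from now on, almost every new drug discovery process will use AlphaFold".

He feels that other fields (materials science, climate modeling, mathematics) are currently at roughly "the AlphaFold 1 stage": results are promising, but the field's grand challenges have not really been solved yet. There will be a lot of progress in the next few years.

## 【13】Three conditions for an AlphaFold-style breakthrough

Garry Tan asked: what kind of scientific problem suits an AlphaFold-style breakthrough? Is there a pattern?

Hassabis said he should write this down. From the experience of AlphaGo and AlphaFold, he distilled three conditions:

1. First, a huge combinatorial search space, the bigger the better, so large that neither brute-force search nor special-purpose algorithms can solve it. Both the possible lines of play in Go and the possible conformations of a protein far exceed the number of atoms in the universe.
2. Second, a clear objective function. Protein folding can be seen as minimizing free energy; in Go it is simply winning. You need to be able to define "what is good" so that you can hill-climb.
3. Third, enough data, or a simulator that can generate large amounts of in-distribution synthetic data.

If these three conditions hold, existing methods can go a long way in a "needle in a haystack" search. Drug discovery fits the same framework: there is always some compound that can treat the disease with no side effects, and as long as the laws of physics allow it to exist, the remaining question is how to find it efficiently.

## 【14】The "Einstein test": can AI make real scientific discoveries?

Garry Tan pushed the conversation to a higher level of abstraction: can AI do real scientific reasoning, or is it just pattern matching?

Hassabis said he thinks it is very close. Google DeepMind has general scientific reasoning systems such as Co-Scientist, as well as algorithms such as AlphaEvolve that add capabilities on top of base Gemini. But frankly, he has not yet seen a single real "major discovery".

He thinks this is related to the creativity question discussed earlier. Real discovery goes beyond pattern matching (because there is no existing pattern to match) and beyond simple extrapolation. He calls it "analogical reasoning" and believes current systems do not yet have this ability, or at least are not using it in the right way.

He laid out the challenge in escalating steps. First, can AI solve existing hard problems in mathematics, such as the Millennium Prize Problems (seven major unsolved problems, each carrying a 1 million dollar prize)? He thinks that may take only a few years. The one he personally most wants to see solved is P vs NP. But harder than solving a Millennium Prize Problem is this: can AI propose a new set of Millennium-level problems that top mathematicians consider equally deep and worth a lifetime of study?

Then he proposed his "Einstein test".

> "Can you train a system with the knowledge of physics of 1901, and then will it come up with what Einstein did in 1905, including special relativity?"

【Note: 1905 is known as Einstein's "miracle year" (annus mirabilis). That year he published four landmark papers covering the photoelectric effect, Brownian motion, special relativity, and mass-energy equivalence (E=mc²).】

Passing this test would mean a system has the ability to invent something really new. He thinks the test should be run repeatedly to see when systems can pass it.

## 【15】Advice for founders: factor AGI into your business plan

The last topic was advice for founders. Hassabis first returned to a question Garry Tan had asked earlier: "If you were sitting where a YC founder sits, what would you do?"

His core advice is to find the intersection between AI and another deep-tech field: materials science, medicine, or any hard-science problem involving the atoms of the physical world. Interdisciplinary teams of this kind, especially those dealing with the material world, will not be easily displaced by the next foundation-model update in the foreseeable future, which makes this the most defensible direction for a startup.

He then raised a more specific question of timing. If your AGI timeline is 2030 and a real deep-tech startup typically takes 10 years, AGI will arrive midway through your journey. That is not necessarily a bad thing, but you have to take it into account. Can your system make use of AGI? What happens to your product after AGI arrives?

He offered a valuable architectural judgment: the future will not be one giant general model containing every capability. The more likely architecture is a general model (Gemini, Claude, and so on) calling specialized systems such as AlphaFold as tools. If you stuffed protein-folding knowledge directly into Gemini, "that would definitely affect its language ability". This "general orchestrator + specialized tools" architecture means that building a good specialized system for one vertical will still be hugely valuable in the AGI era.

> "Going after hard problems is no more difficult than going after a shallower, simpler problem. They're just differently difficult."

He closed with his own story. When he founded DeepMind in 2010, investors told him, "We tried AI, it doesn't work." Academia also regarded AI as a fringe discipline that had been disproven back in the 90s. But he had decided at a very young age to work on AI, because it was both the most important thing he could think of and the most interesting. Even if AI had still not succeeded today, he would be carrying on in some garage.

Hassabis is doing two things at once: building frontier models (Gemini) and doing science with AI (AlphaFold, Isomorphic Labs). That makes his judgment more useful as a reference than that of pure-model or pure-application people. His assessment of the path to AGI, "one or two big ideas may still be missing", is more restrained than most voices in the industry. His skepticism about the return on agent investment is also notable, especially given that Google itself is heavily promoting agent products.

A few concrete milestones to watch next: first, whether agents can learn and adapt stably over long-running tasks rather than leaning on ever longer contexts; second, whether AI for Science produces a new AlphaFold-style "root node problem" breakthrough; third, whether AI starts posing high-quality new problems rather than just solving old ones faster. For deep-tech founders, Hassabis's point about AGI arriving midway is not a timeline forecast but an architecture question: will the system you build today be replaced by then, or will it become a tool that AGI actively calls?

## Q&A at a glance

Q: How far is the current AI paradigm from AGI?
A: The existing components (pretraining + RLHF + chain-of-thought) will be part of the final architecture, but there is a 50% chance that one or two key breakthroughs are still needed. Continual learning, long-horizon reasoning, and memory are the three main open problems. Hassabis's personal AGI timeline is around 2030.

Q: Will small models keep getting smarter?
A: Yes. Google's working assumption is that frontier-model capability can be brought down to edge-class small models within six months to a year. Distillation has not yet hit a theoretical limit on information density.

Q: Can AI make real scientific discoveries?
A: Not yet. Hassabis believes current systems lack "analogical reasoning". He proposed the "Einstein test" as the benchmark: train a system on the physics knowledge of 1901 and see whether it can produce a discovery at the level of special relativity.

Q: How should deep-tech founders plan?
A: Find the intersection of AI and another hard-science field. Build into your business plan the possibility that AGI arrives midway through your journey. Specialized AI systems (such as AlphaFold) will still be valuable in the AGI era, because general models will call them as tools.

Q: Why is there still no vibe-coded hit?
A: The tools have lowered the barrier to execution, but creativity itself, meaning craft and soul, has not been replaced. Hassabis expects an influential work made with AI tools to appear within 6 to 12 months.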
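
The context-window figure in section 【2】 of the article above can be checked with quick arithmetic. The sketch below uses only the two numbers quoted there (one million tokens, 20 minutes of video) and assumes, purely for illustration, a constant token rate and round-the-clock recording; the names `CONTEXT_TOKENS`, `MINUTES_COVERED`, and `tokens_needed` are hypothetical and do not come from the talk.

```python
# Back-of-the-envelope check of "1M tokens covers about 20 minutes of video".
# Assumptions (not from the source): constant token rate, 24-hour recording.

CONTEXT_TOKENS = 1_000_000   # context window size quoted in the article
MINUTES_COVERED = 20         # minutes of real-time video it is said to cover

# Implied ingestion rate if every token of the stream is naively recorded.
tokens_per_second = CONTEXT_TOKENS / (MINUTES_COVERED * 60)


def tokens_needed(days: float, rate: float = tokens_per_second) -> float:
    """Tokens required to record `days` of continuous video at `rate` tokens/s."""
    return rate * days * 24 * 60 * 60


if __name__ == "__main__":
    print(f"implied rate: {tokens_per_second:.0f} tokens/s")
    for days in (30, 60):
        needed = tokens_needed(days)
        windows = needed / CONTEXT_TOKENS
        print(f"{days} days: {needed:.2e} tokens, about {windows:.0f} context windows")
```

Under these assumptions the implied rate is roughly 833 tokens per second, so one month of continuous video would need about 2.16 billion tokens, more than two thousand times the million-token window. That is consistent with the article's point that "a month or two of your life" is far out of reach for a naive record-everything approach.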

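Summary: Demis Hassabis believes the current AI paradigm (pretraining + RLHF + chain-of-thought) will likely be part of the AGI architecture, but sees a 50% chance that one or two key breakthroughs are still needed, with continual learning, long-horizon reasoning, and memory unsolved. He notes that a million-token context window covers only 20 minutes of real-time video and that current methods amount to "duct tape". Techniques from the AlphaGo era are being reintroduced into foundation models to drive progress. Agents are still at the experimental stage, and output does not yet match input. Scientific breakthroughs such as a complete virtual cell are about 10 years away; the key bottleneck is live-cell imaging technology.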

Rohan Paul@rohanpaul_ai · 5月1日48

Today’s edition of my newsletter just went out. 🔗 https://www.rohan-paul.com/p/frontier-ai-can-now-autonomously 🗞️ Frontier AI can now autonomously chain complex, expert-level cyber attacks end-to-end, 🗞️ Google DeepMind’s real-time video AI doctor is here. 🗞️ Anthropic launches ‘Claude Security’ public beta to detect and patch software vulnerabilities 🗞️ The White House has blocked Anthropic’s push to expand access to Mythos


Rohan Paul@rohanpaul_ai · 5月1日61

Google DeepMind’s real-time video AI doctor is here. They just introduced AI co-clinician, a triadic care system built to work under a doctor’s supervision during patient care. The system is built to retrieve clinical-grade evidence, verify it, and in patient-facing simulations use a dual-agent setup where one module talks while another watches for boundary violations. It also beat other frontier models on open-ended drug questions, because real medicine arrives as messy patient cases, not multiple-choice exams. DeepMind evaluated it against the failure modes clinicians actually care about: saying the wrong thing, or failing to surface the crucial thing. In 98 realistic primary care evidence queries, physicians preferred the co-clinician to leading evidence-synthesis tools, and the system logged zero critical errors in 97 cases under their NOHARM-style evaluation.

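Summary: Google DeepMind recently released AI co-clinician, a multimodal agent system designed to assist clinicians and run under a doctor's supervision. It uses a dual-agent architecture: one module talks with the patient while another monitors the boundaries of the interaction in real time, and the system can retrieve and verify clinical-grade evidence. On open-ended drug questions it outperformed frontier models, better matching the complexity of real medical settings. The evaluation focused on what clinicians actually care about, such as avoiding wrong statements or missing key information. In 98 simulated primary care queries, physicians preferred it to mainstream evidence-synthesis tools; it produced no critical errors in 97 cases under a NOHARM-style evaluation.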

Google Gemini@GeminiApp · 5月1日31

See how @anyma_eva partnered with Gemini and @googledeepmind to dissolve the distance between imagining and creating. 🧵


Rohan Paul@rohanpaul_ai · 5月1日56

Time published a piece. Google’s AI position came from a long series of early bets by Sundar Pichai on DeepMind, TPUs, cloud infrastructure, and AI products, not from a last-minute reaction to ChatGPT. Google’s biggest strength in AI is its full-stack control of research, chips, cloud, products, and distribution across billions of users. "Critics once underestimated CEO Sundar Pichai. Now, critics wonder if he’s made Google too powerful" Google just secured absolute architectural control over the AI landscape by merging its custom physical silicon manufacturing directly with a single unified research laboratory. Competitors pay steep financial premiums for external chips while Google seamlessly executes complex neural calculations on its proprietary Tensor Processing Units. Building internal hardware allows engineers to aggressively scale pretraining, the critical phase where models ingest massive datasets, without facing crushing financial overhead. --- time .com/collection/time100-most-influential-companies/2026/saudi-aramco/

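Summary: Time magazine reports that Google's lead in AI stems from CEO Sundar Pichai's series of early, long-term investments in DeepMind, TPU chips, cloud infrastructure, and AI products, not from a hasty reaction to ChatGPT. Its core advantage is full-stack control of research, chips, cloud services, products, and distribution channels reaching billions of users. By tightly integrating custom chip manufacturing with a unified research lab, Google gained absolute control over AI architecture: it can run complex computations efficiently on its own TPUs and let engineers scale model pretraining at low cost, without bearing the high external chip procurement costs its competitors face.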