AIHOT
All updates · X · 2,404 items
Tag: 大佬观点 (expert opinions)
Chubby♨️ @kimmonismus · June 15 · 38

Everyone's still arguing about which lab wins the model race. Satya Nadella made an interesting point: the smarter AI gets, the more valuable human judgment becomes. (Machines don't decide what's worth doing, you do.) "Without human direction, you have compute running in circles."

François Chollet @fchollet · June 15 · 44

Near-term AI isn't fundamentally different from past tech waves. It's the newest form of digital leverage. It's a force multiplier, and force without direction is just noise. It still requires a human in the loop at every level in order to be useful.

elvis @omarsar0 · June 15 · 51

I spent the last 6 months building my own harness and orchestrator. I built it to allow me to experiment on the frontier of ideas. Little did I know that the orchestration, the harness, routing capabilities, dynamic artifacts/workflows, verifiers, ability to switch/route between agent backends, automations, the skills, and the MCP tools would be the absolute best defense for what happened with Fable this week. The argument folks made when I was talking about "owning the agent orchestrator" at the beginning of the year is that this is just high maintenance, too costly, and is unsustainable. It might still feel like it to many. But there is too much to lose if you decide to lock yourself in with a specific tool or model provider. Really, the way I have built my orchestrator is through mining my agent sessions and using that to recursively build and test our new ideas that range from autonomous loops to continual learning/memory systems. I can test research ideas on the fly. I just can't go back to using a vendor that only offers me a set of features. My argument now is that you really don't have a choice. You need to be able to control cost, decision making, context management, and everything in between. If you don't, then how are you going to tap into the world of recursive self-improving AI? It won't get any easier if you don't own the decision-making part of the intelligence stack.

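Summary: Elvis Saravia (DAIR.AI) spent six months building his own agent orchestrator, covering orchestration, routing, dynamic artifacts/workflows, verifiers, switching between agent backends, automations, skills, and MCP tools. These capabilities turned out to be the best defense in this week's Fable incident. He argued for "owning your agent orchestrator" at the start of the year; critics called it too costly to maintain and unsustainable, but he holds that locking in to a specific tool or model provider costs more. Having mined his agent sessions to recursively build and test new ideas (including autonomous loops and continual learning/memory systems), he can no longer go back to a vendor that offers only a fixed feature set. He stresses that you must control cost, decision-making, and context management, or you cannot tap into recursively self-improving AI.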
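
One way to picture the "switch/route between agent backends" point is a small fallback router. The sketch below is only an illustration under stated assumptions: `Route`, `run_with_fallback`, and both stub backends are hypothetical names invented here, and nothing in it reflects how the author's orchestrator is actually built.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical backend type: takes a prompt, returns text, raises on failure.
Backend = Callable[[str], str]


@dataclass
class Route:
    name: str
    backend: Backend


class AllBackendsFailed(RuntimeError):
    """Raised when every configured backend raised an exception."""


def run_with_fallback(prompt: str, routes: Sequence[Route]) -> tuple[str, str]:
    """Try each backend in order; return (route name, output) of the first success."""
    errors = []
    for route in routes:
        try:
            return route.name, route.backend(prompt)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{route.name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


# Stub backends standing in for real providers (assumptions for the demo).
def unavailable_backend(prompt: str) -> str:
    raise ConnectionError("access revoked")


def local_backend(prompt: str) -> str:
    return f"[local] {prompt}"


routes = [Route("vendor_a", unavailable_backend), Route("local", local_backend)]
used, output = run_with_fallback("summarize the build log", routes)
print(used, output)
```

Because the caller depends only on the `Backend` signature, losing access to one provider changes which route answers, not the code that asks.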

Emad @EMostaque · June 15 · 13

AI is heteroousios with Man, as Man is heteroousios with God

Nathan Lambert @natolambert · June 15 · 42

Recent events are so heavy bc that this feels like a start of a new tumultuous era rather than a one & done policy calibration. It's clearer we need an open ecosystem, but powerful models are coming that could cause strong reactions (or bans) with no champion to defend them.

Nathan Lambert @natolambert · June 15 · 42

Threading the needle in this post of anthropic has done some bad things for AI governance & the discourse but the actions of this administration are way worse so we need to get a handle on it before stronger models, open or closed, come along soon. https://www.interconnects.ai/p/welcome-to-the-agi-era-of-ai-governance

Nathan Lambert @natolambert · June 15 · 41

The only reasonable expectation if you're a fan of open weight models is that if there's a major step in chinese open-weight performance, there's a good chance the whole chinese llm sphere is banned. National security apparatus will happily give a big "fuck you" to open models.

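Summary: AI researcher Nathan Lambert warns supporters of open-weight models to be clear-eyed: if Chinese open-weight LLM performance takes a major step forward, the entire Chinese LLM sphere may well be banned outright. The national security apparatus will show open models no mercy. Citing his blog post, he adds that although Anthropic has made real missteps on AI governance, the current US administration's actions are worse, and the situation must be brought under control before stronger models, open or closed, arrive.

Nathan Lambert @natolambert · June 15 · 56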

What comes next with AI governance with stronger models. I’m particularly concerned with the open-source community who is celebrating recent events, as they’re entirely unprepared for when serious policy actions come their way (and I expect it soon). https://www.interconnects.ai/p/welcome-to-the-agi-era-of-ai-governance

elvis @omarsar0 · June 15 · 35

Highly recommended reading. Don't offload your learning. Don't offload your creative process. "You can offload a task, or even a job, but you can never offload your learning."

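AYi @AYi_AInotes · June 15 · 41

Paul Graham just published an essay on how to make a billion dollars. To make a billion, don't stare at the money, Paul Graham says; just watch two numbers. He has run a startup incubator for twenty-one years and seen thirty founders become billionaires. His conclusion is simple: exponential growth is enough, and there is no need to cheat. It comes down to two numbers: the monthly growth rate, and how long the growth lasts. Growing 15% a month sounds unremarkable, but over five years it multiplies 4,384-fold. A business making 10,000 a month ends up making 44 million a month five years later, and the founder is naturally worth a billion. This is no fairy tale, just the plainest compound-interest math. The source of high growth is never exploiting users; the key is to build a product so good that users drag their friends onto it. The best startup ideas are never deliberately hunted for either; they are things you and your friends think are cool and want to build for yourselves. Apple, Google, Facebook, and Airbnb all started that way. PG closes with a blunt truth: Claude will never be able to do this, because it has no friends and wants nothing 🤣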

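Summary: Paul Graham's essay on how to make a billion dollars draws on 21 years of running a startup incubator (and watching 30 founders become billionaires). The core is the monthly growth rate and how long it lasts: 15% a month sustained for 5 years multiplies revenue 4,384-fold, so a business making $10,000 a month makes about $44 million a month five years later, and the founder is naturally worth a billion. High growth comes from building a product good enough that users recommend it unprompted, and the best startup ideas are things you find cool and build for yourself. PG closes by joking that Claude cannot do this because it has no friends and no desires.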
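
The compounding arithmetic in the post can be checked in a few lines. This is a minimal sketch; the function name `growth_multiple` is introduced here for illustration, and the inputs (15% monthly growth, 60 months, $10,000 starting monthly revenue) are the figures quoted above.

```python
def growth_multiple(monthly_rate: float, months: int) -> float:
    """Total multiple after compounding `monthly_rate` for `months` months."""
    return (1.0 + monthly_rate) ** months


multiple = growth_multiple(0.15, 60)       # 15% a month for five years
final_monthly_revenue = 10_000 * multiple  # starting from $10,000 a month

print(f"{multiple:,.0f}x")  # prints "4,384x"
print(f"${final_monthly_revenue:,.0f} per month")
```

The result lands just under $44 million a month, which matches the rounded figure in the post.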

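Berryxia.AI @berryxia · June 15 · 50

Siri AI is not Google Gemini. Everyone is saying that iOS 27 just adds some Apple features on top of Gemini... but that is completely wrong! In fact, Siri AI was developed in-house by Apple; it is not built on Google Gemini. Apple did not directly copy Gemini's code or features. Instead, it licensed the technology from Gemini and used it as a "training model" to develop its own proprietary AI models (Apple Foundation Models, AFM). Siri AI's core model and underlying architecture were designed and implemented entirely by Apple. Siri AI is therefore Apple's own product, not a derivative of Google Gemini.

Summary: The post clarifies that Siri AI is not a thin wrapper over Google Gemini. Apple did not copy Gemini's code; it licensed Gemini and used it as a "teacher model" to train its own proprietary Apple Foundation Models (AFM). Siri AI's core model and underlying architecture were designed and implemented by Apple, making it Apple's own AI product rather than a Gemini derivative.

Ethan Mollick @emollick · June 15 · 15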

Two days later and the situation is still confusing.

Satya Nadella @satyanadella · June 14 · 65

http://x.com/i/article/2065582894790365184

# A frontier without an ecosystem is not stable

I’ve been thinking a lot about the future of the firm in an AI-driven economy. This transition is different than any previous platform shift. In the past, we used digital systems to enhance human capital. This is the first time we can create a real cognitive loop between people and digital systems. That is a mind-bender, because it changes how we even conceptualize work inside an enterprise. What is at stake is not some digital tool or system and its use, but how organizations continue to learn, build IP, differentiate, and thrive in a world where AI models can continuously absorb the expertise of humans and organizations and commoditize it.

Every company is going to have to build what I think of as human capital and token capital. Human capital comprises the knowledge, judgment, relationships, ingenuity, and pattern recognition of its people, while token capital is the firm’s AI capability it builds and owns. Importantly, human capital does not become less valuable as token capital grows. It only becomes more valuable!

I believe human agency will be the driver of token capital growth. Humans will set ambitious goals, connect dots across domains, build relationships, and recognize patterns that matter most. Without human direction, you have compute running in circles.

This means the real opportunity is not in picking the best model but instead in building a learning loop on top of models where human capital and token capital compound. You can offload a task, or even a job, but you can never offload your learning. The future of the firm is the ability to compound that learning across people and AI.

This requires a new architectural approach where every business is able to build agentic systems that improve over time, while still retaining control over their IP. A company should be able to switch out a “generalist” model without losing the “company veteran” expertise built into their learning system. This is the key “test” of your control and sovereignty in the era ahead.

Companies need to turn their workflows, domain knowledge, and accumulated judgment into AI systems that improve with each use. Private evals should capture whether a model is actually improving against outcomes that matter to the business (not just external benchmarks!). Private reinforcement learning environments should let models grow stronger on real traces from inside the organization. Its knowledge base makes institutional memory queryable and use of tokens more efficient. This loop becomes the new IP of the firm.

I think of it as a hill climbing machine. And unlike most assets, it compounds. Every improved workflow generates better training signal, which accelerates the accumulation of tacit knowledge unique to the firm. The companies that build this early will have an advantage that is hard to replicate, regardless of any new individual model capability.

The last thing any of us want is a world where every company across every sector is ceding value to a few models that eat everything they see. If all the value is accrued by only a few models, the political economy will simply not tolerate it. There is no societal permission for an AI future that hollows out entire industries.

Think about what happened in the first phase of globalization where entire industrial economies were hollowed out by outsourcing. The GDP numbers looked fine on the surface, but the displacement was real and the consequences are still being felt. Let us not bring that dynamic into the AI era, with a small number of AI systems capturing all the economic returns, while entire industries find their knowledge commoditized right out from underneath them.

In my view, our priority has to be building a frontier ecosystem, not just a frontier model, so value flows broadly across every company, every industry, and every country. One where every organization can own the learning loop that encodes its institutional knowledge, compounding its human and token capital.

This is the ethos I’ve grown up with where platforms enable more value on top than is captured inside, and where every company can continuously innovate and build value of its own. When that happens, companies will create value for themselves and for the economy around them. Employees will see their expertise amplified and their judgment become part of systems that make it replicable and scalable and the benefits accrue to the companies and communities around them. That is how companies drive value for themselves and the broader economy. And it is the stable equilibrium we should build together.

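Summary: Microsoft CEO Satya Nadella argues that the AI-driven platform shift creates, for the first time, a cognitive loop between people and digital systems. Companies need to build both human capital (knowledge, judgment, relationships) and token capital (AI capability they own), and human capital becomes more valuable, not less, as token capital grows. The real opportunity is a learning loop in which the two compound: a company should be able to swap out a generalist model without losing the expertise it has internalized, using private evals and reinforcement learning so models keep improving on real internal traces. He warns that if a few models absorb all the value, the hollowing-out seen under globalization will repeat, and calls for a frontier ecosystem in which every company, industry, and country owns its learning loop.

Chubby♨️ @kimmonismus · June 14 · 24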

Tomorrow will be an exciting day. -Will Fable-5 be released again in a modified form? -How will the market react to the US regulation? -What is the situation regarding Anthropic's valuation? I don't think I've often been as excited as I am for tomorrow. History is written and 99% of people dont even understand.

Emad @EMostaque · June 14 · 13

Not your models Not your mind

Rohan Paul @rohanpaul_ai · June 14 · 50

Social skills are becoming more important for job outcomes and pay. As AI handles more tasks, roles that rely on human interaction are seeing better returns. The economy is increasingly rewarding people with broad abilities—those who work well in teams, solve problems, communicate clearly, and think creatively. Chart from FT ft .com/content/5e2593a3-e834-4822-bbc8-7cb27086af24

Rohan Paul @rohanpaul_ai · June 14 · 51

Blackstone President and COO Jon Gray made a very good point. Any rule-based businesses, like accounting, legal, finance, will be completely disrupted by AI. 🎯 e.g. JPMorgan dropped proxy advisors for shareholder votes, replacing them with AI.

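Summary: Blackstone President Jon Gray says any rule-based business (accounting, legal, finance) will be completely disrupted by AI; JPMorgan, for example, has replaced proxy advisors with AI for shareholder votes. The post cites Vinod Khosla's warning to India: traditional IT services and BPO "will be gone", but India can still win by shifting to AI deployment.

gabriel @gabriel1 · June 14 · 44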

consumers pay 20$/month and don't care about frontier performance enterprise pay $40T/year for intelligence (knowledge work), and really care about frontier performance focusing on consumer is a mistake and anti-agi pilled

Rohan Paul @rohanpaul_ai · June 14 · 47

"Learning to program was so obviously the right thing in the recent past. Now it is not." ~ Sam Altman on skill to survive the AI era.

Rohan Paul @rohanpaul_ai · June 14 · 56

Vinod Khosla on why he does not really prefer "AI co-pilots". Because he thinks "humans get in the way of co-pilots", which slows everything down and blocks real change. He says workers like accountants and programmers do not actually want co-pilots, because they feel their jobs are at risk and then resist using the tool properly. So instead of “helping” them, he prefers building AI that fully does the job itself, like a complete software engineer. He expects that by 2030, most of these roles will be pure AI workers, not human+co-pilot. --- From 'Corgi Insurance' YT channel (link in comment)

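Summary: Vinod Khosla is not keen on the "AI co-pilot" model. He thinks humans get in the co-pilot's way, lowering efficiency and blocking real change. Workers such as accountants and programmers resist the tool because they fear for their jobs and do not use it properly. He therefore prefers building AI that does the whole job itself, such as a complete AI software engineer, and expects that by 2030 most such roles will be filled by pure AI workers rather than human plus co-pilot.

宝玉 @dotey · June 14 · 33

Back in the GPT 3.5 days, a lot of people told it in the prompt to act as GPT-4 and claimed that performance improved. Do you still believe that now?

Summary: A model's real strength comes from its underlying weights and training data, not from a copied prompt. A leaked prompt only lets an older model cosplay a lite version, with a large performance gap. Fable 5 leads on tasks such as long-horizon complex analysis, and the industry is looking for more training innovation and benchmarks.

Yuchen Jin @Yuchenj_UW · June 14 · 48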

One hypothesis: If non-citizens at Anthropic can’t work on Mythos/Fable, and LLM jailbreaks remain unsolved, US frontier labs will be forced to slow down training and model releases. Could Chinese open-source AI surpass US closed models for the first time in ~6 months?

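小互 @xiaohu · June 14 · 75

On the eve of Anthropic's IPO, Bloomberg interviewed the brother and sister behind Anthropic. In the interview (recorded before Fable 5 was banned), Dario Amodei heavily played up the power of Mythos and the threat of AI. This is, of course, his long-standing position: he calls for government regulation of AI, and, of course, for regulation of all companies... Below are some clips from the interview (translated and edited entirely by Claude Code):
• A model so powerful they dare not release it, Mythos: over a thousand vulnerabilities, able to hack banks and pry open state secrets, with even the NSA scrambling to use it
• Dario's prediction: AI could cut half of entry-level white-collar jobs within one to five years
• Claude was used by the US military in the war against Iran; he is pressed on the deaths of 150 people at a girls' school
• For the first time he spells out why he left OpenAI: not a disagreement over safety, but a collapse of trust
• He hits back at Jensen Huang's "doomsday marketing" charge: calling this cheap marketing is itself cheap marketing
• The probability of civilizational collapse is 10% to 25%, and he walks through the numbers using "will the plane crash"

Summary: Anthropic CEO Dario Amodei says the internal model Mythos has over a thousand vulnerabilities and can hack banks and steal state secrets; predicts AI will cut half of entry-level white-collar jobs within one to five years; says Claude has been used by the US military in the war against Iran and is pressed on the 150 deaths at a girls' school; explains that he left OpenAI because trust collapsed; hits back at Jensen Huang's doomsday-marketing charge; and puts the probability of civilizational collapse at 10% to 25%.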

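宝玉 @dotey · June 14 · 46

The model is the foundation, and the Harness layer is relatively easy to catch up on, but the Harness layer does not need many vertical-specific versions. Claude Design will soon be merged into Claude Desktop, and once a model of the next generation (or a few generations on) is capable enough, Codex will integrate Codex Design directly into the Codex App as a Plugin.

Summary: Model capability is the foundation; the Harness layer is relatively easy to fill in and does not need many vertical-specific versions. Claude Design will soon be merged into Claude Desktop. Once models are capable enough, Codex will integrate Codex Design into the Codex App as a Plugin. The discussion also raises a question about the open-source Open Design approach: could it reach similar engineering capability when running on the model behind Claude Code?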

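宝玉 @dotey · June 14 · 49

Fine-tuning typefaces, font sizes, and colors really is a designer's daily routine. But I think that once you design with an AI Agent's help, the way you make changes has to change too:
1. Put the design system to work. Why do typefaces, font sizes, and colors need manual fine-tuning? Often because there is no unified design system to set the rules. With a matching design system, button corner radii, font sizes, and spacing are strictly defined, and generation does not produce arbitrary values like 3px or 5px. Even when something occasionally drifts, you just have the Agent fix it according to the design system; manual tweaking is rarely needed.
2. The designer becomes a design manager. You no longer adjust pixels yourself; you direct the Agent with text instructions. Opus 4.8+ combined with a design system basically delivers "say it and it is done", and rarely strays from what you asked.
3. Direction and acceptance are still the human's job. Execution goes to the Agent, but a person still sets the overall direction, tells the Agent how to adjust, and checks that the result matches expectations. The Agent does the work; the human makes the judgment.

Summary: Build a unified design system for the Agent to follow; designers stop adjusting pixels and direct the Agent with text instructions; direction and acceptance stay with humans. The quoted post notes that not every precise adjustment is well suited to being described to Claude Design.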

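宝玉 @dotey · June 14 · 63

When you hand a task to an Agent, be sure to spell out how to verify it; after that you hardly need to manage the intermediate results.

Summary: 宝玉 shares a key habit for working with AI Agents: when assigning a task, state the verification criteria clearly, and after that there is no need to watch intermediate results. He quotes @huangyun_122's practice: have the Agent write a code plan first, confirm it over several rounds, consolidate it into a task list, then code and check off each item as it is completed. This workflow keeps the goal explicit while cutting unnecessary intervention along the way.

Rohan Paul @rohanpaul_ai · June 14 · 62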

Vinod Khosla’s warning for India's BPO in the age AI: The traditional IT services and BPO business “will be gone” But India can still win if it shifts to deploying AI. ---- From "SparX by Mukesh Bansal" YouTube channel, (link in comment)

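Summary: Vinod Khosla says the traditional IT services and BPO business "will be gone", but India can still win by shifting to AI deployment. The TCS chairman says the number of AI agents may eventually match the number of employees; the company has cut 12,000 jobs, has annualized AI revenue of $2.3 billion, and has a data center agreement with OpenAI. India's $315 billion IT services industry depends on low-cost labor, but AI can run in European and US clouds and follow local rules, which erases the location advantage. TCS expects hiring to fall, and the old outsourcing model may collapse in favor of software automation.

elvis @omarsar0 · June 14 · 47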

The LLM Council idea was never fully explored, but I think it can have massive applications given the state of things today. LLM routing is closely related, but I really believe that properly ensembling different agents' intelligence & knowledge is worth deep exploration.
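
As a rough illustration of what "ensembling" several agents could mean at its simplest, here is a majority-vote sketch. It is an assumption made for this example, not the LLM Council design itself: `council_answer` and the three stub members are hypothetical, and a real council would likely add cross-review or a chair model rather than a plain vote.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical member type: takes a question, returns a short answer string.
Member = Callable[[str], str]


def council_answer(question: str, members: Sequence[Member]) -> tuple[str, float]:
    """Return the most common normalized answer and the share of members behind it."""
    if not members:
        raise ValueError("council needs at least one member")
    votes = Counter(member(question).strip().lower() for member in members)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(members)


# Stub members standing in for different models (assumptions for the demo).
members = [
    lambda q: "Paris",
    lambda q: " paris ",
    lambda q: "Lyon",
]

answer, agreement = council_answer("What is the capital of France?", members)
print(answer, round(agreement, 2))
```

The agreement share is a cheap signal for routing: low agreement can trigger escalation to a stronger model or a human.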

Logan Kilpatrick @OfficialLoganK · June 14 · 29

Our long term goal for @GoogleAIStudio is to eliminate the friction to build with AI, then do the same for your own business, and ultimately unlock economic opportunity for everyone. Feels like early innings but I keep getting more excited about this.

elvis @omarsar0 · June 14 · 71

http://x.com/i/article/2065876120965111808

# Autonomous Long-Running Coding Agents

Autonomous coding is moving from better prompting to better control systems. The important shift is that engineers are learning how to wrap agents in goals, evaluators, loops, and artifacts that let them keep working after the human stops typing.

This matters because most serious engineering work spans long horizons: ambiguous requirements, hidden constraints, partial failures, changing context, and repeated verification. The new frontier is designing the system around the agent so it can plan, execute, check its work, recover from mistakes, and keep making progress without constant human steering.

This piece is based on a DAIR.AI Academy session on autonomous long-running coding agents, where I walked through Claude Code's /goal mode, the newer /loop command, verifiers, artifacts, and orchestration patterns in practice. Written in collaboration with Codex and Claude Code.

## From Prompting to Goal Design

The core idea behind features like Claude Code's /goal is simple. A coding agent remains the executor, but the human no longer interacts with it turn by turn. Instead, the human specifies the desired end state, the evidence required to prove success, the constraints that must not be violated, and, where possible, the number of turns and budget.

That goal works more like a contract than a longer prompt. A weak goal gives the model room to stop early, take shortcuts, or redefine success in a way that looks plausible in the transcript but fails in the real system. A strong goal gives the agent a target it can repeatedly measure itself against.

Engineering judgment still matters here. The best goals encode domain knowledge that the model would otherwise guess. For a research experiment, that might mean a target benchmark score, a held-out evaluation, a required loss curve, and a rule that the result must beat an initial baseline. For a UI task, it might mean a screenshot reference, concrete layout constraints, and a browser verification step. The model can execute, but the human still defines what "done" actually means.

## The Evaluator Becomes a First-Class Component

Long-running agents need a second role besides the goal. That evaluator can be another coding agent, an LLM-as-judge, a script, a test suite, a benchmark harness, or a mix of all of them. The key design choice is matching the evaluator to the task.

When success is crisp, deterministic checks are better. Type checks, unit tests, lint rules, integration tests, and benchmark scripts should be used whenever they can express the condition clearly.

When success is fuzzy, an agent evaluator becomes useful. A script can tell you whether tests pass, but it cannot easily decide whether a generated research report is coherent, whether an implementation faithfully follows a paper, or whether a UI matches a design intent. This is where the evaluator benefits from language, judgment, and sometimes vision.

The practical pattern uses deterministic checks as the floor and agent evaluation as the higher-level review. That combination reduces hallucinated success while still allowing autonomy on tasks that do not fit cleanly into a test assertion.

## Verifiers Define the Boundary of Trust

The deeper point is that autonomy only works when the system has a reliable verifier. A coding agent can generate a plan, implement a feature, and explain why it believes the work is complete, but that explanation should not be treated as evidence. Evidence comes from an external check that the agent cannot easily talk its way around.

For code, the verifier might be a test suite, type checker, benchmark, browser run, screenshot comparison, or reproducible script. For research work, it might be a held-out evaluation, a reproduced table, a loss curve, or a benchmark score that improves over the baseline. For design work, it might be a reference screenshot plus a visual review step. The verifier is what turns a long-running agent from a confident text generator into a system that can be trusted with more time.

Most shortcuts appear at this boundary. If the verifier is vague, the model will often satisfy the easiest interpretation of the task. If the verifier is too narrow, the model may overfit to it and miss the broader intent. A good autonomous workflow, therefore, needs layered verification, with cheap deterministic checks catching basic failures and higher-level review catching judgment-heavy failures.

A few of the frontier models can already achieve some level of verification, but based on my research, there is still an evident OOD problem, where if the verification task you assign to the agent falls outside the training distribution, models struggle significantly. Verifiers are still an open area of research, but I anticipate more companies will start to make huge investments in this area. The concept of fine-tuned verifiers is also in high demand in the enterprise.

## Loops Make Autonomy Durable

A goal gives the agent direction, but a loop keeps the work alive. This distinction is important because models often stop before the real task is finished. They may hit a turn limit, lose confidence, exhaust context, or decide that a partial solution is enough.

The loop is the outer control system. It wakes up, inspects progress, runs checks, compares the result against the goal, and sends the agent back in with the next instruction when the goal has not been met. In its simplest form, this is the Ralph loop pattern with a coding agent and a deterministic condition. In a more flexible form, the loop includes an evaluator agent that can reason about progress and decide what should happen next.

Long-running autonomy works as repeated effort under supervision from a control layer, not as one continuous act of intelligence. The agent can still fail, but the loop gives the system a way to notice the failure and continue instead of silently declaring victory.

## Planning Is Where Expertise Enters

One of the strongest themes from the session was that planning remains critical. You can ask a frontier model to generate a plan, but you still need to inspect it, challenge assumptions, and make the success criteria sharper before handing the task to an autonomous loop.

This leads to a useful division of labor. A stronger planning model can help define the goal, identify missing constraints, and structure the evaluation. A different execution model can then run the implementation once the plan is clear.

In practice, this means engineers should stop thinking of "the model" as a single choice. Model choice becomes an architecture decision. Some models are better planners. Some are better executors. Some are cheaper evaluators. Some are better at vision-based review. A good orchestrator lets you swap these roles instead of waiting for one vendor to provide the perfect coding agent interface.

## Visual Artifacts Become Control Surfaces

Terminal transcripts do not scale when many agents are running. Once you have several sessions working in parallel, raw text becomes a poor interface for understanding progress.

Live artifacts matter because a dashboard with loss curves, benchmark scores, task states, screenshots, cost estimates, and recent decisions gives the human a much better way to supervise autonomy. The artifact becomes the control surface for deciding when to intervene, rather than a report generated after the fact.

The most useful pattern is to separate storage from presentation. Markdown or a vault can store durable evidence, logs, notes, plans, and results. HTML artifacts can render that state into something visual and interactive. The agent can search the Markdown, while the human can monitor the artifact.

For UI and product work, visual cues are especially powerful. A screenshot reference can communicate design intent more precisely than prose, and a vision-capable evaluator can compare the implementation against that reference. This reduces the common failure mode where the agent technically implements the requested component but misses spacing, hierarchy, alignment, or product feel.

## Session Mining Turns Usage Into Memory

Another important insight is that past agent sessions are a rich source of workflow data. If an agent repeatedly fails in the same way, forgets to run the same check, uses the wrong path, or retries the same broken command, that pattern should not stay buried in logs.

Session mining turns those transcripts into operating rules. An agent can scan the last thirty days of work, find recurring failure modes, and propose updates to project instructions, vault learnings, or agent rules. This is how a team can gradually improve its harness without manually remembering every mistake.

The goal is to make the local environment smarter without training a model from scratch. A small rule in an agent instruction file can prevent repeated failures across future sessions, especially when the rule is specific to the project.

## A Practical Operating Model

For AI engineers, the emerging workflow looks like this.

- Start with a small, cheap subset before launching the full autonomous run.
- Write a goal with measurable success criteria, explicit constraints, and a turn or time budget (where possible).
- Separate the executor from the evaluator so implementation and judgment are not collapsed into one role.
- Define external verifiers before the long-running loop starts.
- Use deterministic checks wherever possible, then add agent review for fuzzy criteria.
- Require proof artifacts such as logs, screenshots, benchmark curves, or changed files.
- Mine past sessions and promote repeated lessons into project instructions.

That is the difference between using a coding agent and engineering an autonomous coding system. One gives you a conversation. The other gives you a harness.

## What Still Breaks

None of this removes the hard problems. Agents still take shortcuts. They still stop early. They still overestimate completion. They still produce confident but weak plans, especially on recent papers, unfamiliar benchmarks, or systems outside their training distribution.

Trusting them more will not solve this. Better control systems will. Goals, loops, evaluators, deterministic checks, visual artifacts, and session memory are all ways of making autonomy observable and correctable.

The direction is clear. The future of coding agents depends on better orchestration around more capable models, where engineers design the conditions under which agents can safely run for hours or days and still produce work that can be verified.
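
The goal, verifier, and loop ideas in the article can be sketched as a small outer control loop. This is a minimal illustration under stated assumptions, not Claude Code's /goal or /loop implementation: `Goal`, `RunResult`, `run_goal_loop`, and the stub executor and verifiers are all hypothetical names invented here. Verifiers are listed cheapest first, so a later, more expensive review only runs once the earlier checks pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A verifier inspects the work product and returns None on success,
# or a failure message that is fed back to the agent on the next turn.
Verifier = Callable[[str], Optional[str]]


@dataclass
class Goal:
    description: str
    max_turns: int  # turn budget from the goal "contract"
    verifiers: List[Verifier] = field(default_factory=list)


@dataclass
class RunResult:
    succeeded: bool
    turns_used: int
    work: str
    evidence: List[str]  # one log line per turn, kept as a proof artifact


def first_failure(work: str, verifiers: List[Verifier]) -> Optional[str]:
    """Run verifiers in order and stop at the first one that fails."""
    for check in verifiers:
        message = check(work)
        if message is not None:
            return message
    return None


def run_goal_loop(goal: Goal, executor: Callable[[str, str], str]) -> RunResult:
    """Re-invoke the executor until every verifier passes or the budget runs out."""
    work, feedback, evidence = "", "", []
    for turn in range(1, goal.max_turns + 1):
        work = executor(goal.description, feedback)
        failure = first_failure(work, goal.verifiers)
        evidence.append(f"turn {turn}: {failure or 'all checks passed'}")
        if failure is None:
            return RunResult(True, turn, work, evidence)
        feedback = failure  # external evidence, not the agent's own claim
    return RunResult(False, goal.max_turns, work, evidence)


# Stub executor that "forgets" the test until told about it (demo assumption).
def fake_executor(goal_text: str, feedback: str) -> str:
    code = "def add(a, b): return a + b"
    if "missing tests" in feedback:
        code += "\ndef test_add(): assert add(1, 2) == 3"
    return code


def has_function(work: str) -> Optional[str]:
    return None if "def add" in work else "missing add()"


def has_tests(work: str) -> Optional[str]:
    return None if "def test_" in work else "missing tests"


goal = Goal("Implement add() with a test", max_turns=3,
            verifiers=[has_function, has_tests])
result = run_goal_loop(goal, fake_executor)
print(result.succeeded, result.turns_used, result.evidence)
```

In a real harness the verifiers would shell out to a test runner or type checker, and the final slot could hold an agent reviewer for fuzzy criteria; the loop itself would not need to change.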

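Summary: The core of long-running coding agents is shifting from prompting to control systems. In a DAIR.AI Academy session, Elvis Saravia walks through Claude Code's /goal mode: the human specifies the end state, the evidence of success, the constraints, and the budget, so the goal works as a "contract" rather than a long prompt. The evaluator becomes a first-class component: crisp tasks get deterministic checks (tests, lint, benchmarks), fuzzy tasks get an agent evaluator (judging reports or UI design), and combining the two reduces hallucinated success. Verifiers define the boundary of trust: external checks (test suites, type checks, browser runs, screenshot comparison) provide evidence the agent cannot talk its way around.

Nathan Lambert @natolambert · June 14 · 46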

The Dario faction and the Sacks faction speak very different languages, and a Dario clarification could sound like a refusal. This puts us very squarely in vibe governance. Models are released when the gov thinks its okay, and it is unlikely this is based on technical evals.

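Summary: The US government asked Anthropic's Dario to fix the model's jailbreak vulnerabilities or take the model down, and Dario refused. Anthropic's blog claims the jailbreaks are not serious. Nathan Lambert comments that the Dario faction and the Sacks faction take very different positions and that Dario's clarification amounts in practice to a refusal, leaving the industry in "vibe governance", where model releases are decided by political judgment rather than technical evals.

Nathan Lambert @natolambert · June 14 · 45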

Transparency into every power player at the frontier of AI (labs, government, etc) is the only viable solution. Figuring out the right transparency is hard, but it can't be he said she said between dario and the white house that determines the fate of the AI ecosystem.

宝玉 @dotey · June 14 · 51

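Why hasn't Codex shipped a product like Codex Design?

Anthropic recently launched Claude Design. It is the Agent I use most apart from coding, and I have recommended it many times. The results are really good: you describe the App you want in one sentence and it generates an interactive prototype where everything you click responds. Unless you look closely, you would think you were using a real App.

A reader asked: why hasn't Codex shipped a product like Codex Design?

In short, GPT-5.5's model capability is not yet up to the job. But to explain why, you first need to understand a key distinction.

【1】The two layers of an Agent: model and Harness

Many people lump Codex and Claude Design together with GPT-5.5 and Claude Opus 4.8, but they are two entirely different layers.

Claude Design and Codex are the "product layer", which the industry calls the Harness: prompts, toolchains, UI interaction flows, and other engineering-level pieces. Claude Opus 4.8 and GPT-5.5 are the "model layer", the brain that does the actual work.

An analogy: the Harness is the kitchen, with its pots and pans (tools) and recipes (Skills), and the model is the chef. Same kitchen, different chef, completely different dishes.

Once you understand this distinction, the rest follows.

【2】The Harness is not the barrier

Claude Design's Harness layer is not technically complex. With a bit of reverse-engineering effort you can get nearly all of the prompts and tool code. I have already done this; the result is baoyu-design (https://github.com/JimLiu/baoyu-design), which uses a Skill to run Claude Design on other models. There are no engineering secrets.

What really opens the gap is the model behind it.

【3】High-fidelity interactive prototypes: the hard part is the model

The name Claude Design is easy to misread as meaning it delivers static design files like Figma or Photoshop. In fact it goes a step beyond Figma and delivers a high-fidelity interactive prototype that merges the design mockup and the prototype: you can not only see the design but also operate it directly.

This places high demands on the model.

For example, say I want to build a client similar to X/Weibo. Plenty of models can draw a good-looking static interface. Making that interface interactive is much harder: switching between Timelines, displaying different tweet types (text, image, video), turning the heart red on a like, removing a deleted tweet from the list, and preserving state when you go from the list into a detail view and back.

To do this, the model has to think through the whole data structure and state management before it starts drawing UI: what a tweet looks like, which kinds of timeline exist, what state each button is currently in, and how states affect one another. That is system architecture work, not UI drawing.

What Claude Design demands of the model is excellent UI/UX design ability and system architecture ability at the same time; lacking either, the result suffers badly. This is also why I previously objected to producing pure-HTML mockups: that is only static UI design, with no UX interaction merged in.

If you can, test it yourself and feel the difference. For example, use this prompt:

Design a X Client for Mac, similar to Tweetbot for Mac from Tapbots

Give Codex the same prompt and it produces something too, something you can look at and interact with a little. But compare the two and the gap is obvious: the list scrolls but the sidebar cannot be clicked, and the like button does nothing. It takes several rounds of iteration to reach a barely acceptable level.

What Claude Design produces is completely different. Switching from the Timeline to the notifications page, going from the list into a detail view and back: everything is smooth and state is preserved throughout. Unless you look closely, you would really think you were using a highly polished App, even though the data is all mocked.

Claude Opus 4.8 has evidently had a great deal of training and optimization for design and architecture scenarios like this.

【4】The output is code

Look at what Claude Design produces and note the data.jsx file. It defines the data structure of the whole design very clearly, mocks a complete dataset on top of that structure, and then uses React to build the UI on that data.

The design output itself is code (React, CSS, JSON), not Figma or PSD. Any developer who receives it can read off the button corner radius, primary color, and spacing directly and implement them in their own stack. Later design changes? One git diff shows what changed. The communication loss between design and development drops to a minimum.

That was imprecise: I should say the communication loss between the design Agent and the development Agent is now very low. These days it is people directing Agents to design and people directing Agents to write code.

【5】How to use Claude Design well

Many people do not know how to use Claude Design well. It is actually a bit like Vibe Coding: start with a basic idea, have it build a first version, then direct the Agent through Chat to make changes. After a few versions your thinking becomes clear.

The whole adjustment process is uncanny, with a "say it and it is done" feeling: however you want it changed, it manages to deliver. That is why I am so hooked on Claude Design right now; the feedback comes so fast that it is addictive.

One more tip: do not give overly specific requirements. State the goal you are after and let it work freely. This often gives better results, since it has been trained on nearly all public UI design.

Back to the original question. Codex has not shipped a similar design product because GPT-5.5 cannot yet carry the load. Plenty of models can draw a good-looking interface; the hard part is thinking through the data structure, state management, and interaction logic before starting, then delivering a complete interactive prototype in one go.

At present only Claude's models have done it. How long that lead lasts depends on how fast models from OpenAI and others evolve.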
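
To make the "data structure and state management first" point concrete, here is a minimal state model for the X-style client described in the post. It is a sketch under stated assumptions: the post's actual output is React with a data.jsx file, whereas this uses Python for illustration, and `Tweet`, `AppState`, and every method name are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tweet:
    id: str
    kind: str           # "text", "image", or "video"
    body: str
    liked: bool = False


@dataclass
class AppState:
    tweets: Dict[str, Tweet]             # single source of truth, keyed by id
    timelines: Dict[str, List[str]]      # timeline name -> ordered tweet ids
    active_timeline: str = "home"
    open_tweet_id: Optional[str] = None  # None means the list view is showing

    def toggle_like(self, tweet_id: str) -> None:
        tweet = self.tweets[tweet_id]
        self.tweets[tweet_id] = replace(tweet, liked=not tweet.liked)

    def delete(self, tweet_id: str) -> None:
        """Remove a tweet everywhere so no timeline shows a stale entry."""
        self.tweets.pop(tweet_id)
        for ids in self.timelines.values():
            if tweet_id in ids:
                ids.remove(tweet_id)
        if self.open_tweet_id == tweet_id:
            self.open_tweet_id = None

    def open_detail(self, tweet_id: str) -> None:
        self.open_tweet_id = tweet_id

    def back_to_list(self) -> None:
        self.open_tweet_id = None  # list state is untouched, so it is preserved

    def visible_tweets(self) -> List[Tweet]:
        return [self.tweets[i] for i in self.timelines[self.active_timeline]]


state = AppState(
    tweets={"t1": Tweet("t1", "text", "hello"), "t2": Tweet("t2", "image", "photo")},
    timelines={"home": ["t1", "t2"], "mentions": ["t2"]},
)
state.toggle_like("t1")   # the heart turns red
state.open_detail("t1")   # list -> detail
state.back_to_list()      # detail -> list, like state still there
state.delete("t2")        # disappears from every timeline
print([t.id for t in state.visible_tweets()], state.tweets["t1"].liked)
```

Because likes live on the tweet record and timelines hold only ids, going into a detail view and back cannot lose state, which is the behavior the post says weaker prototypes get wrong.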

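Summary: Anthropic has launched Claude Design, which generates a high-fidelity interactive prototype from a one-sentence description. A reader asks why OpenAI's Codex has no comparable product; the key is the gap at the model layer. An Agent splits into the Harness (product layer) and the model layer. The Harness is not the barrier (the open-source baoyu-design already reproduces it); the real moat is that Claude Opus 4.8 has both UI/UX design and system architecture ability, defining the data structure, state management, and interaction logic before delivering a complete prototype, whereas GPT-5.5 produces poor interactions. The output is React/CSS/JSON code.

elvis @omarsar0 · June 14 · 44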

Even more data to support what I have been talking about. The combination of model intelligence (and this includes human expertise) has a compounding effect unlike anything I've seen. There are too many assumptions that a large general-purpose model will be a one-size-fits-all. I don't buy it. The reality, and the research supports this, is that these different models show different strengths and capabilities. Understanding how to tap into them in combination is a huge unlock. All engineering teams need to be thinking about this more carefully as a strategy going forward. Especially now, given the trends from frontier models in terms of selective access.

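Summary: OpenRouter released the Fusion API, billed as "the smartest compound model in the market", which it says reaches Fable-level intelligence at half the price. Quoting that announcement, Elvis Saravia argues that combining model intelligence with human expertise has a striking compounding effect, and that different models have distinct strengths rather than one general-purpose model fitting everything. Engineering teams should treat calling different models in combination as a strategic direction; given the trend toward selective access to frontier models, understanding how to use them together will be a huge unlock.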

gabriel@gabriel1 · June 14 · 19

agi is the most economically valuable asset of all time, there will be trillions in free market capital put into it this is extremely unlike the manhattan project. this time, governments can only cooperate. we can't just pick a winner, or that winner will lose


Ethan Mollick@emollick · June 14 · 48

I think the assumption that you should use smaller models for less important tasks is flawed (or at least deserves much more careful consideration). Big models are generally better at everything but cost, so it is worth considering whether gains in non-key tasks would be valuable


Chubby♨️@kimmonismus · June 14 · 45

Having access to different AI tools isn't the bottleneck anymore, it is the cognitive load of orchestrating them. LobeHub is tackling this systemic challenge with a new operational paradigm called the Chief Agent Operator (CAO). Instead of requiring users to micromanage individual tasks, the CAO serves as an autonomous management layer handling cross-tool orchestration behind the scenes.


Ethan Mollick@emollick · June 14 · 56

Has there been anything good written about the failure of Mistral to keep up with both the Big Three and Chinese labs? They have talent and national backing, but despite being Europe’s only frontier lab (Google Deepmind’s UK lab aside), they haven’t been able to close the gap


宝玉@dotey · June 14 · 26

Only kids make choices; adults take them all.

Summary: tinyfool asked, "Do you pick Claude Code or Codex now?" 宝玉 replied: only kids make choices; adults take them all.

Chubby♨️@kimmonismus · June 13 · 56

The next big beneficiary is, of course, OpenAI for two reasons. 1) IPO: OpenAI is concerned that Anthropic would preempt its IPO, resulting in a better valuation. And this was recently a likely scenario. This would have created the image of a second-rate competitor. Now, the question is, how will the ban affect Anthropic's valuation and its upcoming IPO? Who wants to invest in a company that has become persona non grata with US authorities (due to supply chain risk) and may not even be allowed to distribute its best models to enterprises, let alone globally? This will certainly put a significant downward pressure on the valuation. 2) OpenAI has the opportunity to learn from this, to proactively engage in discussions with US authorities to avoid such a disaster in advance, to determine how its model needs to be structured, to obtain the authorities' approval beforehand, and thus essentially use the time to develop a model and secure the necessary authorization to distribute it. OpenAI can learn from this situation and presumably has a better relationship with the US government than Anthropic. Therefore, it was a comparatively successful day for OpenAI. Its biggest competitor suffered a major setback.

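Summary: Amazon, one of Anthropic's biggest investors, allegedly jailbroke Claude and reported it to the US government. As a result, US authorities treat Anthropic as a supply-chain risk, it may lose permission to distribute to enterprises, and its valuation and IPO face downward pressure. OpenAI is the main beneficiary: the threat of Anthropic going public first is removed, and OpenAI has the chance to engage US authorities proactively and obtain model approval in advance, gaining an edge in the race.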

All AI updates
Full feed of AI-related news
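June 15
04:04
Chubby♨️@kimmonismus
38
Everyone's still arguing about which lab wins the model race. Satya Nadella made an interesting point: the smarter AI gets, the more valuable human judgment becomes. (Machines don't decide what's worth doing, you do.) "Without human direction, you have compute running in circles."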

Satya Nadella: http://x.com/i/article/2065582894790365184

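Expert opinion · Phenomena/Trends
03:15
François Chollet@fchollet
44
Near-term AI isn't fundamentally different from past tech waves. It's the newest form of digital leverage. It's a force multiplier, and force without direction is just noise. It still requires a human in the loop at every level in order to be useful.
Expert opinion · Phenomena/Trends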
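02:47
elvis@omarsar0
51
Elvis Saravia (DAIR.AI) spent 6 months building his own agent orchestrator and calls it the best defense against this week's Fable events

Elvis Saravia (DAIR.AI) spent 6 months building his own agent orchestrator, with orchestration, routing, dynamic artifacts/workflows, verifiers, agent backend switching, automations, skills, and MCP tools. These capabilities became the best defense during this week's Fable events. At the start of the year he was already arguing for "owning the agent orchestrator"; critics said it was costly to maintain and unsustainable, but he believes locking into a specific tool or model provider costs more. Having mined his agent sessions to recursively build and test new ideas (including autonomous loops and continual learning/memory systems), he can no longer go back to a vendor that offers only a fixed feature set. He stresses that you must control cost, decision making, and context management, or you will not be able to tap into recursive self-improving AI.

Agents · MCP/Tools · Expert opinion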
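
As a rough illustration of what "switching/routing between agent backends" could look like in an orchestrator you own, here is a minimal sketch. It is not Elvis Saravia's implementation; the `Backend` interface, the `routeWithFallback` function, and the provider names are all hypothetical, and the backends are synchronous stubs for brevity.

```typescript
// Hypothetical agent backend: a name plus a function that runs a task.
interface Backend {
  name: string;
  run: (task: string) => string; // synchronous stub; a real backend would be async
}

interface RouteResult {
  backend: string;
  output: string;
  failures: string[]; // backends that were tried and failed before this one
}

// Try backends in priority order and return the first successful result.
function routeWithFallback(task: string, backends: Backend[]): RouteResult {
  const failures: string[] = [];
  for (const backend of backends) {
    try {
      return { backend: backend.name, output: backend.run(task), failures };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${backend.name}: ${reason}`);
    }
  }
  throw new Error(`all backends failed: ${failures.join("; ")}`);
}

// Stub backends: the first simulates a provider that has become unavailable.
const unavailable: Backend = {
  name: "provider-a",
  run: () => {
    throw new Error("access revoked");
  },
};
const working: Backend = { name: "provider-b", run: (task) => `done: ${task}` };
```

Calling `routeWithFallback("summarize", [unavailable, working])` falls through to `provider-b` and records why `provider-a` was skipped, which is the kind of provider independence the post argues for.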
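02:46
Emad@EMostaque
13
AI and humans are of different natures, just as humans and gods are of different natures.
Other · Expert opinion
02:16
Nathan Lambert@natolambert
42
Recent events are so weighty that this feels more like the start of a turbulent new era than a one-off policy adjustment. We clearly need an open ecosystem, but powerful models are about to arrive and may provoke strong reactions (even bans), with no one to defend them.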

Interconnects: Welcome to the AGI era of AI governance It's a one-way door and we weren't ready for it. https://www.interconnects.ai/p/...

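Expert opinion · Safety/Alignment · Policy/Regulation
02:16
Nathan Lambert@natolambert
42
Threading the needle in this post: Anthropic has done some bad things for AI governance and the discourse, but this administration's actions are far worse, so we need to get the situation under control before stronger models (open or closed) arrive soon. https://www.interconnects.ai/p/welcome-to-the-agi-era-of-ai-governance
Anthropic · Expert opinion · Safety/Alignment
02:16
Nathan Lambert@natolambert
41
AI researcher Nathan Lambert points out that open-weight advocates need to be clear-eyed: once Chinese open-source LLMs make a major performance breakthrough, the entire Chinese LLM space will likely face a blanket ban. National security agencies will crack down on open models without mercy. Quoting his blog post, he further stresses that although Anthropic has indeed behaved badly on AI governance, the current US administration's actions are worse, and the situation must be brought under control before stronger models (open or closed) arrive.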

Nathan Lambert: Threading the needle in this post of anthropic has done some bad things for AI governance & the discourse but the action...

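Expert opinion · Open-source ecosystem
01:46
Nathan Lambert@natolambert
56
Where AI governance is headed as stronger models emerge. I am especially worried about the open-source communities celebrating recent events, because they are completely unprepared for the serious policy action that is coming (and I expect it soon).
Expert opinion · Safety/Alignment · Open-source ecosystem
01:16
elvis@omarsar0
35
Highly recommended reading. Don't outsource your learning. Don't outsource your creative process. "You can outsource a task, even a job, but you must never outsource your learning."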

Satya Nadella: http://x.com/i/article/2065582894790365184

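Microsoft · Expert opinion
00:48
AYi@AYi_AInotes
41
Paul Graham: the compounding rule for earning a billion dollars

Paul Graham published the essay "How to Earn a Billion Dollars", drawing on 21 years of startup-incubation experience (during which he watched 30 founders become billionaires). The core is the monthly growth rate and how long it lasts: growing 15% a month for 5 years multiplies revenue about 4,384-fold, so a business making $10,000 a month would make about $44 million a month after 5 years, and the founder would naturally be worth a billion. High growth comes from building a product good enough that users recommend it unprompted, and the best startup ideas come from things you build yourself and find cool. PG ends by joking that Claude can't do this because it has no friends and no desires.

Paul Graham: How to Earn a Billion Dollars: https://paulgraham.com/earn.html

Expert opinion · Phenomena/Trends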
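
The compounding arithmetic in this summary can be checked with a few lines. This is just the standard compound-growth formula, (1 + r)^n, applied to the figures quoted above; the function name `growthMultiple` is illustrative.

```typescript
// Compound growth: revenue multiplies by (1 + rate) each month.
function growthMultiple(monthlyRate: number, months: number): number {
  return Math.pow(1 + monthlyRate, months);
}

// 15% per month sustained for 5 years (60 months).
const multiple = growthMultiple(0.15, 60); // roughly 4,384x

// A business starting at $10,000 per month.
const monthlyRevenueAfter5Years = 10_000 * multiple; // roughly $43.8 million per month

console.log(Math.round(multiple), Math.round(monthlyRevenueAfter5Years));
```

The result is slightly under the "$44 million" quoted in the summary, which is the same number rounded up.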
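00:20
Berryxia.AI@berryxia
50
Siri AI is not Gemini: built in-house by Apple, not copied directly

The tweet clarifies that Siri AI is not a thin wrapper around Google Gemini. Apple did not copy Gemini code directly; it licensed Gemini and used it as a "teacher model" to train its own proprietary model, Apple Foundation Models (AFM). Siri AI's core model and underlying architecture were designed and implemented entirely by Apple, so it is Apple's own AI product rather than a Gemini derivative.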

Apple Design: Siri AI is NOT Google Gemini Everyone is saying iOS 27 just slapped an Apple Sticker on Gemini YOU COULD NOT BE MORE WRO...

Google · Expert opinion · Data/Training
00:14
Ethan Mollick@emollick
15
Two days on, the situation is still confusing.

Ethan Mollick: Well, this situation is confusing.

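Expert opinion · Phenomena/Trends
June 14
23:54
Satya Nadella@satyanadella
Same event · Featured · 65
Satya Nadella: a frontier without an ecosystem is unstable

Microsoft CEO Satya Nadella argues that the AI-driven platform shift creates, for the first time, a cognitive loop between people and digital systems. Companies need to build both human capital (knowledge, judgment, relationships) and token capital (AI capabilities they own); human capital does not depreciate but appreciates as token capital grows. The real opportunity is a learning loop in which human capital and token capital compound: a company should be able to swap out the general-purpose model without losing the expert knowledge it has internalized, and use private evals and reinforcement learning so models keep improving from real internal trajectories. He warns that if all value is swallowed by a handful of models, the hollowing-out tragedy of globalization will repeat, and he calls for a frontier ecosystem in which every company, industry, and country has its own learning loop.

Agents · Microsoft · Expert opinion · Data/Training
Same event, featured item: "Satya Nadella on the Microsoft Build keynote"
Why it's recommended: Nadella raises a real question: when models can absorb all knowledge, what is a company's moat? The two-wheel loop of human capital and token capital is more actionable than hollow talk of "AI transformation".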
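23:34
Chubby♨️@kimmonismus
24
Tomorrow will be an exciting day. - Will Fable-5 be re-released in modified form? - How will markets react to US regulation? - What happens to Anthropic's valuation? I have rarely been as excited as I am for tomorrow. History is being written, and 99% of people don't understand it at all.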

Chubby♨️: Calling it now: if this turns out to be true, he won't remain Anthropic CEO for much longer. However, Anthropic denies i...

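Anthropic · Expert opinion
20:46
Emad@EMostaque
13
Not your model, not your mind.
Expert opinion
19:41
Rohan Paul@rohanpaul_ai
50
Social skills are becoming increasingly important for employment outcomes and pay. As AI handles more tasks, roles that depend on human interaction are earning higher returns. The economy increasingly rewards people with broad abilities: those who work well in teams, solve problems, communicate clearly, and think creatively. Chart from the Financial Times: ft.com/content/5e2593a3-e834-4822-bbc8-7cb27086af24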
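Expert opinion · Industry news
18:41
Rohan Paul@rohanpaul_ai
51
Blackstone president Jon Gray says any rules-based business (such as accounting, law, or finance) will be thoroughly disrupted by AI; for example, JPMorgan has already replaced proxy advisors with AI for shareholder voting. The post quotes Vinod Khosla's warning to India: the traditional IT services and BPO business "will be gone", but India can still win if it pivots to deploying AI.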

Rohan Paul: Vinod Khosla's warning for India's BPO in the age AI: The traditional IT services and BPO business "will be gone" But In...

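Expert opinion · Phenomena/Trends
18:21
gabriel@gabriel1
44
Consumers pay $20 a month and don't care about frontier performance. Enterprises pay $40 trillion a year for intelligence (knowledge work) and care a great deal about frontier performance. Focusing on consumers is a mistake, and it is anti-AGI.
OpenAI · Expert opinion
16:41
Rohan Paul@rohanpaul_ai
47
"Learning to code was obviously the right thing to do not long ago. Now it isn't." ~ Sam Altman on the skills needed to survive in the AI era
OpenAI · Expert opinion · Coding
16:41
Rohan Paul@rohanpaul_ai
56
Vinod Khosla: AI should not be a copilot; it should fully replace humans

Vinod Khosla is not bullish on the "AI copilot" model. He argues that humans get in the copilot's way, lowering efficiency and blocking real change. Employees such as accountants and programmers resist the tools for fear of losing their jobs and will not use them properly. He therefore prefers building AI that can do an entire job on its own, such as an AI that fully replaces a software engineer. He expects that by 2030 most such jobs will be done by pure AI workers rather than "human + copilot".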

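Agents · Expert opinion · Phenomena/Trends
15:27
宝玉@dotey
33
A model's real strength comes from its underlying weights and training data, not from copied prompts. A leaked prompt only lets an older model cosplay a lite version, with a large performance gap. Fable 5 leads on tasks such as long-horizon complex analysis, and the industry is hoping for more training innovations and benchmarks.

Phoenix Yin: This is prompt engineering 101. Fable 5's real strength comes from Mythos-class underlying weights, massive new training data, plus a complex agent architecture; it can't be inherited by copying a prompt. A leaked prompt with an old model can at most cosplay the flavor...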

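Expert opinion
12:11
Yuchen Jin@Yuchenj_UW
48
A hypothesis: if non-citizens at Anthropic cannot work on the Mythos/Fable project, and the LLM jailbreak problem remains unsolved, US frontier labs will be forced to slow down training and model releases. Will Chinese open-source AI surpass US closed models for the first time within about 6 months?
Anthropic · Expert opinion · Safety/Alignment · Inference
11:01
小互@xiaohu
Featured · 75
Anthropic on the eve of its IPO

Anthropic CEO Dario Amodei said the internal model Mythos has over a thousand vulnerabilities and could hack banks and steal state secrets; predicted that AI will eliminate half of entry-level white-collar jobs within one to five years; said Claude has been used by the US military in the war against Iran, and was pressed on the deaths of 150 people at a girls' school; explained that he left OpenAI because trust collapsed; pushed back on Jensen Huang's accusation of doomsday marketing; and put the probability of civilizational collapse at 10%-25%.

Anthropic · Expert opinion · Safety/Alignment

Why it's recommended: ahead of the IPO, Dario dropped claims that Mythos can hack banks and that the NSA is scrambling to get it, and for the first time explained that he left OpenAI because trust broke down. Every topic hits an industry nerve. The timing of playing up the threat is a little convenient, but there is enough information here for every practitioner to read it carefully.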
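10:57
宝玉@dotey
46
Claude Design to be merged into Desktop; Codex to integrate a Plugin in future

Model capability is fundamental; the Harness layer is relatively easy to catch up on and does not need many vertical domains. Claude Design will soon be merged into Claude Desktop. Once model capability is sufficient, Codex will integrate Codex Design as a Plugin in the Codex App. On the open-source Open Design approach: if it used the model behind Claude Code, could it reach similar engineering capability? That is the question raised in this discussion.

赖叔 | LaiShu.ai: @dotey Model capability and the Harness complement each other. 宝玉's post explains the two with great clarity. Also, what does 宝玉 think of open-source projects like Open Design? If it used Claude Code's model, could it reach similar engineering capability?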

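Agents · Anthropic · OpenAI · Expert opinion
10:27
宝玉@dotey
49
Once AI Agents assist with design, the way revisions are made should change

Establish a unified design system that the Agent follows; designers stop nudging pixels and direct the Agent with written instructions; direction and acceptance still rest with people. The quoted reply notes that not every case suits describing precise adjustments in Claude Design.

Axi: @FanVancoo @dotey You're right. Most of the time designers still need to make precise adjustments to typefaces, font sizes, shapes, and colors. Not everything is suited to being described in words in claude design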

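Agents · Expert opinion
08:27
宝玉@dotey
63
宝玉 shares a key habit for working with AI Agents: when handing off a task, you only need to state the verification criteria clearly, after which you no longer need to watch intermediate results. He quotes @huangyun_122's practice: have the Agent write out a coding plan first, confirm it over several rounds, consolidate it into a task list, then code and mark each item complete. This workflow keeps the goal clear while reducing unnecessary mid-course intervention.

黄赟: What habits for interacting with AI Agents can you never go back from once you start? I'll go first: have the Agent write out a coding plan, confirm it repeatedly, consolidate it into a task list, and only then code, marking tasks as done along the way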

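Agents · Expert opinion · Tutorials/Practice
07:11
Rohan Paul@rohanpaul_ai
62
Vinod Khosla says traditional IT services and BPO businesses "will be gone", but India can still win if it pivots to deploying AI. The TCS chairman said AI agents may one day be as numerous as employees; the company has cut 12,000 jobs, has annualized AI revenue of $2.3 billion, and has a data-center agreement with OpenAI. India's $315 billion IT services industry depends on low-cost labor; AI can run in US and European clouds and follow local rules, which erases the location advantage. TCS expects hiring to fall, and the old outsourcing model may collapse in favor of software automation.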

Rohan Paul: Reuters: India's biggest private employer TCS's Chairman AI agents could become as numerous as TCS employees. The Chairm...

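Agents · Expert opinion · Industry news
04:44
elvis@omarsar0
47
The LLM Council idea was never fully explored, but I think it could have huge applications given the state of things today. LLM routing is closely related, but I really believe that properly integrating the intelligence and knowledge of different agents is worth exploring in depth.
Agents · Expert opinion
04:00
Logan Kilpatrick@OfficialLoganK
29
Our long-term goal for @GoogleAIStudio is to remove the friction of building with AI, then do the same for your business, and ultimately unlock economic opportunity for everyone. It still feels like early days, but I'm increasingly excited about it.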
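Google · Expert opinion
03:43
elvis@omarsar0
71
Elvis Saravia explains Claude Code's /goal mode: from prompts to goal-driven control systems

For long-running coding agents, the core is shifting from prompts to control systems. In a DAIR.AI Academy session, Elvis Saravia walks through Claude Code's /goal mode: the human specifies the end state, evidence of success, constraints, and budget, and the goal acts as a "contract" rather than a long prompt. Evaluators become first-class components: well-defined tasks use deterministic checks (tests, lint, benchmarks), fuzzy tasks use agentic evaluators (judging reports, UI design), and combining the two reduces hallucination. Verifiers define the trust boundary: external checks (test suites, type checking, browser runs, screenshot comparison) provide evidence that cannot be bypassed.

Agents · Anthropic · Expert opinion · Coding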
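
To make the "goal as a contract" idea concrete, here is a minimal sketch of how a control loop might decide whether to stop. It does not reflect Claude Code's actual /goal format or internals; the `Goal`, `CheckResult`, and `decide` names are hypothetical, and treating "all checks passed" as the stopping rule is an assumption made for illustration.

```typescript
// Hypothetical shape for a goal "contract"; not Claude Code's real /goal format.
interface Goal {
  endState: string; // what "done" looks like
  constraints: string[]; // things the agent must not do
  maxSteps: number; // budget
}

// One piece of evidence reported by an evaluator.
interface CheckResult {
  name: string;
  kind: "deterministic" | "agentic"; // e.g. a test suite vs. a model-judged review
  passed: boolean;
}

type Decision = "done" | "continue" | "out_of_budget";

// Decide what the loop does next: stop only on evidence, never on the agent's say-so.
function decide(goal: Goal, stepsUsed: number, results: CheckResult[]): Decision {
  const allPassed = results.length > 0 && results.every((r) => r.passed);
  if (allPassed) return "done";
  return stepsUsed >= goal.maxSteps ? "out_of_budget" : "continue";
}

// Names of failed checks of one kind, useful for feeding back into the next step.
function failedChecks(results: CheckResult[], kind: CheckResult["kind"]): string[] {
  return results.filter((r) => r.kind === kind && !r.passed).map((r) => r.name);
}

const goal: Goal = {
  endState: "all unit tests pass and the settings page matches the mockup",
  constraints: ["do not modify the test files"],
  maxSteps: 20,
};
```

With an empty result list the loop never reports "done", so an agent cannot finish simply by skipping verification.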
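03:43
Nathan Lambert@natolambert
46
The US government asked Anthropic's Dario to fix the model's jailbreak vulnerability or take the model down; Dario refused. Anthropic's blog post claimed the jailbreak was not serious. Nathan Lambert comments that the Dario camp and the Sacks camp hold very different positions, that Dario's clarification amounts to a refusal, and that the industry is sliding into "vibes-based governance", where model releases are decided by political judgment rather than technical assessment.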

martin_casado: "The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused. - In their blog post, Anthropic defen...

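Expert opinion · Safety/Alignment · Industry news
03:43
Nathan Lambert@natolambert
45
Transparency about every power player at the AI frontier (labs, governments, and so on) is the only workable solution. Finding the right level of transparency is hard, but the fate of the AI ecosystem cannot be decided by finger-pointing between dario and the White House.
Expert opinion · Safety/Alignment
03:25
宝玉@dotey
51
Claude Design has launched; why does Codex have no equivalent? The model-layer gap is the main reason

Anthropic has launched Claude Design, which generates a high-fidelity interactive prototype from a single sentence. A user asked why OpenAI's Codex has no comparable product; the key is the gap at the model layer. An Agent consists of a Harness (product layer) and a model layer. The Harness is not the barrier (the open-source baoyu-design already reproduces it); the real moat is that Claude Opus 4.8 combines UI/UX design ability with system architecture ability, defining data structures, state management, and interaction logic before delivering a complete prototype. GPT-5.5, by contrast, produces poor interactive results. The deliverable is React/CSS/JSON code.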

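Agents · Anthropic · Expert opinion
03:13
elvis@omarsar0
44
OpenRouter launches the Fusion API compound model, Fable-level intelligence at half the price

OpenRouter released the Fusion API, billed as "the smartest compound model in the market", which it says reaches Fable-level intelligence at half the price. Quoting that announcement, Elvis Saravia argues that combining model intelligence with human expertise has a striking compounding effect, and that different models have distinct strengths rather than one general-purpose model fitting everything. Engineering teams should treat calling different models in combination as a strategic direction; given the trend toward selective access to frontier models, understanding how to use them together will be a huge unlock.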

OpenRouter: Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half ...

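Expert opinion · Phenomena/Trends
02:50
gabriel@gabriel1
19
AGI is the most economically valuable asset of all time, and trillions in free-market capital will be put into it. This is extremely unlike the Manhattan Project. This time, governments can only cooperate. We can't just pick a winner, or that winner will lose.
Expert opinion · Phenomena/Trends
02:11
Ethan Mollick@emollick
48
I think the assumption that you should use smaller models for less important tasks is flawed (or at least deserves much more careful consideration). Big models are generally better at everything but cost, so it is worth considering whether gains in non-key tasks would be valuable.
Expert opinion · Inference
02:00
Chubby♨️@kimmonismus
45
Having access to different AI tools isn't the bottleneck anymore; it is the cognitive load of orchestrating them. LobeHub is tackling this systemic challenge with a new operational paradigm called the Chief Agent Operator (CAO). Instead of requiring users to micromanage individual tasks, the CAO serves as an autonomous management layer handling cross-tool orchestration behind the scenes.
Agents · MCP/Tools · Expert opinion
01:10
Ethan Mollick@emollick
56
Has there been anything good written about the failure of Mistral to keep up with both the Big Three and Chinese labs? They have talent and national backing, but despite being Europe's only frontier lab (Google Deepmind's UK lab aside), they haven't been able to close the gap.
Expert opinion · Industry news
00:53
宝玉@dotey
26
tinyfool asked, "Do you pick Claude Code or Codex now?" 宝玉 replied: only kids make choices; adults take them all.

Tinyfool: Do you pick Claude Code or Codex now?

Anthropic · OpenAI · Expert opinion · Coding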
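June 13
23:27
Chubby♨️@kimmonismus
56
Amazon's tip-off gets Anthropic restricted; OpenAI benefits

Amazon, one of Anthropic's biggest investors, allegedly jailbroke Claude and reported it to the US government. As a result, US authorities treat Anthropic as a supply-chain risk, it may lose permission to distribute to enterprises, and its valuation and IPO face downward pressure. OpenAI is the main beneficiary: the threat of Anthropic going public first is removed, and OpenAI has the chance to engage US authorities proactively and obtain model approval in advance, gaining an edge in the race.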

Chubby♨️: Wait - so Amazon, one of Anthropic's biggest investors, allegedly jailbroke Claude and then snitched to the U.S. governm...

Anthropic · OpenAI · Expert opinion