@chamath AI+Robots will be able to do everything, resulting in universal high income. Work will be optional.
i've got codex... - reading all my emails to figure out proposals to write, directly in google drive - auto-drafting con...
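Elvis Saravia of DAIR.AI shared PaperWiki, a knowledge base for research workflows that he has built over the past few months on top of LLMs and coding agents. It runs automated daily updates that ingest papers from multiple sources into Obsidian, indexes them with qmd, and presents them as HTML artifacts with full-text and semantic search. Saravia maintains it with a mix of a frontier model (opus-4.8) and an open-weight model (deepseek-v4-flash), and plans to open-source it. He considers LLM wikis one of the most valuable AI application directions right now.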
Introducing EBR-bench, our new benchmark to measure on-the-fly learning. AI repeatedly plays a challenging board game ca...
Fable 5 isn't nerfed, it's SLAUGHTERED. the problem isn't even the model itself, but the hard guardrails Anthropic has s...
Bridgewater used their unique financial knowledge and partnered with us on @tinkerapi to fine-tune a model that helps th...
The only question remaining now is: will GPT-5.6 also have guardrails as strict as Fable 5's, or does OpenAI have better...
Fable 5 is a large step for Anthropic's vision capabilities and effectively ties with GPT-5.5 on HieroglyphBench, my ben...
Palantir CEO Alex Karp on what customers actually want, the real business of frontier labs, and the importance of open s...
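Shao Meng (邵猛) summarizes three stages of LLM interaction: web chatbots, standalone AI apps, and AI embedded in the organization. Claude Tag moves from "one AI per person" to "one AI per channel": a team shares a single agent instance whose context is continuous and can be handed off between people, and the agent shifts from passively responding to participating continuously, tracking threads and staying present over the long term. Glean Agents proposes four pillars for production-grade standalone agents: Identity (its own identity and permissions), Memory (learning company SOPs and iteratively correcting mistakes), Proactivity (active monitoring and execution), and Accountability (traceable tool calls, including an emergency stop). As a practical example, OnCall Assistant reads PagerDuty, Jira, Confluence, GitHub, and Slack in parallel once an alert fires, automatically investigates the root cause, and tags the owner.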
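The parallel read described for OnCall Assistant can be sketched as a simple fan-out. This is a minimal illustration, not Glean's implementation: `fetch_context` is a hypothetical stub that stands in for real PagerDuty, Jira, Confluence, GitHub, and Slack clients, and the returned fields are assumptions made for the example.

```python
import asyncio

# Sources named in the summary above; the order is kept in the results.
SOURCES = ["PagerDuty", "Jira", "Confluence", "GitHub", "Slack"]


async def fetch_context(source: str, alert_id: str) -> dict:
    """Hypothetical stub: a real agent would call the source's API here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"source": source, "alert_id": alert_id, "notes": f"stub data from {source}"}


async def gather_context(alert_id: str) -> list:
    """Query every source concurrently and return results in SOURCES order."""
    return await asyncio.gather(*(fetch_context(s, alert_id) for s in SOURCES))


def collect(alert_id: str) -> list:
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(gather_context(alert_id))
```

`asyncio.gather` preserves the order of its arguments, so a later root-cause step can rely on knowing which result came from which source.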
http://x.com/i/article/2072078677047926784
Most tools give you a draft. This chat gave back a launch asset. From "we launch this week" to a post-ready card, withou...
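In a Financial Times interview, Sam Altman said that within a year or two AI systems of astonishing power will have been built, and that they will reshape humanity's material conditions on a scale beyond any technology since the invention of electricity. The quoting tweet adds: AGI (replacing most white-collar jobs) is expected to arrive in 2029; OpenAI aims to release GPT-6 in August, beating GPT-5 on every benchmark, with another step change to follow in the months after. We are currently at the frontier of this unprecedented revolution.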
Sam Altman in the financial times: "In another year or two, we expect to have built systems with astonishing power, capa...
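Once a robot leaves the production line it has to cope with the mess of the real world: kitchens, stairs, tools, dust, people, hesitation, poor lighting, dropped objects. That is entirely different from a car, which repeats a narrow task on a highly engineered road system. The post quotes Elon Musk saying that Optimus production will be extremely slow at first because everything is new, unlike making a car.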
@DoctorJack16 No, Optimus production will be extremely slow at first, as everything is new. This is not like making a ca...
i havent watched all the online talks yet but am binging this one now and it is exceptional. we are very lucky to have a...
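卡兹克 recommends using Claude Fable 5 to iterate on and optimize all of your workflows, SOPs, Skills, project plans, and code. He says his $200 Max account was used up in just an hour and a half, so he registered a second account to make the most of the 7 days.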
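Anthropic released Claude Sonnet 5, positioned as a cheaper model for running AI agents. The upgrade is uneven, though: it scores below Sonnet 4.6 on the CyberGym benchmark. Its cost per task is about 15% higher than Opus 4.8 and 2x higher than Sonnet 4.6, while its per-token price is lower than Opus. Separately, Claude Code is accused of fingerprinting Chinese routing services through tiny changes in prompt formatting. This issue of the newsletter also covers "agent-native memory systems" and "Google's paper assistant tool automating scientific peer review".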
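DSpark and JetSpec appeared almost simultaneously, and both address causal consistency when a lightweight draft model proposes tokens in parallel. DSpark targets high concurrency: it controls the budget with a lightweight Markov correction head and confidence estimation, and on Qwen3-8B with AIME25 at budget 7 it raises acceptance length from DFlash's 4.07 to 5.01. JetSpec targets low latency by building causality directly into the parallel draft heads: acceptance length is 7.23 at budget 16 and reaches 9.82 at budget 128, above DFlash's 7.34 and DDTree's 8.66. The two improve causality from the throughput side and the latency side, respectively.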
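For readers unfamiliar with the metric, acceptance length can be illustrated with plain greedy verification. The sketch below is generic speculative decoding, not the DSpark or JetSpec method; the function names are hypothetical, and it assumes the convention that each verification step also emits one token from the target model (some papers count only the matched draft tokens).

```python
from typing import Iterable, Sequence, Tuple


def accepted_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Tokens emitted by one greedy verification step.

    target[i] is the target model's greedy token after the prefix plus
    draft[:i], so target has exactly one more entry than draft.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have len(draft) + 1 tokens")
    matched = 0
    for d, t in zip(draft, target):
        if d != t:
            break  # first disagreement: the rest of the draft is discarded
        matched += 1
    # The target model's own token at the stopping position is always kept.
    return matched + 1


def mean_acceptance_length(steps: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average tokens emitted per verification step over (draft, target) pairs."""
    lengths = [accepted_length(d, t) for d, t in steps]
    return sum(lengths) / len(lengths)
```

Under this convention a draft budget of k yields between 1 and k + 1 tokens per step, which is why a larger budget only helps if the draft stays consistent with the target deeper into the sequence.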
I have this struggle with my own teams, too: many think it is a great idea to save money/latency/sanity by running a pre...
If GPT-5.6 matches Fable 5 performance, but without the 50% limit + 7 days restriction, it's over for Anthropic
Same here. Happy with Opus 4.8 (planning) and GPT-5.5 (execution). Also, breaking steps into smaller ones for increasing...