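Building a harness framework for AI-automated game development. Here's something fun: I've recently been building a harness framework for automated game development. The current demo is called "D-Class Girl" (「D级少女」); the premise is an adventure game in which a JK (Japanese high-school girl) deals with contained SCP objects. (The art is AI-generated, and the project is still at the framework stage.) The generation side of the harness extends levels from the SCP database and automatically generates game items and quest props; the pipeline then illustrates the levels and items by calling AI text-to-image / image-to-image APIs and voices everything automatically with TTS. Most importantly, I gave the harness a sandbox (note the plain-text command section under the illustration): this is a "headless game" that can be played directly from the command line. Once the Agent has finished generating the content for a level, the AI can use the sandbox to verify that round of changes, fix the level according to the prompts and script constraints, and validate and tune the game's numerical balance, which ensures playability. That is the constraint side of the harness. It's still under development; I'll open-source the framework later and publish a detailed tutorial on building a fully automated harness like this one. Don't expect a free lunch, though: you still have to write the game's setting yourself. And for now the harness only solves the engineering-volume problem, i.e. "playability"; whether the game is actually fun is still up to humans. #harness #AIAgent #AI游戏开发

Summary: The harness framework uses an AI Agent to generate and validate game content fully automatically. It extends levels from the SCP database, generates items and quest props, and integrates AI image-generation and TTS voice-over pipelines. Its core innovation is a built-in sandbox (a headless game mode) that lets the AI verify level playability from the command line and adjust numerical values against the constraints. The developer stresses that the framework currently solves engineering-level "playability"; creative "fun" still needs human judgment. The project will be open-sourced later.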
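The "headless game" idea can be sketched as a game whose whole interface is text commands, so an agent can replay a command script and check the outcome. This is only a minimal illustration, not the author's framework (which is not yet public): `HeadlessLevel`, its fields, the `take`/`contain` commands, and `validate_level` are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class HeadlessLevel:
    """Hypothetical level: pick up the required item, then contain the anomaly."""
    required_item: str
    player_hp: int = 10
    anomaly_damage: int = 3
    inventory: set = field(default_factory=set)
    contained: bool = False

    def step(self, command: str) -> str:
        """Apply one text command and return a text response (no graphics needed)."""
        verb, _, arg = command.partition(" ")
        if verb == "take":
            self.inventory.add(arg)
            return f"took {arg}"
        if verb == "contain":
            if self.required_item in self.inventory:
                self.contained = True
                return "anomaly contained"
            self.player_hp -= self.anomaly_damage
            return f"containment failed, hp={self.player_hp}"
        return "unknown command"


def validate_level(level: HeadlessLevel, script: list[str]) -> tuple[bool, list[str]]:
    """Replay a command script; the level counts as playable if it is won while alive."""
    log = [level.step(command) for command in script]
    return level.contained and level.player_hp > 0, log


ok, log = validate_level(HeadlessLevel(required_item="keycard"), ["take keycard", "contain"])
bad_ok, bad_log = validate_level(HeadlessLevel(required_item="keycard"), ["contain"])
```

An agent could run a check like `validate_level` after every generated level and feed the log back into its next revision; numerical balance checks would add assertions on values such as `player_hp`.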

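karminski-牙医@karminski3 · Apr 17

Is Qwen3.6-35B-A3B really this strong at 2-bit quantization? The Unsloth team (just two brothers, mind you) shipped quantized versions of Qwen3.6-35B-A3B at lightning speed, and their test floored me: a 2-bit model completing more than 30 tool calls??? I honestly don't believe it, because when I tested Qwen3.5-35B-A3B at 8-bit (in mlx format) it gave out after about 4-5 tool calls. It could handle simple jobs like sorting email, but asking it to sort the email, compile statistics, and record them in Notion / Obsidian blew it up. Bear in mind that Unsloth's 2-bit dynamic quantization of this model is only 12.3GB, with only 1G active! A 32GB Mac can run it easily. I'm going to test it right away and will report real-world results shortly. https://x.com/UnslothAI/status/2044858346948464743

Summary: The Unsloth team released a 2-bit dynamic quantization of Qwen3.6-35B-A3B. The model is only 12.3GB with only 1GB active, so it runs comfortably on a 32GB Mac. Unsloth's test shows it completing 30+ tool calls, whereas the author's earlier test of the previous-generation Qwen3.5-35B-A3B at 8-bit broke down after 4-5 calls. If the result holds up, it points to a marked improvement in on-device usability and multi-step task handling.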

SemiAnalysis@SemiAnalysis_ · Apr 17

Curious what's in the PR of almost 1400 kernels? Here we walk through a simple batched GEMM kernel 🟠 Tile size: M128, N16, K256 🟠 W4A16: matrix A is INT4 with a BF16 scaling factor for every 32 elements, matrix B is BF16 🟠 3 pipeline stages 🟠 1 CTA MMA 🟠 Static scheduler This warp-specialized kernel has the following warp roles: 🟠 Load A 🟠 Load A scaling factor (SF) 🟠 Load B 🟠 Cast A: Dequantize INT4 to BF16. Waits on Load A and Load A SF 🟠 MMA: Performs matmul. Waits on Cast A and Load B 🟠 Epilogue: Performs activation computation. Waits on MMA An interesting thing about this kernel is that its MMA uses TS mode, because dequantizing matrix A requires CUDA cores, which work on registers instead of TMEM. As shown in our microbenchmarking article, TS mode has slightly lower throughput due to an SMEM bandwidth bottleneck. In addition, @cursor_ai has also shown that the CUDA core / Tensor Core compute gap creates bottlenecks. To mitigate these issues, the kernel uses pipelining, similar to what Cursor did. Microbenchmarking article: https://newsletter.semianalysis.com/p/dissecting-nvidia-blackwell-tensor Cursor blog post: https://cursor.com/blog/kernels

Summary: FlashInfer has open-sourced nearly 1,400 high-performance TRT-LLM-Gen GPU kernels optimized for LLM inference. The example is a W4A16 quantized GEMM with INT4 weights and BF16 activations, which uses a 3-stage pipeline and warp specialization (load, dequantize, MMA, epilogue) to improve parallelism. Because INT4 dequantization runs on CUDA cores, which work on registers, the MMA uses TS mode, which runs into an SMEM bandwidth bottleneck. Like Cursor's design, the kernel uses pipelining to hide the gap between CUDA core and Tensor Core compute and limit the throughput loss.
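To make the W4A16 data layout concrete, here is a plain-Python reference for the "Cast A" and MMA steps: unpack INT4 values, multiply each group of 32 by its scaling factor, then do an ordinary matmul. It is a numerical sketch only, not the GPU kernel. The nibble order (low nibble first) and the signed two's-complement encoding are assumptions, and BF16 values are modeled as Python floats.

```python
GROUP = 32  # one scaling factor per 32 INT4 elements, as stated in the post


def unpack_int4(packed: bytes) -> list[int]:
    """Split each byte into two signed 4-bit values (low nibble first: assumed layout)."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


def dequantize_row(packed: bytes, scales: list[float]) -> list[float]:
    """'Cast A' step: INT4 -> float, scaled by the factor of the element's group."""
    return [v * scales[i // GROUP] for i, v in enumerate(unpack_int4(packed))]


def gemm_w4a16(a_packed: list[bytes], a_scales: list[list[float]], b: list[list[float]]):
    """Reference matmul: dequantize each row of A (M x K), multiply by B (K x N)."""
    a = [dequantize_row(row, scales) for row, scales in zip(a_packed, a_scales)]
    n = len(b[0])
    return [[sum(a_row[k] * b[k][j] for k in range(len(b))) for j in range(n)] for a_row in a]


# One row with K = 64: every byte 0x21 unpacks to the pair (1, 2); two groups of 32.
row = bytes([0x21] * 32)
result = gemm_w4a16([row], [[0.5, 2.0]], [[1.0] for _ in range(64)])
```

In the real kernel these stages run concurrently in separate warps, which is why the dequantization (CUDA cores) and the matmul (Tensor Cores) need pipelining to overlap.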

Rohan Paul@rohanpaul_ai · Apr 17

HTML to MP4. Write HTML. Render video. Built for agents. HyperFrames is a newly launched, completely open-source video rendering framework that lets you create, preview, and render HTML-based video compositions, with first-class support for AI agents. Instead of inventing another editing language, HyperFrames adds a thin layer of data-* attributes on top of normal web code, then lets agents preview in the browser and render locally to MP4. An AI coding agent like Claude Code, Cursor, Gemini CLI, or Codex uses HyperFrames' skills to write the HTML composition, and then HyperFrames previews it in the browser and renders it to MP4.

Summary: HyperFrames is a new open-source video rendering framework that lets AI agents produce MP4 video by writing HTML. There is no new language to learn: it adds data-* attributes to standard web code, so coding agents such as Claude Code, Cursor, Gemini CLI, and Codex can create, preview, and locally render video compositions. The agent writes the HTML composition; HyperFrames provides browser preview and MP4 rendering.

Rohan Paul@rohanpaul_ai · Apr 17

HeyGen just open-sourced HyperFrames, it lets AI agents turn HTML, CSS, and JavaScript into MP4, MOV, or WebM video from the terminal. An AI-agent-first renderer for video. You describe the video, the AI-agent writes HTML/CSS/JS, and HyperFrames turns that code into a real MP4 video. The idea is that agents already know the web stack far better than timeline video-editors, so HyperFrames adds a small set of data attributes for timing, layering, and composition, then hands animation to familiar browser tools like GSAP, Lottie, Three.js, and standard CSS.

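Summary: HeyGen has open-sourced HyperFrames, an AI video rendering framework that lets AI agents turn HTML/CSS/JS into MP4 and other formats. It drops the traditional timeline in favor of data attributes that control timing and layering, and works with web animation libraries such as GSAP, Lottie, and Three.js. The HeyGen team has already produced its official video with Claude Code and the framework. Developers can install it via an npx command.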

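karminski-牙医@karminski3 · Apr 15

Where did Qwen3.5-40B Dense come from? Alibaba has no such model; the Qwen3.5 series has no 40B size. This Qwen3.5-40B Dense was made by the DavidAU team, and the method is interesting. Step 1, de-censor: the base model is Qwen3.5-27B Dense, which is uncensored using "Heretic" ablation. I've covered this before; it's a general-purpose method by now. The de-censored model is then post-trained on their private "Deckard/PDK" dataset. The name comes from Rick Deckard, the protagonist of Blade Runner, and the dataset focuses on giving the model personality, intelligence, depth, observation, and perspective. It is not public, though (my quiet guess is copyright issues...). Step 2, bulk up: take the model from step 1, set aside the head and tail, keep the middle, and duplicate the middle 50% of the parameters. Why leave out the head and tail? In a Dense model the earliest layers usually handle basic vocabulary and syntax, and the final layers map features back into human text, while the middle layers handle abstract logic and deep semantics. So only the middle layers are worth duplicating. Step 3, back into the furnace: use Unsloth on local hardware to intensively fine-tune the newborn 40B model on claude-opus-high-reasoning-250x, a Claude Opus distilled-reasoning dataset that was popular a while ago. This step does two things: it stabilizes everything, so the stitched-together 40B parameters fuse into a whole instead of collapsing, and it optimizes, teaching the model variable-length reasoning: short answers for simple questions, long thinking for complex ones. So how does the model actually perform? I tried the mlx 4bit version. Prompt: "A dog by the roadside bit me and I'm furious. Should I chase it down and bite it back? Give me the answer I'd least expect." Model: "The dog bit you because it felt it had to protect itself or its territory. If you chase it and bite it, you'll find that your reason for biting is exactly the same as its reason for biting you." Or, a more advanced surprise version: "No. Go write an article titled 'Today I decided not to be the one who bites.' It'll get at least 500,000 clicks, the royalties will buy five rabies shots, and you can help a stray-animal rescue adopt this dog while you're at it. The one who reports becomes the one who donates, the victim becomes the rescuer. That is the real way to bite back: by using medicine and kindness to rewrite the whole story into another version." Reality check: a dog can bite off your skin, but only you decide what gets bitten off, your reason or the person you were. P.S. If the wound is still there, remember to get a tetanus shot. Letting the wound heal isn't about forgiving anyone; it's so you can keep taking part in the story you chose. (Also, I tried writing code with it. Conclusion: it's unusable for coding. It has variable-definition and scoping problems; it feels like the context ability and hallucination level were damaged...)

Summary: The DavidAU team modified Qwen3.5-27B Dense into a 40B Dense model that does not exist officially. They first uncensor it with "Heretic" ablation and give it personality with the private Deckard dataset; then take and duplicate the middle 50% of the parameters to "expand" it; and finally fine-tune with Unsloth on a Claude Opus reasoning dataset to stabilize the parameters and optimize variable-length reasoning. In testing, the model is impressive at philosophical reflection and creative writing, but its code generation has variable-scoping problems and its context ability is impaired.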
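The "bulk up" step can be illustrated with layer indices alone. The sketch below keeps the head and tail once and repeats the middle half as one contiguous block; how DavidAU actually orders or interleaves the copies is not stated in the post, so the ordering here is an assumption, and `expand_layers` and `estimated_params` are hypothetical helper names.

```python
def expand_layers(num_layers: int, middle_fraction: float = 0.5) -> list[int]:
    """Source-layer indices for the expanded stack: head, middle, middle copy, tail."""
    middle = int(num_layers * middle_fraction)
    start = (num_layers - middle) // 2
    end = start + middle
    layers = list(range(num_layers))
    return layers[:start] + layers[start:end] + layers[start:end] + layers[end:]


def estimated_params(base_billion: float, middle_fraction: float = 0.5) -> float:
    """Rough size after duplication; treats all parameters as evenly spread over layers."""
    return base_billion * (1 + middle_fraction)


plan = expand_layers(8)          # toy 8-layer model: layers 2..5 appear twice
size = estimated_params(27.0)    # 27B base with the middle 50% duplicated
```

The arithmetic lines up with the model name: 27B × 1.5 = 40.5B, which is roughly the advertised 40B once embeddings and other non-duplicated parameters are accounted for.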

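宝玉@dotey · Apr 15

Open-source project recommendation: BlockNote. BlockNote is an open-source React rich-text editor built on ProseMirror and Tiptap. It offers a Notion-style block editing experience, with drag and drop, nesting, a slash menu, and a formatting toolbar all working out of the box. For developers who need to embed an editor in their own app, it has two main draws. First, the barrier to entry is low. A few lines of code get you an editor with a complete UI, without first having to chew through a pile of low-level concepts as you would using ProseMirror or Tiptap directly. Block types, keyboard shortcuts, and custom styles are all configurable, but it works without any configuration. Second, it supports AI integration natively. With the @blocknote/xl-ai extension package you can wire AI straight into the editor: users select text and click the AI button, or type /ai in the slash menu, to have AI write, revise, or continue content. The backend can connect to OpenAI, Anthropic, or your own model endpoint, and can also hook into a RAG pipeline to give the AI a knowledge base. The AI's actions are fully transparent to the user: what was changed and what was added can be accepted or rejected item by item. So if you are building a content management system, a knowledge base, or any product that needs "editor + AI writing assistance", BlockNote saves you from building two wheels at once. Real-time collaboration is also built in (it relies on a third-party service), with multi-user editing based on Yjs. There are also extension packages for exporting to PDF, Word, and ODT, for cases that need formal documents. One licensing caveat: the core editor is under MPL-2.0, which commercial projects can use freely, but the premium xl- packages (AI integration, multi-column layout, document export) are under GPL-3.0, and closed-source commercial projects need to buy a commercial license. If you are torn between Tiptap and BlockNote: Tiptap suits cases that need deep customization of editor behavior, but its learning curve is steep and you need to understand ProseMirror's Schema and plugin system. BlockNote is a higher-level wrapper, suited to teams that want to ship quickly without spending much time on editor internals. Project: http://github.com/TypeCellOS/BlockNote; docs: http://blocknotejs.org.

Summary: BlockNote is an open-source React rich-text editor with Notion-style blocks, built on ProseMirror and Tiptap. Its high-level wrapper lowers the integration barrier considerably: a few lines of code deploy a complete UI. The headline feature is native AI support, which can connect to OpenAI and other models for writing assistance. Note the tiered licensing: the core is MPL-2.0 and free for commercial use, but the xl- premium packages such as AI integration are GPL-3.0, and closed-source projects need a commercial license. It suits CMS, knowledge-base, and similar products that need to ship fast.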

Peter Steinberger 🦞@steipete · Apr 14

This release makes me unreasonably happy since I wasn't involved at all - @vincent_koc and the maintainer team did a great job. I'm back soon to work on OpenClaw, today/tomorrow I'm prepping for @TEDTalks in Vancouver. 🇨🇦

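Summary: Peter Steinberger is unreasonably happy about this release precisely because he had no part in it; he credits @vincent_koc and the maintainer team, and says he will be back on OpenClaw soon after prepping for @TEDTalks in Vancouver. 🇨🇦 [Quoting @openclaw]: OpenClaw 2026.4.14 🦞 More reliability updates: ✨ smarter GPT-5.4 routing and recovery 🌐 Chrome/CDP improvements 🧵 subagents no longer get stuck 💬 Slack/Telegram/Discord fixes ⚡️ assorted performance improvements. Was asleep at the time, but we shipped anyway. https://github.com/openclaw/openclaw/releases/tag/v2026.4.14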

Rohan Paul@rohanpaul_ai · Apr 14

Strix (@strix_ai ) is making AI useful in security where it actually counts: inside the loop of testing, verifying, and patching. I like the part that it treats AI as an adaptive operator sitting on top of deterministic security tools. Strix is an open-source framework for autonomous pentesting across apps, APIs, and repositories with 23.6K+ Github stars ⭐️ - 80,000+ users worldwide - 15B+ LLM tokens processed daily - 78,000+ vulnerabilities reported - multiple CVEs assigned - deployed by enterprise security teams worldwide The real pitch is not that AI can spot bugs. It is that security findings should arrive with proof, a fix, and a place in the merge loop, not as a late report someone has to interpret. That sounds minor until you look at the mechanism. Strix is built around dynamic testing, proof-of-concept validation, autofix pull requests, retesting, and CI/CD hooks that can block insecure code before it ships. IMO, continuous pentesting only matters if it can narrow scope to changed code, run headlessly in pipelines, and accumulate context over time, and the new platform is explicitly built around those exact behaviors. What is probably true is that this model can remove a lot of appsec friction, especially where teams are drowning in “possible” issues and need validation fast. This is not another scanner that throws guesses at a team. Strix is built around attacker style testing, so it uses browser actions, traffic inspection, terminal work, Python, and code context to prove whether a flaw is actually usable. 🧵 1.

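Summary: Strix is an open-source autonomous penetration-testing framework that uses AI as an adaptive operator on top of deterministic security tools. Its core mechanism is built around dynamic testing, proof-of-concept validation, autofix pull requests, and CI/CD hooks, so it can block insecure code before it is merged. Unlike traditional scanners that only throw out guesses, Strix tests in an attacker style, using browser actions, traffic inspection, and similar means to prove that a vulnerability is exploitable, so findings arrive with proof and a fix and feed directly into the development workflow.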

swyx 🐣@swyx · Apr 14

If you're looking to improve your writing game, Anh is one of the most consistent heavy hitters I know in devtools HN and she literally just open sourced her writing Skills template for you to use below!

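Summary: If you want to improve your writing, Anh is one of the most consistent heavy hitters swyx knows in devtools on HN, and she has just open-sourced her writing Skills template for anyone to use. [Quoting @byAnhtho]: http://x.com/i/article/2043500390885494784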

AK@_akhaliq · Apr 14

GLM-5.1 sunset racing game on Hugging Face is kind of fun to play app: https://huggingface.co/spaces/victor/sunset-racing-glm-5.1

Summary: The GLM-5.1 sunset racing game on Hugging Face is fairly fun to play. app: https://huggingface.co/spaces/victor/sunset-racing-glm-5.1

Rohan Paul@rohanpaul_ai · Apr 13

VoxCPM 2 just dropped by @OpenBMB Only 2B-param open-source TTS (Text-to-Speech) model built for production-grade multilingual voice work. Apache-2.0 license, Can run on only 8GB VRAM. • Eliminates the "robotic" feel of traditional TTS, delivering prosody and emotional depth suitable for high-stakes professional environments like filmmaking, gaming, animation, and audiobooks. • 30-language multilingual: no language tag needed, just type in a supported language and generate directly. • Voice design: create a brand-new voice from a text description alone, like age, tone, pace, or emotion. No reference audio required. Describe the desired voice characteristics (gender, age, tone, emotion, pace …) in Control Instruction, and VoxCPM2 will craft a unique voice from your description alone. • Controllable cloning: clone from a short clip, then steer delivery style without losing the speaker’s core voice. • Ultimate cloning: use reference audio + transcript for continuation-style cloning that keeps the tiny vocal details. • 48kHz output: takes 16kHz reference audio and produces studio-quality speech without an external upsampler. • Real-time ready: around 0.3 RTF on RTX 4090, even lower with Nano-VLLM. • Commercial use: Apache-2.0 licensed. Developer-Friendly Infrastructure: - Native Torch Inference: Direct support for PyTorch-based workflows. - Training Flexibility: Supports both full-parameter and LoRA fine-tuning for specific domain adaptation. - Production Readiness: Compatible with voxcpm-nanovllm for large-scale, high-concurrency deployment.

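Summary: OpenBMB has released VoxCPM 2, an open-source TTS model with only 2B parameters that supports 30 languages and needs no language tag. It is Apache-2.0 licensed and runs on 8GB of VRAM. It supports creating new voices from a text description, controllable cloning, and ultimate cloning that preserves speaker detail. Output is 48kHz, with an RTF of about 0.3 on an RTX 4090. It works with PyTorch, LoRA fine-tuning, and Nano-VLLM deployment, and targets professional uses such as film, games, and audiobooks.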

Rohan Paul@rohanpaul_ai · Apr 13

This week, the Linux kernel project finally created a formal, project-wide policy explicitly allowing AI-assisted code contributions, as long as developers obey strict new disclosure requirements. Torvalds’ view, which gives this policy its main philosophical shape, is pretty direct: AI is just another tool. Developers submitting garbage code are not going to be fixed by more documentation, so the kernel should hold people accountable instead of trying to control the software they use on their local machines. It is a practical and reasonable line to take, especially compared with the panic in other parts of the open-source scene. You are the one on the hook now. If Claude introduces for example, a race condition in the block layer and you approve it, the patch carries your tag, not the model’s. The Signed-off-by line is the certification for the Developer Certificate of Origin, and the latest policy makes it explicit that only humans can legally add it. AI agents "MUST NOT" The open-source community is currently getting overwhelmed by what people are calling "AI slop." e.g. the creator of cURL closed bug bounties after a flood of hallucinated code, tldraw began automatically closing external PRs to defend itself, and projects such as Node.js and OCaml have seen huge, >10,000-line AI-generated patches

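Summary: The Linux kernel project formally adopted a policy this week that allows AI-assisted code contributions, subject to strict disclosure requirements. Torvalds argues that AI is just another tool and that the kernel should hold developers accountable rather than restrict the software they run locally, in sharp contrast to the panic elsewhere in open source. The policy states that only humans may add the Signed-off-by line that certifies the Developer Certificate of Origin; AI agents must not. Developers bear full responsibility for AI-generated code, such as a patch produced by Claude. The policy arrives while the open-source community is being flooded with "AI slop".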
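For readers unfamiliar with the trailer in question, the sketch below checks whether a commit message carries a well-formed `Signed-off-by: Name <email>` line. It only validates the text format; it cannot tell who typed the line, which is exactly why the policy places the responsibility on the human submitter. The commit message and the `has_signoff` helper are invented for illustration.

```python
import re

# "Signed-off-by: Full Name <email>" on its own line, the trailer used for the DCO.
SIGNOFF = re.compile(r"^Signed-off-by: \S.* <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(commit_message: str) -> bool:
    """True if the message contains at least one well-formed Signed-off-by trailer."""
    return SIGNOFF.search(commit_message) is not None


# Hypothetical commit message, not taken from the kernel tree.
message = """block: fix race in request completion

Hold the queue lock while updating the completion state.

Signed-off-by: Jane Developer <jane@example.org>
"""
```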

Nathan Lambert@natolambert · Apr 12

We get another year of @xeophon and I publicly roasting companies for making open model license mistakes. Mistakes happen, but we're going to highlight it if you use a dumb license. I bet MiniMax fixes it for the next one. Community sentiment too important to Chinese labs rn.

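Summary: Nathan Lambert and @xeophon will spend another year publicly roasting companies that get open model licenses wrong. Mistakes happen, but a dumb license will get called out. He bets MiniMax fixes it next time, since community sentiment matters so much to Chinese labs right now.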

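宝玉@dotey · Apr 9

You can use baoyu-slide-deck from baoyu-skills to generate slides, for example: > /baoyu-slide-deck draw in a hand-drawn style <path to PDF file or source material> https://github.com/jimliu/baoyu-skills

Summary: baoyu-skills has released the baoyu-slide-deck tool, which generates hand-drawn-style slides from a PDF or source files via the command line. It reproduces the kind of handwritten-style PPT that Fu Sheng's (傅盛) company demonstrated earlier but did not release, and is now open source on GitHub for study.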

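karminski-牙医@karminski3 · Apr 9

Finally, no more staring at pixel lobsters: a 3D lobster plugin is here! I hadn't even gotten around to playing with the 2D lobster space when I came across a 3D lobster-space plugin, Agentshire. From the source code, the 3D scene in the web page is implemented with Three.js. It has not only a scene but also a simulated weather system and day-night cycle, and the plugin even has built-in autonomous NPC socializing. The author has implemented a whole pile of features. For now, the biggest use case for this kind of lobster visualization plugin seems to be having a visual scene to show when making lobster-related videos, which adds some fun; I've seen plenty of AI bloggers use Haixin's (海辛) earlier Star-Office-UI when combining multiple lobsters or sub agent tasks. Project: http://github.com/Agentshire/Agentshire (few stars so far and the project may be at an early stage, so use it with caution and don't install it straight into production)

Summary: Agentshire is a Three.js-based 3D visualization plugin for AI agents, with a weather system, a day-night cycle, and autonomous NPC socializing. Compared with the 2D Star-Office-UI, it offers a livelier 3D scene for demonstrating multi-agent collaboration. The project is at an early stage with few GitHub stars, so evaluate it carefully before use.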

Jeff Dean@JeffDean · Apr 8

Hedged requests (apparently inspired by the Tail at Scale paper by myself and Luiz Barroso) applied within a single machine to replicating data across DRAM channels and issuing reads to all channels, using the one that comes back first. ~5-15X reduction in p99.99 read latency. https://github.com/LaurieWired/tailslayer/blob/main/README.md Cool stuff, @lauriewired! Accompanying video forwarded to me by a friend, which is how I learned about it: https://www.youtube.com/watch?v=QFi2WVGfXMQ

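Summary: Hedged requests, inspired by the Tail at Scale paper, are applied within a single machine across DRAM channels: data is replicated across channels, reads are issued to all of them, and the first response is used, cutting p99.99 read latency by roughly 5-15x. The tailslayer project that implements this is open source.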
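The core of a hedged request is small: send the same read to every replica and take whichever answer arrives first. The sketch below shows that pattern with `asyncio` and simulated latencies; it is an analogy at the request level, not tailslayer's in-machine DRAM implementation, and `read_replica`, `hedged_read`, and the channel names are hypothetical.

```python
import asyncio


async def read_replica(name: str, latency_s: float) -> str:
    """Stand-in for a read from one replica; the latency is simulated with sleep."""
    await asyncio.sleep(latency_s)
    return name


async def hedged_read(replicas: dict[str, float]) -> str:
    """Issue the read to every replica and return whichever one answers first."""
    tasks = [asyncio.create_task(read_replica(name, lat)) for name, lat in replicas.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower copies are no longer needed
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    return done.pop().result()


# One replica is having a slow moment (0.20 s); the hedge hides it.
winner = asyncio.run(hedged_read({"channel0": 0.20, "channel1": 0.01, "channel2": 0.10}))
```

The trade-off is extra work for lower tail latency: every read costs as many requests as there are replicas, which is why the technique targets the p99.99 case rather than the average.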

Peter Steinberger 🦞@steipete · Apr 8

CodexBar 0.20 is out! 🎚️ 🆕 New providers: Perplexity + OpenCode Go 🔄 Switch Codex accounts without re-login 🔧 Fixed Claude token/cost inflation from dupes 📊 Cost history merges session usage into provider history 16 providers tracked. One menu bar. https://github.com/steipete/CodexBar/releases

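Summary: CodexBar 0.20 is out. It adds Perplexity and OpenCode Go as providers, lets you switch Codex accounts without logging in again, and fixes inflated Claude token/cost figures caused by duplicates. Cost history now merges session usage into provider history, and 16 providers are tracked in total.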

SemiAnalysis@SemiAnalysis_ · Apr 8

From the GTC talk, the maintainers of NIXL said they are happy to accept RIXL patches into upstream, just like how they already accepted Trainium Neuron support patches & XPU patches into upstream. Happy to talk more in our slack & connect you to the appropriate NIXL folks so that u don't have need to maintain your second class fork @KranenKyle . maybe the NIXL folks that accept patches from other chip vendors into upstream can connect u to the flashinfer folks too.

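Summary: At their GTC talk, the NIXL maintainers said they are happy to accept RIXL patches upstream, just as they have already accepted Trainium Neuron support patches and XPU patches. SemiAnalysis offers to talk further in its Slack and to connect @KranenKyle with the right NIXL people so that a second-class fork no longer needs maintaining, and suggests that the NIXL people who take patches from other chip vendors could also make an introduction to the flashinfer team.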

Tibo@thsottiaux · Apr 6

OpenClaw is now really good with GPT-5.4. Peter and team cooked

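Summary: OpenClaw has received major optimizations for GPT-5.4, and Peter's team delivered in full. A user says the last time they were this excited about a release was back when they were following new Game of Thrones episodes.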

François Chollet@fchollet · Apr 4

Perhaps the craziest thing that was introduced on the Keras community call today: Keras Kinetic, a new library that lets you run jobs on cloud TPU/GPU via a simple decorator -- like Modal but with TPU support. When you call a decorated function, Kinetic handles the entire remote execution pipeline: - Packages your function, local code, and data dependencies - Builds a container with your dependencies via Cloud Build (cached after first build) - Runs the job on a GKE cluster with the requested accelerator (TPU or GPU) - Returns the result to your local machine (logs are streamed in real time, and the function's return value is delivered back as if it ran locally)

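Summary: The Keras community call introduced Kinetic, a library that runs functions on cloud TPUs/GPUs via a decorator, similar to Modal but with TPU support. It handles code packaging, container builds through Cloud Build (cached after the first build), scheduling on a GKE cluster, and returning the result, with logs streamed in real time so that remote execution feels local.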

François Chollet@fchollet · Apr 4

The Keras team is doing a community call today at 10am PT. That's in 25 min. The call is open to all -- join to learn about the latest features and what's next, and to ask your questions! Link to join (start in 25 min): http://meet.google.com/gva-bbpr-twe

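Summary: The Keras team is holding a community call today at 10am PT, starting in 25 minutes. It is open to everyone: join to learn about the latest features and what's next, and to ask questions.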

Deedy@deedydas · Apr 3

This is the best blog post on LLM inference I've seen this year. They achieved 10x latency and >1400 tokens/sec by moving speculative decode onto two 2GB SRAM/chip Corsairs, a small cost on top of a standard GPU setup on gpt-oss-120b. This performance at this price is insane.

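Summary: By offloading speculative decode to two Corsair chips with 2GB of SRAM each, a standard GPU setup running gpt-oss-120b achieved a 10x latency reduction and more than 1,400 tokens/sec at a small additional hardware cost, which makes the price-performance striking.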
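Speculative decoding itself can be shown with two toy "models": a cheap draft model proposes several tokens, and the expensive target model checks them, keeping its own token at the first disagreement. The sketch below is the greedy variant without the bonus token that full implementations add after a fully accepted draft; `target_next`, `draft_next`, and the counting models are hypothetical stand-ins, and nothing here reflects how the blog post's hardware split works internally.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies them."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap draft model guesses k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. The target checks each position; real systems do this in one batched pass.
        target_passes += 1
        accepted = []
        for guess in draft:
            correct = target_next(tokens + accepted)
            accepted.append(correct)  # the target's token is always the one kept
            if guess != correct:
                break  # first mismatch: discard the rest of the draft
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], target_passes


def target_next(tokens):
    """Toy target model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    """Toy draft model: same as the target, except it is wrong after a 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


output, passes = speculative_decode(target_next, draft_next, prompt=[0], max_new_tokens=8)
```

The output is identical to what the target model would produce alone, but it takes 3 verification passes instead of 8 sequential target steps, which is where the latency gain comes from when the draft is cheap and usually right.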

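karminski-牙医@karminski3 · Apr 3

http://x.com/i/article/2039985553492598784 # Gemma4 has 8 models: which one should you pick? One article explains it all! Google has just released the Gemma4 family of open-weight models, and friends who have never touched local models keep asking me which one to deploy locally. Here: this article will get you up to speed quickly and painlessly. First, pick the ones with the "-it" suffix. That means Instruction Tuned: the model has been through large-scale instruction-following training and multi-turn dialogue alignment. The others are base models, meant for people who want to fine-tune (so, by extension, if you want to fine-tune yourself, use the version without -it). I know A4B means 4B active parameters, so what does E4B mean? Put simply, it is a technique optimized specifically for mobile: Per-Layer Embeddings. It does not save memory by itself, so Gemma-4-E2B does not need only 2B parameters' worth of memory; it still needs memory for the original 5.1B parameters, but its compute is only about that of a 2B model! (Roughly: part of the matrix arithmetic is turned into table lookups, trading memory for compute, and those tables of course take memory.) Good, prerequisites done! Now the model picks. For a local lobster, pick Gemma-4-26B-A4B first! It's an MoE with 4B active, and its prefill speed is quite good, which suits a setup like the lobster whose system prompt is hugely bloated. For writing code or scripts, or work that demands precision, pick Gemma-4-31B. Choosing this means you want the best results; if you really can't run it, try 5-bit quantization. For reference, an Apple M2 Ultra running it at 8-bit tops out at a theoretical 25 token/s or so. "I want a local voice assistant!" Pick Gemma-4-E4B: it takes omni-modal input, so write code to hook it up to a camera with a microphone and the rest is up to your imagination. And with 4B active it runs even on a CPU. "I just want to try it on my Raspberry Pi": pick Gemma-4-E2B. You'll get the ultimate in local-model speed; as for quality, it'll be a bit better than an electronic parrot. It can do filtering jobs like "check whether this text contains any English", and since it also takes omni-modal input you can try voice input. #Gemma4 #google #GoogleGemma #本地大模型

Summary: Google's Gemma4 family of open-weight models comes in several versions, and the right pick depends on the use case. The "-it" suffix marks instruction-tuned models that work out of the box; models without it are base models for fine-tuning. A4B means 4B active parameters, while E4B uses Per-Layer Embeddings, which trade memory for compute to improve performance on mobile devices. Recommendations: 26B-A4B for a balance of quality and speed; 31B for the best results on code or precise tasks; E4B for local omni-modal applications; and E2B for trying things out on resource-constrained devices, with limited output quality.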
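The memory and speed figures in the post follow from simple arithmetic, sketched below. Weight storage is parameters × bits / 8, and a bandwidth-bound decode rate is memory bandwidth divided by the bytes read per token. The 800 GB/s figure for the M2 Ultra is Apple's published memory bandwidth and is an input assumed here, not something stated in the post; both helper names are hypothetical, and KV cache and runtime overhead are ignored.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_tokens_per_s(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound when every generated token must read all active weights once."""
    return bandwidth_gb_s / active_weights_gb


dense_31b_8bit = weight_memory_gb(31, 8)                  # dense model: all weights are active
m2_ultra_rate = decode_tokens_per_s(800, dense_31b_8bit)  # assumed 800 GB/s bandwidth
e2b_8bit = weight_memory_gb(5.1, 8)                       # E2B keeps all 5.1B parameters resident
```

This gives about 25.8 tokens/s for the 31B model at 8-bit, consistent with the post's "theoretical 25 token/s", and shows why E2B still needs memory for 5.1B parameters even though its compute is closer to a 2B model.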

Tibo@thsottiaux · Apr 2

Ah nevermind, I actually remember we decided to have the core open-source for Codex because it would be awesome to see the ecosystem flourish as it's all so nascent and fun. And we would learn a lot in return. Phew.

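Summary: The Codex core code repository had been public for 11 months but was only just noticed. OpenAI says it decided to open-source the core to help the early ecosystem grow and to learn in return, and had almost forgotten about it.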

Jim Fan@DrJimFan · Apr 1

The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception APIs, actuation APIs, and auto synthesize skill libraries as they go. CaP-X is a strict superset of our old stack, because policies like VLAs are “just” API calls as well. It solves many tasks zero-shot that a learned policy would struggle with. And we are doing much more than vibing. CaP-X is our most systematic, scientific study on agentic robotics so far: - We build a comprehensive agentic toolkit: perception (SAM3 segmentation, Molmo pointing, depth, point cloud), control (IK solvers, grasp planner, navigation), and visualization (EEF, mask overlays) that work across different robots. - CaP-Gym: LLM’s first Physical Exam! 187 manipulation tasks across RoboSuite, LIBERO-PRO, and BEHAVIOR. Tabletop, bimanual, mobile manipulation. Sim and real. Can’t wait to see the gradients flow from CaP-Gym to the next wave of frontier LLM releases. - CaP-Bench: we benchmark 12 frontier LLMs/VLMs (Gemini, GPT, Opus, Qwen, DeepSeek, Kimi, and more) across 8 evaluation tiers. We systematically vary API abstraction level, agentic harness, and visual grounding methods. Lots of insights in our paper. - CaP-Agent0: a training-free agentic harness that matches or exceeds human expert code on 4 out of 7 tasks without task-specific tuning. - CaP-RL: if you get a gym, you get RL ;). A 7B OSS model jumps from 20% to 72% success after only 50 training iterations. The synthesized programs transfer to real robots with minimal sim-to-real gap. 3 years ago, our team created Voyager, one of the earliest agentic AI that plays and learns in Minecraft continuously. Its key ideas — skill libraries, self-reflection loops, and in-context planning — have since influenced many modern agentic designs. Today, the agent graduates from Minecraft and gets a real job. 
It’s April Fool’s, but this Claw is getting its hands dirty for real! Link in thread:

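Summary: CaP-X is an open-source embodied-intelligence system that brings LLM agents into the physical world through robot arms and humanoids. It combines perception APIs such as SAM3 and Molmo with control interfaces such as IK solvers and a grasp planner, and synthesizes skill libraries automatically. The release includes the CaP-Gym benchmark (187 manipulation tasks) and CaP-Bench (12 frontier models evaluated), along with the training-free harness CaP-Agent0 and the reinforcement-learning approach CaP-RL, which lifts a 7B model's success rate from 20% to 72% in only 50 training iterations. It comes from the team that built Voyager, the Minecraft agent.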

karminski-牙医@karminski3 · Mar 26

awesome 👍

Summary: Awesome 👍

Deedy@deedydas · Mar 24

Siri has been broken for 13 years so I built my own. Completely on-device. No internet needed. Controls my Mac, sets reminders, fetches live data, answers questions. Built in a weekend. This is the future of software.

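Summary: Fed up with Siri's long-standing poor experience, the author spent a weekend building a fully on-device voice assistant that needs no internet connection and can control a Mac, set reminders, fetch live data, and answer questions. He sees this as the future of software.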

Hao AI Lab@haoailab · Mar 19

Wow! The Vera Rubin demo looks great but real-time editing is actually already here on a single B200! Try Dreamverse today and generate 30s 1080p videos (with audio) faster than you can watch them. Demo: https://dreamverse.fastvideo.org/

Summary: Wow! The Vera Rubin demo looks great, but real-time editing is actually already possible on a single B200!

Hao AI Lab@haoailab · Mar 14

(1/N) Content creators have been stuck with costly and slow video generation APIs for far too long. We couldn’t take it anymore.😅😭 FastVideo’s new real-time inference stack has the fastest 1080p TI2AV pipeline ever.😍🚀🚀 Our optimized LTX-2.3 pipeline creates 5-second 1080p videos with audio in 4.55 s, on a single GPU! 3.9x faster than the next fastest option. 🕹️Live demo: https://1080p.fastvideo.org/ 📜Blog: https://haoailab.com/blogs/fastvideo_realtime_1080p/

Summary: (1/N) Content creators have been stuck with costly, slow video generation APIs for far too long. We couldn't take it anymore. 😅😭

Saining Xie@sainingxie · Feb 27

world modeling is never about rendering pixels. rendering is local. world state is global. as soon as more than one agent exists, the only thing that truly matters is the shared representation beneath individual views. that shared representation is what scales into collective capability. this is why I'm super excited to share project Solaris -- our new work focused on building a multiplayer video world model in minecraft. This release includes three main pieces. 1⃣Solaris Engine, a fully featured multiplayer data collection system with built in visuals. the team put a huge amount of work into this since nothing like it really exists yet. https://github.com/solaris-wm/solaris-engine 2⃣Solaris Model, a multiplayer DiT with a new memory efficient self forcing design, trained on 12.6M frames of coordinated Minecraft gameplay. https://github.com/solaris-wm/solaris 3⃣Solaris Eval, which uses a VLM as a judge to evaluate different multiplayer capabilities. read the full technical breakdown by @ojmichel4, and start building with Solaris. https://solaris-wm.github.io/

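Summary: Project Solaris argues that world modeling is about globally shared state rather than locally rendered pixels, and introduces a multiplayer video world model built in Minecraft. The system moves beyond the single-agent view, letting any number of agents join and interact at any time while the world state persists and evolves. It has three components: Solaris Engine, a multiplayer data collection system; Solaris Model, a DiT-based model with a new memory-efficient self-forcing design trained on 12.6M frames of coordinated gameplay; and Solaris Eval, which uses a VLM as judge. The shift lays groundwork for neural MMORPG servers.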

Jim Fan@DrJimFan · Jan 31

I still remember the excitement in 2023 when Stanford Smallville was launched. It was the largest multi-agent sim back then - yes, 25 bots felt like a lot. Today it's the "Bigville" moment. We are seeing a nascent, massive-scale alien civilization sim unfolding in real time: orders of magnitude more agents, way higher IQ, in-the-wild access to the internet, backed by the full arsenal of MCPs. What can possibly go wrong?

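Summary: Jim Fan recalls the excitement when Stanford Smallville launched in 2023 as the largest multi-agent sim of its time, when 25 bots felt like a lot. Today is the "Bigville" moment: a nascent, massive-scale alien civilization sim unfolding in real time, with orders of magnitude more agents, far higher IQ, in-the-wild internet access, and the full arsenal of MCPs behind it. What could possibly go wrong? [Quoting @DrJimFan]: The famous Stanford Smallville is officially open source! 25 AI agents inhabit a digital Westworld, unaware that they live in a simulation. They go to work, gossip, organize social events, make new friends, and even fall in love. Each has a unique personality and backstory. Smallville is one of the most inspiring AI agent experiments of 2023. We often talk about the emergent abilities of a single LLM, but multi-agent emergence could be far more complex and fascinating at scale. A population of AIs can play out the evolution of an entire civilization. Endless new possibilities lie ahead. Games will feel the impact first. Github: https://github.com/joonspk-research/generative_agents Paper: https://arxiv.org/abs/2304.03442 Authors: @joon_s_pk @joseph_c_obrien @carriejcai @merrierm @percyliang @msbernst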

Hao AI Lab@haoailab · Aug 28

[1/5] [Lmgame Bench] 🎮 Question: Can RL-based LLM post-training on games generalize to other tasks? We shared a preliminary study to explore this question: - Same-family (in-domain): Training on 6×6 Sokoban → 8×8 and Tetris (1 block type) → Tetris (2 block types) transfers, yielding up to 56% improvement across same-family variants. - Other tasks (out-of-domain): Blocksworld +3–7% and WebShop ~+6% (unstable); GSM8K: no improvement. We introduce GRL, an agent-centric multi-turn RL framework that makes LLM–environment interaction highly customizable for systematic generalization studies. Repo: https://github.com/lmgame-org/GRL Blog: https://lmgame.org/#/blog/grl (check it for details!)

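Summary: The study asks whether RL-based LLM post-training on games generalizes to other tasks. Within the same task family (for example 6×6 Sokoban to 8×8), training yields up to a 56% improvement. Across domains the effect is limited or unstable: Blocksworld improves slightly (+3-7%), WebShop gains about 6% but unstably, and GSM8K does not improve. The team introduces GRL, an agent-centric multi-turn RL framework that makes LLM-environment interaction highly customizable for systematic generalization studies.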

Hao AI Lab@haoailab · Aug 7

[Lmgame Bench] 🔥 OpenAI has just released two open‑weight reasoning models: gpt‑oss‑120B (~117 B) and gpt‑oss‑20B (~21 B). They are the first OpenAI models with open weights since GPT‑2. We tested both in Lmgame Bench, across 4 interactive games: 🧱 Sokoban | 🟦 Tetris | 🔢 2048 | 🍬 Candy Crush Here’s how they ranked (out of 25): → gpt‑oss‑120b → #12 → gpt‑oss‑20b → #13

Summary: [Lmgame Bench] 🔥 OpenAI has just released two open-weight reasoning models, gpt-oss-120B (about 117B parameters) and gpt-oss-20B (about 21B parameters), its first open-weight models since GPT-2. Both were tested in Lmgame Bench across 4 interactive games: 🧱 Sokoban | 🟦 Tetris | 🔢 2048 | 🍬 Candy Crush. Rankings (out of 25 models): → gpt-oss-120b → #12 → gpt-oss-20b → #13

Yann LeCun@ylecun · Jul 19

Hardware independent LLM inference engine from ZML.

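Summary: ZML has released a technical preview of LLMD, a hardware-independent LLM inference solution. A single container supports both NVIDIA and AMD GPUs, the image is only 2.4GB, and it supports mount-and-run high-performance deployment.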

Saining Xie@sainingxie · Jun 28

metaquery is now open-source — with both the data and code available.

Summary: metaquery is now open source, with both the data and code available.

Yann LeCun@ylecun · Jun 22

Awesome new dataset from @SandboxAQ

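Summary: SandboxAQ has open-sourced the SAIR dataset, which contains more than 5 million protein-ligand 3D structures with binding-affinity labels and is currently the largest open-source binding-affinity dataset. Built on NVIDIA DGX Cloud and now publicly available on Google Cloud, it is meant to provide training and evaluation data for drug-discovery AI models.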

Yann LeCun@ylecun · Jun 20

Awesome new dataset from @SandboxAQ

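Summary: SandboxAQ has released SAIR (Structurally Augmented IC50 Repository), an open-source dataset of more than 5 million co-folded protein-ligand 3D structures with binding-affinity data, currently the largest open-source binding-affinity dataset. The data was generated with large quantitative models and is meant to give drug-discovery AI models high-quality training data, bridging the gap between molecular structure and potency prediction. The dataset was built on NVIDIA DGX Cloud and is now publicly available on Google Cloud Platform for researchers worldwide to download.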

Saining Xie@sainingxie · Apr 24

Recently open-sourced projects from @TongPetersb, @DavidJFan, and the team at Meta FAIR. MetaMorph (training code and model weights): https://github.com/facebookresearch/metamorph/ Web-SSL (model weights for Web-DINO and Web-MAE) https://github.com/facebookresearch/webssl FAIR's still leading the way in open research.

Summary: Projects recently open-sourced by @TongPetersb, @DavidJFan, and the team at Meta FAIR.

DeepSeek@deepseek_ai · Feb 28

🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. ⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster ⚡ 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster ⚡ 40+ GiB/s peak throughput per client node for KVCache lookup 🧬 Disaggregated architecture with strong consistency semantics ✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1 📥 3FS → https://github.com/deepseek-ai/3FS ⛲ Smallpond - data processing framework on 3FS → https://github.com/deepseek-ai/smallpond

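Summary: DeepSeek has released 3FS (Fire-Flyer File System), an open-source parallel file system optimized for modern SSDs and RDMA networks. A 180-node cluster reaches 6.6 TiB/s aggregate read throughput, a 25-node cluster reaches 3.66 TiB/min on GraySort, and a single client node peaks above 40 GiB/s for KVCache lookups. It uses a disaggregated architecture with strong consistency semantics and supports training data preprocessing, checkpoint saving and reloading, and KVCache lookups for V3/R1 inference. The Smallpond data processing framework was open-sourced alongside it.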