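Gemma 4 can get a reported 23% inference speedup from speculative decoding. In a test on an RTX 5090, pairing the 31B dense main model with the E2B (5.1B) draft model raised throughput from 61 token/s to 76 token/s. The technique exploits the fact that large-model decoding has compute to spare but is short on memory bandwidth: the small model quickly generates a candidate sequence, and the large model verifies it in one batch, prefill-style, which avoids the bandwidth bottleneck of token-by-token decoding. Note that the model family must be kept consistent: Gemma 4 should be paired with a draft model from the same series and cannot be mixed with Qwen3.5.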
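The draft-then-verify loop described above can be sketched in a few lines. This is a minimal greedy-decoding illustration, not the implementation used in the benchmark: `toy_target` and `toy_draft` are hypothetical stand-ins for the 31B and E2B models, and the per-position calls to the target stand in for what a real runtime would do in a single batched forward pass.

```python
from typing import Callable, List, Tuple

# A toy "model": given a token prefix, return the next token (greedy choice).
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes k candidate tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target scores all k+1 positions. A real runtime does this in one
        #    batched forward pass; here it is simulated with k+1 calls.
        target_passes += 1
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        # 3. Accept the longest prefix on which draft and target agree.
        n_accept = 0
        while n_accept < k and proposal[n_accept] == verified[n_accept]:
            n_accept += 1
        # 4. Keep the accepted tokens plus the target's own token at the first
        #    mismatch (or a bonus token when every draft token was accepted).
        tokens.extend(proposal[:n_accept])
        tokens.append(verified[n_accept])
    return tokens[:len(prompt) + max_new_tokens], target_passes


def toy_target(prefix: List[int]) -> int:
    # Hypothetical deterministic "large model".
    return (prefix[-1] * 3 + 1) % 17


def toy_draft(prefix: List[int]) -> int:
    # Hypothetical "small model": agrees with the target except at some positions.
    t = toy_target(prefix)
    return t if len(prefix) % 3 else (t + 1) % 17


if __name__ == "__main__":
    out, passes = speculative_decode(toy_target, toy_draft, [1], max_new_tokens=20)
    print(out == greedy_decode(toy_target, [1], 20), passes)
```

Because every emitted token is either confirmed or produced by the target, the output matches plain greedy decoding with the target alone; the gain comes from needing fewer target passes than generated tokens.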
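Chrome DevTools MCP adds several debugging skills aimed at AI agents: performance audits via Lighthouse, memory leak detection, accessibility debugging, and LCP optimization. The features are meant to give AI agents automated code quality checks that help identify performance bottlenecks and accessibility issues. An experimental CLI tool was released at the same time, exposing the debugging capabilities from the command line.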
Want to give your agent quality checks? Chrome's DevTools MCP now includes: ⚡️ Performance checks via Lighthouse 📈 Memo...
@testingcatalog You can also add custom summary and cover images to @NotebookLM now
Longer tracks are here with Lyria 3 Pro in Gemini! From experimenting with different styles to generating tracks with co...
Gemini can now transform your questions and complex concepts into customizable interactive visualizations directly in yo...
Lots of love for Gemma 4! Team just told me it's already had 10M+ downloads since last week's launch. Gemma models have ...
This sounds harsh but it is true, very few of the guests we have on 20VC will be remembered in history for truly progres...
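Artificial Analysis released the APEX-Agents-AA leaderboard, which uses Mercor's APEX-Agents benchmark to evaluate AI agents on long-horizon professional tasks (investment banking, management consulting, corporate law). The test runs 452 tasks through the Stirrup framework and MCP tools, covering message replies, document processing, and more. GPT-5.4 leads with 33.3%, followed closely by Claude Opus 4.6 (33.0%) and Gemini 3.1 Pro Preview (32%), a tight race among the top three. Scoring uses an LLM judge and the pass@1 criterion.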
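As a rough illustration of the pass@1 criterion mentioned above, the sketch below assumes each task is attempted exactly once and that the judge returns a binary verdict. The task IDs and verdicts are hypothetical, and the leaderboard's actual grading pipeline may differ.

```python
from typing import Dict


def pass_at_1(verdicts: Dict[str, bool]) -> float:
    """Fraction of tasks whose single attempt was judged as passing."""
    if not verdicts:
        raise ValueError("no task verdicts supplied")
    return sum(verdicts.values()) / len(verdicts)


if __name__ == "__main__":
    # Hypothetical judge verdicts, one attempt per task.
    verdicts = {"task-1": True, "task-2": False, "task-3": False, "task-4": True}
    print(f"pass@1 = {pass_at_1(verdicts):.1%}")
```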
What is the real future Google DeepMind CEO @demishassabis is trying to build? That's what we talk about in this HUGE* C...
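AI Engineer Europe Build Day announced six technical tracks focused on frontline AI engineering practice. The agenda covers Personal Agent (Claw), Context Engineering (long-context management), Harness Engineering (agent performance optimization), Evals & Observability (evaluation systems), Voice & Vision (speech and vision multimodality), and a dedicated Gemini track. From OpenClaw to Google DeepMind, the content spans RAG, TTS, ASR, WebMCP, and related directions, reflecting the shift in AI engineering from prompts toward complex agent systems.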
just went live on european TBPN! exclusive preview of the @aiDotEngineer Europe Build Day today
Tomorrow on Cheeky Pint: @sundarpichai gets into everything AI with @eladgil and me.
We've signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online...
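The post uses the development of the atomic bomb to illustrate what extreme generalization means: science needed only 47 years and about 9 key experiments to go from observing radioactivity to a nuclear weapon. That progress did not depend on big data but on symbolic compression, distilling a small number of deliberately collected data points into causal symbolic rules that fit on a single page. The core claim is that by working backward to the causal logic behind the data, humans can turn minimal information into a complete plan that reshapes reality, which shows the decisive role of symbolic reasoning in pushing past cognitive limits.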
Here is a quick start script including the setup, technical details, and a candid look at where Kinetic excels versus it...
🇬🇧 London is the birthplace of @GoogleDeepMind, and we're so honored to have them back as: Presenting Sponsors of this...
Mintlify assistant is powered by just-bash with a custom filesystem
Fine-Tuning Gemma 2B on PubMedQA: Building a Medical Q&A Assistant with LoRA, Keras Kinetic, and Cloud TPU https://kuanh...
Gemma 4 and what makes an open model succeed Hint: it's not benchmark scores. https://www.interconnects.ai/p/gemma-4-and...
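The Keras community released the Kinetic library, which lets developers deploy a function to cloud TPUs/GPUs with a decorator. It is positioned similarly to Modal but adds TPU support. The tool automatically handles code packaging, container builds on Cloud Build (with caching), GKE cluster scheduling, and returning results, and it streams logs in real time so that remote execution feels like running locally.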
The Keras team is doing a community call today at 10am PT. That's in 25 min. The call is open to all -- join to learn ab...
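Google's Gemma 4 family of open-weight models comes in several variants, and the right choice depends on the use case. Models with the "-it" suffix are instruction-tuned and ready to use; models without the suffix are base models intended for your own fine-tuning. A4B denotes 4B active parameters, while E4B uses per-layer embeddings, trading memory for compute to optimize on-device performance. Recommendations: pick 26B-A4B for a balance of quality and speed; pick 31B for the best coding or task results; pick E4B for building local full-modality applications; E2B is an option for resource-constrained devices, but its output quality is limited.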
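Google DeepMind released the Gemma 4 family of four multimodal open models, which accept text, image, and video input. The 31B (dense) and 26B A4B (MoE) models have a 256k context window and can run on a single H100; the two smaller models support 128k context. On GPQA Diamond, Gemma 4 31B (Reasoning) scored 85.7%, second only to Qwen3.5 27B, while using only about 1.2M output tokens, making it more token-efficient; 26B A4B (Reasoning) scored 79.2%, ahead of gpt-oss-120B.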
Related discussion (2 posts on X): Artificial Analysis (@ArtificialAnlys), Jeff Dean (@JeffDean). Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can b...