Rohan Paul@rohanpaul_ai

2026-05-17 21:53 · 46 days ago

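AI summary

On-device small models open up many possibilities. Here @adrgrondin is running Google's Gemma 4 E2B on an iPhone 17 Pro. With MLX optimized for Apple Silicon, it reaches about 40 tk/s and brings SOTA coding and math ability to mobile, with 128K context support. It runs fully offline and has a thinking mode.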

So many possibilities for on-device small models.

Here @adrgrondin is running Google's Gemma 4 E2B on iPhone 17 Pro. ~40 tk/s with MLX optimized for Apple Silicon. SOTA coding & math on mobile with 128K context. Fully offline with thinking mode.

Tags: Google, inference, on-device, industry news
