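Show HN: Needle: We distilled Gemini tool calling into a 26M model

2026-05-13 04:37 · 51 days ago · HenryNdubuaku

AI Summary

A research team has released Needle, a lightweight model that distills the tool-calling ability of Google's Gemini into just 26 million parameters. The model keeps that core capability at a much smaller size, with the aim of more efficient deployment and use. The project code is open source on GitHub and has drawn more than 100 points on Hacker News.

Original text · untranslated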

Needle

We distilled Gemini 3.1 into a 26M-parameter "Simple Attention Network" that you can even finetune locally on your Mac/PC. In production, Needle runs on Cactus at 6000 toks/sec prefill and 1200 toks/sec decode. The weights are fully open at Cactus-Compute/needle, as is the dataset generation.

d=512, 8H/4KV, BPE=8192

                               ┌──────────────┐
                               │  Tool Call   │
                               └──────┬───────┘
                                ┌─────┴─────┐
                                │  Softmax  │
                                └─────┬─────┘
                                ┌─────┴─────┐
                                │ Linear (T)│ ← tied
                                └─────┬─────┘
                                ┌─────┴─────┐
                                │ ZCRMSNorm │
                                └─────┬─────┘
                             ┌────────┴────────┐
                             │   Decoder x 8   │
                             │┌───────────────┐│
                             ││ ZCRMSNorm     ││
                             ││ Masked Self   ││
                             ││ Attn + RoPE   ││
                             ││ Gated Residual││
                             │├───────────────┤│
┌──────────────┐             ││ ZCRMSNorm     ││
│ Encoder x 12 │────────────▶││ Cross Attn    ││
│              │             ││ Gated Residual││
│ ┌──────────┐ │             │└───────────────┘│
│ │ZCRMSNorm │ │             └────────┬────────┘
│ │Self Attn │ │                ┌─────┴─────┐
│ │ GQA+RoPE │ │                │ Embedding │ ← shared
│ │Gated Res │ │                └─────┬─────┘
│ │          │ │              ┌───────┴───────┐
│ │ (no FFN) │ │              │     [EOS]     │
│ └──────────┘ │              │   + answer    │
│              │              └───────────────┘
└──────┬───────┘
  ┌────┴──────┐
  │ Embedding │
  └────┬──────┘
  ┌────┴──────┐
  │   Text    │
  │   query   │
  └───────────┘
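As a rough sanity check on the 26M figure, the dimensions in the diagram (d=512, 8 query heads, 4 KV heads, an 8192-token BPE vocabulary, 12 encoder layers, 8 decoder layers, no FFN) can be tallied in a few lines. This is only a sketch under assumptions the source does not confirm: no bias terms, head_dim = d / heads, cross-attention using the same 8H/4KV split as self-attention, a single embedding matrix shared by the encoder, decoder, and tied output projection, and norm and gate parameters ignored.

```python
# Rough parameter tally for the dimensions shown in the diagram.
# Assumptions (not confirmed by the source): no biases, head_dim = d_model / n_heads,
# cross-attention uses the same 8H/4KV split, one shared embedding matrix,
# and the small norm/gate parameters are ignored.

D_MODEL = 512
N_HEADS = 8
N_KV_HEADS = 4
VOCAB = 8192
ENCODER_LAYERS = 12
DECODER_LAYERS = 8


def attention_params(d_model: int, n_heads: int, n_kv_heads: int) -> int:
    """Weights in one grouped-query attention block (Q, K, V, output projections)."""
    head_dim = d_model // n_heads
    q = d_model * n_heads * head_dim
    k = d_model * n_kv_heads * head_dim  # fewer KV heads than query heads
    v = d_model * n_kv_heads * head_dim
    o = n_heads * head_dim * d_model
    return q + k + v + o


attn = attention_params(D_MODEL, N_HEADS, N_KV_HEADS)
embedding = VOCAB * D_MODEL               # counted once because it is shared/tied
encoder = ENCODER_LAYERS * attn           # self-attention only, no FFN
decoder = DECODER_LAYERS * 2 * attn       # masked self-attention + cross-attention
total = embedding + encoder + decoder

print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the tally comes to 26,214,400, which is in line with the stated 26M and suggests that dropping the FFN is what keeps the budget this small.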

Pretrained on 16 TPU v6e for 200B tokens (27hrs).

Post-trained on 2B tokens of single-shot function call dataset (45mins).
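The training figures above imply an aggregate throughput that is easy to work out. The sketch below is plain arithmetic on the stated numbers; it assumes the durations are wall-clock time and that all 16 chips were busy for the whole pretraining run, neither of which the source spells out.

```python
# Back-of-the-envelope throughput from the stated training figures.
# Assumption: durations are wall-clock and all 16 chips ran for the full pretraining run.

PRETRAIN_TOKENS = 200e9
PRETRAIN_HOURS = 27
CHIPS = 16
POSTTRAIN_TOKENS = 2e9
POSTTRAIN_MINUTES = 45
PARAMS = 26e6

pretrain_rate = PRETRAIN_TOKENS / (PRETRAIN_HOURS * 3600)    # tokens/sec, all chips
per_chip_rate = pretrain_rate / CHIPS                         # tokens/sec per chip
posttrain_rate = POSTTRAIN_TOKENS / (POSTTRAIN_MINUTES * 60)  # tokens/sec, aggregate
tokens_per_param = PRETRAIN_TOKENS / PARAMS                   # pretraining tokens per weight

print(f"pretraining:   {pretrain_rate:,.0f} tok/s total, {per_chip_rate:,.0f} tok/s per chip")
print(f"post-training: {posttrain_rate:,.0f} tok/s total")
print(f"tokens per parameter: {tokens_per_param:,.0f}")
```

That works out to roughly 2.06M tokens/sec in aggregate during pretraining, and about 7,700 pretraining tokens per parameter.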

Needle is an experimental run for Simple Attention Networks, geared at redefining tiny AI for consumer devices (phones, watches, glasses...). So while it beats FunctionGemma-270m, Qwen-0.6B, Granite-350m, and LFM2.5-350m on single-shot function calling for personal AI, those models have more scope and capacity and excel in conversational settings. Also, small models can be finicky. Please use the UI in the next section to test on your own tools, and finetune accordingly, at the click of a button.
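Because small models can be finicky, it may help to check each single-shot output mechanically when testing on your own tools. The sketch below is not part of Needle: the tool specs, the JSON shape with "name" and "arguments" keys, and the validate_tool_call helper are all hypothetical, since the source does not describe the model's output format.

```python
import json

# Hypothetical tool specs; the names and fields here are illustrative only
# and are not taken from the Needle repository.
TOOLS = {
    "set_alarm": {"required": {"time": str}, "optional": {"label": str}},
    "send_message": {"required": {"recipient": str, "body": str}, "optional": {}},
}


def validate_tool_call(raw: str, tools: dict) -> tuple[bool, str]:
    """Check that raw model output is one well-formed call to a known tool."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return False, "missing tool name"
    spec = tools.get(call["name"])
    if spec is None:
        return False, f"unknown tool: {call['name']}"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False, "arguments must be an object"
    for key in spec["required"]:
        if key not in args:
            return False, f"missing argument: {key}"
    allowed = {**spec["required"], **spec["optional"]}
    for key, value in args.items():
        if key not in allowed:
            return False, f"unexpected argument: {key}"
        if not isinstance(value, allowed[key]):
            return False, f"wrong type for {key}"
    return True, "ok"


ok, reason = validate_tool_call(
    '{"name": "set_alarm", "arguments": {"time": "07:30"}}', TOOLS
)
print(ok, reason)
```

A check like this makes it easy to count malformed or hallucinated calls over a batch of test queries before deciding whether finetuning is needed.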

Quickstart

git clone https://github.com/cactus-compute/needle.git
cd needle && source ./setup
needle playground

Opens a web UI at http://127.0.0.1:7860 where you can test and finetune on your own tools. Weights are auto-downloaded.

# Playground (generates data via Gemini, trains, evaluates, bundles result)
needle playground

# CLI (auto-downloads weights if not local)
needle finetune data.jsonl