We distilled Gemini 3.1 into a 26m parameter "Simple Attention Network" that you can even finetune locally on your Mac/PC. In production, Needle runs on Cactus at 6000 toks/sec prefill and 1200 decode speed. Weights are fully open on Cactus-Compute/needle, as well as the dataset generation.
Post-trained on 2B tokens of single-shot function call dataset (45mins).
Needle is an experimental run for Simple Attention Networks, geared at redefining tiny AI for consumer devices (phones, watches, glasses...). So while it beats FunctionGemma-270m, Qwen-0.6B, Graninte-350m, LFM2.5-350m on single-shot function call for personal AI, Those model are have more scope/capacity and excel in conversational settings. Also, small models can be finicky. Please use the UI in the next section to test on your own tools, and finetune accordingly, at the click of a button.
Quickstart
git clone https://github.com/cactus-compute/needle.git cd needle && source ./setup needle playground
Opens a web UI at http://127.0.0.1:7860 where you can test and finetune on your own tools. Weights are auto-downloaded.
We distilled Gemini 3.1 into a 26m parameter "Simple Attention Network" that you can even finetune locally on your Mac/PC. In production, Needle runs on Cactus at 6000 toks/sec prefill and 1200 decode speed. Weights are fully open on Cactus-Compute/needle, as well as the dataset generation.
from needle import SimpleAttentionNetwork, load_checkpoint, generate, get_tokenizer params, config = load_checkpoint("checkpoints/needle.pkl") model = SimpleAttentionNetwork(config) tokenizer = get_tokenizer() result = generate( model, params, tokenizer, query="What's the weather in San Francisco?", tools='[{"name":"get_weather","description":"Get current weather for a city.","parameters":{"location":{"type":"string","description":"City name.","required":true}}}]', stream=False, ) print(result) # [{"name":"get_weather","arguments":{"location":"San Francisco"}}]
Finetuning
Playground (generates data via Gemini, trains, evaluates, bundles result) needle playground # CLI (auto-downloads weights if not local) needle finetune data.jsonl
Data format
Each line in the JSONL file has three fields: query, tools, and answers.
query
tools
answers
Tool schema:
{ "name": "get_weather", "description": "Get current weather for a city.", "parameters": { "location": { "type": "string", "description": "City name.", "required": true } } }
Full JSONL example (each line is one training example, tools and answers are JSON-encoded strings):
tools
answers
{"query": "What's the weather in Paris?", "tools": "[{\"name\":\"get_weather\",\"description\":\"Get current weather for a city.\",\"parameters\":{\"location\":{\"type\":\"string\",\"description\":\"City name.\",\"required\":true}}}]", "answers": "[{\"name\":\"get_weather\",\"arguments\":{\"location\":\"Paris\"}}]"} {"query": "Turn off the lights", "tools": "[{\"name\":\"get_weather\",\"description\":\"Get current weather for a city.\",\"parameters\":{\"location\":{\"type\":\"string\",\"description\":\"City name.\",\"required\":true}}},{\"name\":\"toggle_lights\",\"description\":\"Toggle smart lights on or off.\",\"parameters\":{\"state\":{\"type\":\"string\",\"description\":\"on or off.\",\"required\":true}}}]", "answers": "[{\"name\":\"toggle_lights\",\"arguments\":{\"state\":\"off\"}}]"}
Provide at least 120 examples per tool (100 train / 10 val / 10 test). Fewer examples will overfit — you'll see perfect training metrics but the model won't generalize. Vary query phrasing and include examples with multiple tools available.
Using a finetuned model
Finetuning saves the best checkpoint as checkpoints/needle_finetuned__best.pkl:
checkpoints/needle_finetuned__best.pkl
needle run --checkpoint checkpoints/needle_finetuned_*_best.pkl \ --query "What's the weather?" --tools '[{"name":"get_weather","description":"Get current weather for a city.","parameters":{"location":{"type":"string","description":"City name.","required":true}}}]'
params, config = load_checkpoint("checkpoints/needle_finetuned__best.pkl") model = SimpleAttentionNetwork(config) result = generate(model, params, get_tokenizer(), query="...", tools='[...]', stream=False)
CLI
needle playground Test and finetune via web UI needle finetune Finetune on your own data needle run --query "..." --tools Single inference needle train Full training run needle pretrain Pretrain on PleIAs/SYNTH needle eval --checkpoint Evaluate a checkpoint needle tokenize Tokenize dataset needle generate-data Synthesize training data via Gemini needle tpu TPU management (see docs/tpu.md)
needle playground Test and finetune via web UI needle finetune Finetune on your own data needle run --query "..." --tools Single inference needle train Full training run needle pretrain Pretrain on PleIAs/SYNTH needle eval --checkpoint Evaluate a checkpoint needle tokenize Tokenize dataset needle generate-data Synthesize training data via Gemini needle tpu TPU management (see docs/tpu.md)
@misc{ndubuaku2026needle, title={Needle}, author={Henry Ndubuaku, Jakub Mroz, Karen Mosoyan, Roman Shemet, Parkirat Sandhu, Satyajit Kumar, Noah Cylich, Justin H. Lee}, year={2026}, url={https://github.com/cactus-compute/needle} }
@misc{ndubuaku2026needle, title={Needle}, author={Henry Ndubuaku, Jakub Mroz, Karen Mosoyan, Roman Shemet, Parkirat Sandhu, Satyajit Kumar, Noah Cylich, Justin H. Lee}, year={2026}, url={https://github.com/cactus-compute/needle} }
Post-trained on 2B tokens of single-shot function call dataset (45mins).
Needle is an experimental run for Simple Attention Networks, geared at redefining tiny AI for consumer devices (phones, watches, glasses...). So while it beats FunctionGemma-270m, Qwen-0.6B, Graninte-350m, LFM2.5-350m on single-shot function call for personal AI, Those model are have more scope/capacity and excel in conversational settings. Also, small models can be finicky. Please use the UI in the next section to test on your own tools, and finetune accordingly, at the click of a button.
Quickstart
git clone https://github.com/cactus-compute/needle.git cd needle && source ./setup needle playground
Opens a web UI at http://127.0.0.1:7860 where you can test and finetune on your own tools. Weights are auto-downloaded.
Usage (Python)
from needle import SimpleAttentionNetwork, load_checkpoint, generate, get_tokenizer params, config = load_checkpoint("checkpoints/needle.pkl") model = SimpleAttentionNetwork(config) tokenizer = get_tokenizer() result = generate( model, params, tokenizer, query="What's the weather in San Francisco?", tools='[{"name":"get_weather","description":"Get current weather for a city.","parameters":{"location":{"type":"string","description":"City name.","required":true}}}]', stream=False, ) print(result) # [{"name":"get_weather","arguments":{"location":"San Francisco"}}]
Finetuning
Playground (generates data via Gemini, trains, evaluates, bundles result) needle playground # CLI (auto-downloads weights if not local) needle finetune data.jsonl
Data format
Each line in the JSONL file has three fields: query, tools, and answers.
query
tools
answers
Tool schema:
{ "name": "get_weather", "description": "Get current weather for a city.", "parameters": { "location": { "type": "string", "description": "City name.", "required": true } } }
Full JSONL example (each line is one training example, tools and answers are JSON-encoded strings):
tools
answers
{"query": "What's the weather in Paris?", "tools": "[{\"name\":\"get_weather\",\"description\":\"Get current weather for a city.\",\"parameters\":{\"location\":{\"type\":\"string\",\"description\":\"City name.\",\"required\":true}}}]", "answers": "[{\"name\":\"get_weather\",\"arguments\":{\"location\":\"Paris\"}}]"} {"query": "Turn off the lights", "tools": "[{\"name\":\"get_weather\",\"description\":\"Get current weather for a city.\",\"parameters\":{\"location\":{\"type\":\"string\",\"description\":\"City name.\",\"required\":true}}},{\"name\":\"toggle_lights\",\"description\":\"Toggle smart lights on or off.\",\"parameters\":{\"state\":{\"type\":\"string\",\"description\":\"on or off.\",\"required\":true}}}]", "answers": "[{\"name\":\"toggle_lights\",\"arguments\":{\"state\":\"off\"}}]"}
Provide at least 120 examples per tool (100 train / 10 val / 10 test). Fewer examples will overfit — you'll see perfect training metrics but the model won't generalize. Vary query phrasing and include examples with multiple tools available.
Using a finetuned model
Finetuning saves the best checkpoint as checkpoints/needle_finetuned__best.pkl:
checkpoints/needle_finetuned__best.pkl
needle run --checkpoint checkpoints/needle_finetuned_*_best.pkl \ --query "What's the weather?" --tools '[{"name":"get_weather","description":"Get current weather for a city.","parameters":{"location":{"type":"string","description":"City name.","required":true}}}]'
params, config = load_checkpoint("checkpoints/needle_finetuned__best.pkl") model = SimpleAttentionNetwork(config) result = generate(model, params, get_tokenizer(), query="...", tools='[...]', stream=False)
CLI
needle playground Test and finetune via web UI needle finetune Finetune on your own data needle run --query "..." --tools Single inference needle train Full training run needle pretrain Pretrain on PleIAs/SYNTH needle eval --checkpoint Evaluate a checkpoint needle tokenize Tokenize dataset needle generate-data Synthesize training data via Gemini needle tpu TPU management (see docs/tpu.md)
needle playground Test and finetune via web UI needle finetune Finetune on your own data needle run --query "..." --tools Single inference needle train Full training run needle pretrain Pretrain on PleIAs/SYNTH needle eval --checkpoint Evaluate a checkpoint needle tokenize Tokenize dataset needle generate-data Synthesize training data via Gemini needle tpu TPU management (see docs/tpu.md)
@misc{ndubuaku2026needle, title={Needle}, author={Henry Ndubuaku, Jakub Mroz, Karen Mosoyan, Roman Shemet, Parkirat Sandhu, Satyajit Kumar, Noah Cylich, Justin H. Lee}, year={2026}, url={https://github.com/cactus-compute/needle} }
@misc{ndubuaku2026needle, title={Needle}, author={Henry Ndubuaku, Jakub Mroz, Karen Mosoyan, Roman Shemet, Parkirat Sandhu, Satyajit Kumar, Noah Cylich, Justin H. Lee}, year={2026}, url={https://github.com/cactus-compute/needle} }