# Show HN: The OSS agent I built topped TerminalBench on Gemini-3-flash-preview

- Source: Hacker News trending (Chinese translation by buzzing.cc)
- Author: GodelNumbering
- Published: 2026-04-27 23:03
- AIHOT score: 64
- AIHOT link: https://aihot.virxact.com/items/cmohcnxxq00hhslmc8a5bajut
- Original link: https://github.com/dirac-run/dirac

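## AI Summary

Dirac, an open-source agent, topped the TerminalBench leaderboard for terminal-operation tasks while running on Google's Gemini-3-flash-preview model. The agent was built independently by its developer, and its GitHub repository is public. The post drew 113 upvotes on Hacker News and attracted wide attention.

## Body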

Dirac - Accurate & Highly Token Efficient Open Source AI Agent

Dirac topped the Terminal-Bench-2 leaderboard for gemini-3-flash-preview with a 65.2% score!

It is a well-studied phenomenon that a model's reasoning ability degrades as context length grows. If we keep the context tightly curated, we improve both accuracy and cost while making larger changes tractable in a single task.

Dirac is an open-source coding agent built with this in mind. It reduces API costs by 64.8% on average while producing better and faster work, using hash-anchored parallel edits, AST manipulation, and a suite of advanced optimizations. Oh, and no MCP.

Our goal: Optimize for bang-for-the-buck on tooling with bare minimum prompting instead of going blindly minimalistic.

📊 Evals

Dirac is benchmarked against other leading open-source agents on complex, real-world refactoring tasks. Dirac completed all 8 tasks correctly (8/8) at a fraction of the cost. These evals are run on public GitHub repos and should be reproducible by anyone.

🏆 TerminalBench 2.0 Leaderboard: Dirac recently topped the Terminal-Bench-2 leaderboard with a 65.2% score using gemini-3-flash-preview. This outperforms both Google's official baseline (47.6%) and the top closed-source agent Junie CLI (64.3%). This was achieved without any benchmark-specific info or any AGENTS.md files being inserted.

Note on the cost table below: After running these evals, we discovered a bug in Cline, the parent repo (issue #10314), and submitted PR #10315 to fix it. The bug caused the evals for Dirac and Cline to slightly underreport costs ($0.03 vs $0.05 per million cache-read tokens). The difference should be small, and we will update the evals soon.

All tasks for all agents used gemini-3-flash-preview with thinking set to high.

| Task (Repo) | Files* | Cline | Kilo | Ohmypi | Opencode | Pimono | Roo | Dirac |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Task1 (transformers) | 8 | 🟢 $0.37 | 🔴 N/A | 🟡 $0.24 | 🟢 $0.20 | 🟢 $0.34 | 🟢 $0.49 | 🟢 $0.13 |
| Task2 (vscode) | 21 | 🟢 $0.67 | 🟡 $0.78 | 🟢 $0.63 | 🟢 $0.40 | 🟢 $0.48 | 🟡 $0.58 | 🟢 $0.23 |
| Task3 (vscode) | 12 | 🟡 $0.42 | 🟢 $0.70 | 🟢 $0.64 | 🟢 $0.32 | 🟢 $0.25 | 🟡 $0.45 | 🟢 $0.16 |
| Task4 (django) | 14 | 🟢 $0.36 | 🟢 $0.42 | 🟡 $0.32 | 🟢 $0.24 | 🟡 $0.24 | 🟢 $0.17 | 🟢 $0.08 |
| Task5 (vscode) | 3 | 🔴 N/A | 🟢 $0.71 | 🟢 $0.43 | 🟢 $0.53 | 🟢 $0.50 | 🟢 $0.36 | 🟢 $0.17 |
| Task6 (transformers) | 25 | 🟢 $0.87 | 🟡 $1.51 | 🟢 $0.94 | 🟢 $0.90 | 🟢 $0.52 | 🟢 $1.44 | 🟢 $0.34 |
| Task7 (vscode) | 13 | 🟡 $0.51 | 🟢 $0.77 | 🟢 $0.74 | 🟢 $0.67 | 🟡 $0.45 | 🟢 $1.05 | 🟢 $0.25 |
| Task8 (transformers) | 3 | 🟢 $0.25 | 🟢 $0.19 | 🟢 $0.17 | 🟢 $0.26 | 🟢 $0.23 | 🟢 $0.29 | 🟢 $0.12 |
| Total Correct | | 5/8 | 5/8 | 6/8 | 8/8 | 6/8 | 6/8 | 8/8 |
| Avg Cost | | $0.49 | $0.73 | $0.51 | $0.44 | $0.38 | $0.60 | $0.18 |

🟢 Success | 🟡 Incomplete | 🔴 Failure

Cost Comparison: Dirac is 64.8% cheaper than the competition (a 2.8x cost reduction).

* Expected number of files to be modified/created to complete the task.

See evals/README.md for detailed task descriptions and methodology.
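
The README does not spell out how the 64.8% figure is derived. As a rough check, the sketch below recomputes it from the per-task costs in the table, assuming each agent's average skips its N/A runs and that "the competition" means the unweighted mean of the six other agents' averages. Both are assumptions about the method, not something the source states.

```python
# Per-task costs in USD, copied from the table; None marks an N/A (failed) run.
costs = {
    "Cline":    [0.37, 0.67, 0.42, 0.36, None, 0.87, 0.51, 0.25],
    "Kilo":     [None, 0.78, 0.70, 0.42, 0.71, 1.51, 0.77, 0.19],
    "Ohmypi":   [0.24, 0.63, 0.64, 0.32, 0.43, 0.94, 0.74, 0.17],
    "Opencode": [0.20, 0.40, 0.32, 0.24, 0.53, 0.90, 0.67, 0.26],
    "Pimono":   [0.34, 0.48, 0.25, 0.24, 0.50, 0.52, 0.45, 0.23],
    "Roo":      [0.49, 0.58, 0.45, 0.17, 0.36, 1.44, 1.05, 0.29],
    "Dirac":    [0.13, 0.23, 0.16, 0.08, 0.17, 0.34, 0.25, 0.12],
}


def average(values):
    """Mean over the runs that reported a cost (N/A entries are skipped)."""
    reported = [v for v in values if v is not None]
    return sum(reported) / len(reported)


per_agent = {name: average(vals) for name, vals in costs.items()}
dirac = per_agent.pop("Dirac")                           # 0.185 before rounding
competition = sum(per_agent.values()) / len(per_agent)   # mean of the other six averages

reduction = 1 - dirac / competition   # fraction saved relative to the competitor mean
ratio = competition / dirac           # how many times cheaper

print(f"{reduction:.1%} cheaper, {ratio:.1f}x")  # prints "64.8% cheaper, 2.8x"
```

Under these assumptions the per-agent averages also match the table's Avg Cost row after rounding.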

🚀 Key Features

Hash-Anchored Edits: Dirac uses stable line hashes to target edits with extreme precision, avoiding the "lost in translation" issues of traditional line-number based editing.

AST-Native Precision: Built-in understanding of language syntax (TypeScript, Python, C++, etc.) allows Dirac to perform structural manipulations like function extraction or class refactoring with 100% accuracy.

Multi-File Batching: Dirac can process and edit multiple files in a single LLM roundtrip, significantly reducing latency and API costs.

High-Bandwidth Context: Optimized context curation keeps the agent lean and fast, ensuring the LLM always has the most relevant information without wasting tokens.

Autonomous Tool Use: Dirac can read/write files, execute terminal commands, use a headless browser, and more - all while keeping you in control with an approval-based workflow.

Skills & AGENTS.md: Customize Dirac's behavior with project-specific instructions using AGENTS.md files. It also seamlessly picks up Claude's skills by automatically reading from .ai, .claude, and .agents directories.

Native Tool Calling Only: To ensure maximum reliability and performance, Dirac exclusively supports models with native tool calling enabled. (Note: MCP is not supported).
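
To make the Hash-Anchored Edits idea above concrete, here is a minimal sketch of the general technique: each line is identified by a short hash of its content, and an edit names the hash of the line it wants to change instead of a line number. This is an illustration only, not Dirac's implementation. The function names, the SHA-1 choice, and the 8-character prefix are all assumptions.

```python
import hashlib


def line_hash(text: str) -> str:
    """Short content hash for one line (surrounding whitespace ignored)."""
    return hashlib.sha1(text.strip().encode("utf-8")).hexdigest()[:8]


def apply_edit(lines: list[str], anchor: str, replacement: str) -> list[str]:
    """Replace the single line whose hash equals `anchor`.

    Raises ValueError when the anchor is missing or ambiguous, so a stale
    edit fails loudly instead of landing on the wrong line.
    """
    matches = [i for i, line in enumerate(lines) if line_hash(line) == anchor]
    if len(matches) != 1:
        raise ValueError(f"anchor {anchor} matched {len(matches)} lines")
    out = list(lines)
    out[matches[0]] = replacement
    return out


source = ["def area(r):", "    return 3.14 * r * r", ""]
anchor = line_hash(source[1])

# A separate edit adds a line at the top, shifting every line number by one.
shifted = ["import math"] + source

# The hash still finds the intended line, which is now at index 2.
patched = apply_edit(shifted, anchor, "    return math.pi * r * r")
```

A line-number edit computed against `source` would have overwritten `def area(r):` after the shift. The hash anchor is unaffected, and identical lines are rejected as ambiguous instead of guessed at.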

📦 Installation

VS Code Extension

Install Dirac from the VS Code Marketplace.

CLI (Terminal)

Install the Dirac CLI globally using npm:

npm install -g dirac-cli

Note: Node.js v25 is currently not supported due to an upstream V8 Turboshaft compiler bug that causes out-of-memory crashes during WASM initialization. Please use Node.js v20, v22, or v24 (LTS versions).
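
For setup scripts that want to enforce the note above before installing, a small check along these lines may help. It is a hypothetical helper, not part of Dirac: it takes the string printed by `node --version` and accepts only the major versions the note recommends.

```python
SUPPORTED_MAJORS = {20, 22, 24}  # the versions recommended in the note above


def is_supported_node(version: str) -> bool:
    """Return True when a `node --version` string such as "v22.11.0" has a supported major."""
    major = version.strip().lstrip("v").split(".")[0]
    return major.isdigit() and int(major) in SUPPORTED_MAJORS


print(is_supported_node("v22.11.0"))  # True
print(is_supported_node("v25.0.0"))   # False
```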

🚀 CLI Quick Start

- Authenticate: `dirac auth`
- Run your first task: `dirac "Analyze the architecture of this project"`

Configuration (Environment Variables)

You can provide API keys via environment variables to skip the dirac auth step. This is ideal for CI/CD or non-persistent environments.

For provider-specific setup (e.g. AWS Bedrock, Google Cloud Vertex AI), see the Provider Settings guide.

Common API Keys:

- `ANTHROPIC_API_KEY`
- `OPENAI_API_KEY`
- `OPENROUTER_API_KEY`
- `GEMINI_API_KEY`
- `GROQ_API_KEY`
- `MISTRAL_API_KEY`
- `XAI_API_KEY` (x.ai)
- `HF_TOKEN` (HuggingFace)
- ... and others (see `src/shared/storage/env-config.ts` for the full list).
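
In CI it can be useful to fail early when no key is present, before invoking the CLI at all. The sketch below is a hypothetical preflight helper, not part of Dirac. It only checks the variable names listed above and knows nothing about the additional ones in `src/shared/storage/env-config.ts`.

```python
import os

# Variable names copied from the list above; the full list lives in the repo.
COMMON_KEYS = [
    "ANTHROPIC_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY", "GEMINI_API_KEY",
    "GROQ_API_KEY", "MISTRAL_API_KEY", "XAI_API_KEY", "HF_TOKEN",
]


def configured_keys(env=None):
    """Return the listed variable names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in COMMON_KEYS if env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment:
print(configured_keys({"GEMINI_API_KEY": "abc", "HF_TOKEN": ""}))  # ['GEMINI_API_KEY']
```

Calling `configured_keys()` with no argument inspects the real environment. An empty result means the `dirac auth` step is still needed, at least as far as these eight variables are concerned.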

You can use any OpenAI-compatible provider (e.g., DeepSeek, DeepInfra, OpenRouter, or your own local proxy) by providing the base URL and model ID.

Environment Variables:

- `OPENAI_API_BASE`: Your API base URL (e.g., `https://api.deepseek.com/v1`).
- `OPENAI_API_KEY` (or `OPENAI_COMPATIBLE_CUSTOM_KEY`): Your API key.
- `CUSTOM_HEADERS`: Optional custom headers (e.g., `"Authorization=Bearer token,X-Account-Id=123"` or JSON format).

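The two `CUSTOM_HEADERS` formats mentioned above (comma-separated `name=value` pairs, or JSON) could be handled roughly as follows. This is a sketch of one plausible reading, not Dirac's actual parser. The function name and the edge-case rules (split on the first `=`, trim whitespace, reject pairs without a name) are assumptions.

```python
import json


def parse_custom_headers(raw: str) -> dict[str, str]:
    """Parse 'Name=Value,Other=Value' pairs or a JSON object into a header dict."""
    raw = raw.strip()
    if not raw:
        return {}
    if raw.startswith("{"):
        # JSON format: expect a flat object of header names to values.
        return {str(k): str(v) for k, v in json.loads(raw).items()}
    headers = {}
    for pair in raw.split(","):
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = pair.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"malformed header pair: {pair!r}")
        headers[name.strip()] = value.strip()
    return headers


print(parse_custom_headers("Authorization=Bearer token,X-Account-Id=123"))
# {'Authorization': 'Bearer token', 'X-Account-Id': '123'}
```

One limitation of the pair format under this reading is that a header value containing a comma cannot be expressed. The JSON form avoids that.
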
CLI Example:

    # Using environment variables
    export OPENAI_API_BASE="https://api.yourprovider.com/v1"
    export OPENAI_API_KEY="your-api-key"
    export CUSTOM_HEADERS="Authorization=Bearer XXX"

    # --provider is now optional if OPENAI_API_BASE is set
    dirac "explain Dirac Delta function" \
      --model "your-model-id"

CLI Flag Example:

    dirac "explain Dirac Delta function" \
      --provider "https://api.deepseek.com/v1" \
      --model "deepseek-v4-pro" \
      --headers "X-Custom-Header=Value"

Common Commands

- `dirac "prompt"`: Start an interactive task.
- `dirac -p "prompt"`: Run in Plan Mode to see the strategy before executing.
- `dirac -y "prompt"`: Yolo Mode (auto-approve all actions, great for simple fixes).
- `git diff | dirac "Review these changes"`: Pipe context directly into Dirac.
- `dirac history`: View and resume previous tasks.

🛠️ Getting Started

Open the Dirac sidebar in VS Code.

Configure your preferred AI provider (Anthropic, OpenAI, OpenRouter, etc.).

Start a new task by describing what you want to build or fix.

Watch Dirac go!

🛠️ Development

Setup

npm run install:all

Protobufs (required before build)

npm run protos

Build

npm run compile

Lint

npm run lint

Running Tests

Unit tests require the TS_NODE_PROJECT environment variable set to ./tsconfig.unit-test.json. This is because VS Code's test runner requires CommonJS modules while the main project uses ESM.

    # Run all tests (unit + integration)
    npm test

    # Run only unit tests
    npm run test:unit

The test:unit script already sets TS_NODE_PROJECT=./tsconfig.unit-test.json automatically. If you need to run mocha directly, set it manually:

TS_NODE_PROJECT=./tsconfig.unit-test.json npx mocha "src/**/__tests__/*.ts" "src/**/*.test.ts"

📄 License

Dirac is open source and licensed under the Apache License 2.0.

🤝 Acknowledgments

Dirac is a fork of the excellent Cline project. We are grateful to the Cline team and contributors for their foundational work.

Built with ❤️ by Max Trivedi at Dirac Delta Labs
