elvis@omarsar0

2026-05-01 08:30

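AI Summary

The tester used DeepSeek-V4-Pro with the Pi coding agent to build an LLM wiki and was blown away by how well it works out of the box. It is described as the first open-weight model to match Claude and Codex in reasoning ability, while being cost-effective and supporting a 1M-token context length. The model runs directly in a basic harness without complex configuration and is strong at agentic coding and knowledge-intensive reasoning tasks, handling multi-step research, code generation, and contextual reasoning across company docs, forums, papers, and code repositories. Its efficiency comes from Fireworks' inference speed, described as the fastest on the market, and a hybrid attention design that cuts the KV cache to 10% and inference FLOPs by nearly 4x, enabling fast, low-cost deployment in practice.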

I have been testing DeepSeek-V4-Pro with the Pi coding agent.

I am mindblown by how well it works out of the box.

A few notes:

I spent a few hours building an LLM wiki with an agent powered entirely by DeepSeek-V4-Pro on @FireworksAI_HQ inference.

This is the first time I feel like there is an open-weight model that can reason at the level of Claude and Codex. And it does this in a cost-effective way with support for 1M context length.

To be clear, I am using DeepSeek-V4-Pro inside of Pi without any special configuration. It works out of the box. It's exciting that there is a model that can just be plugged into a basic harness like Pi, and it just works. I've never seen that before. Most models require lots of configuration and setup.

@deepseek_ai's DeepSeek-V4-Pro is clearly good at agentic coding (probably the best among open-weight models), but the model is also great on knowledge-intensive tasks where reasoning matters. The agent pulled agentic engineering best practices from different company docs (Anthropic, OpenAI, Google, Stripe, Meta, Modal, DeepSeek, Mistral, Cohere), searched and digested Reddit and HN threads, summarized arXiv papers, and surfaced trending GitHub repos. Then it distilled everything into actionable tips across categories. I love the wiki it built. The quality is really good. Here is a snapshot of what the wiki looks like: https://github.com/dair-ai/dair-workshops/tree/main/agentic-engineering-wiki

DeepSeek-V4-Pro handled the task without breaking stride: multi-step research queries, code generation for scaffolding, context-heavy reasoning across disparate sources. For coding specifically, this is the first open-weight model that genuinely feels like a Codex or Claude Code experience. It is comparable both in capability and in actual multi-turn agentic work.

What made the loop feel so responsive was Fireworks' inference speed (the fastest on the market) and the fact that they actually validate models at the systems level before shipping. No corrupted reasoning traces. Just fast, reliable iteration. The hybrid CSA and HCA attention design cuts KV cache to just 10% and inference FLOPs by nearly 4x at 1M-token context. This is what makes the agent loop actually fast and cheap enough to run in practice.
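
For a rough sense of why a 10% KV cache matters at 1M tokens, here is a back-of-the-envelope sketch. The layer count, KV head count, and head dimension are hypothetical placeholders, not DeepSeek-V4-Pro's actual configuration. The post also does not say which baseline the 10% is measured against, so the sketch assumes a plain per-token, per-layer key/value cache in 16-bit precision.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to store keys and values for every layer and every token."""
    # Factor of 2: one tensor for keys, one for values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical dimensions, chosen only for illustration (assumption).
baseline = kv_cache_bytes(num_layers=60, num_kv_heads=8, head_dim=128, seq_len=1_000_000)

# Apply the post's "10%" figure to the assumed baseline.
reduced = baseline // 10

GIB = 1024 ** 3
print(f"baseline KV cache: {baseline / GIB:.1f} GiB")
print(f"at 10%:            {reduced / GIB:.1f} GiB")
```

Under these assumed dimensions the full cache for a single 1M-token sequence would run to hundreds of GiB, so a 10x reduction is the difference between needing many accelerators for one request and fitting it on far fewer.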

elvis@omarsar0 · X
