# Holo3.1: Fast, Local Computer-Use Agents

- Source: Hugging Face: Blog (RSS)
- Published: 2026-06-02 22:13
- AIHOT score: 73
- AIHOT tag: Featured
- AIHOT link: https://aihot.virxact.com/items/cmpwqvwfa05cqslsncgvlgd9q
- Original link: https://huggingface.co/blog/Hcompany/holo31

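## Why It Was Featured

Holo3.1 extends computer-use agents from the desktop to mobile and ships quantized checkpoints for the first time, which makes local inference markedly faster. Developers working on GUI automation can start running it right away.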

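## AI Summary

Holo3.1 is a family of computer-use agents based on the Qwen model family, designed to improve robustness across desktop, web, and mobile environments. The new models come in four sizes (0.8B, 4B, 9B, and 35B-A3B), and for the first time the release includes quantized checkpoints (FP8, Q4 GGUF, and NVFP4) optimized for local inference. On the AndroidWorld benchmark, the 35B-A3B model improves from 67% to 79.3%. On DGX Spark, NVFP4 quantization delivers 1.74× the token throughput of BF16; combined with agent harness optimizations, it cuts average step time from 6.8 s to 3.3 s relative to the FP8 baseline. The models support function-calling protocols and can be deployed inside third-party agent frameworks.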

## Full Text

Holo3.1: Fast & Local Computer Use Agents

Team article, published June 2, 2026

Authors (Hcompany): Maxime Langevin (maxime-hcompany), Hamza Benchekroun (hamza-hcompany), Axel Moyal (axmoy), Emrick Sinitambirivoutin (emricksini-h), Antonio Loison (antonioloison), Avshalom Manevich (avshalom-h), Tony Wu (h-tonywu), Pierre-Louis Cedoz (plcedoz38), Aurélien Lac (h-aurelien-lac), Ronan Riochet (rronan-h)

Last March, we released Holo3, our state-of-the-art computer-use model. Adoption was immediate. Developers, enterprises, and partners started deploying Holo3 across a wide range of workflows, from browser automation and business software to internal tools and desktop applications. As adoption grew, we realized performance alone was no longer enough.

Users want to run the same computer-use capabilities across desktop and mobile environments, with seamless integration with different agent frameworks. They want deployment flexibility, from cloud inference to fully local execution on end-user devices.

This is why we are releasing the Holo3.1 family. Holo3.1 improves robustness across the three dimensions that matter most in production: environments (web, desktop, mobile), agent frameworks, and deployment targets. For the first time, we release quantized checkpoints optimized for local inference, including FP8, Q4 GGUF, and NVFP4.

Holo3.1 is a major step toward our vision of universal computer-use agents: systems that can operate across environments, integrate into any agent stack, and run wherever the workflow lives.

### Computer Use Across GUI Environments and Agent Harnesses

Based on the Qwen family, Holo3.1 was designed to improve robustness across the environments where computer-use agents are actually deployed, while retaining state-of-the-art performance.

As teams moved Holo3 from evaluation to production, we repeatedly observed the same challenge: strong performance in one setting does not necessarily transfer to another. Mobile devices, alternative agent harnesses, and different execution frameworks all introduce their own sources of distribution shift.

### Mobile Automation

Holo3.1 expands Holo3's capabilities beyond browser and desktop control, delivering major gains on mobile environments. On AndroidWorld, our 35B-A3B model improves from 67% to 79.3%, while the smaller 4B and 9B variants improve from 58% to 72%.

### Cross-Harness Performance

To better support teams deploying Holo inside third-party agent stacks, Holo3.1 introduces native support for function-calling protocols in addition to the structured JSON outputs already available in Holo3.
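To make the difference between the two output styles concrete, here is a minimal sketch that normalizes both into the same action. It assumes an OpenAI-style tool schema; the tool name `click`, its parameters, and the sample payloads are hypothetical illustrations, not Holo3.1's documented action space.

```python
import json

# Hypothetical tool definition in the widely used OpenAI-style "tools" format.
# The name and parameters are illustrative, not Holo3.1's actual action space.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at a screen coordinate.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "integer"},
            },
            "required": ["x", "y"],
        },
    },
}


def action_from_tool_call(tool_call: dict) -> dict:
    """Normalize a function-calling response into an {action, args} dict."""
    fn = tool_call["function"]
    # In OpenAI-style responses, arguments arrive as a JSON-encoded string.
    return {"action": fn["name"], "args": json.loads(fn["arguments"])}


def action_from_structured_json(text: str) -> dict:
    """Normalize a structured-JSON response (hypothetical shape) the same way."""
    payload = json.loads(text)
    return {"action": payload["action"], "args": payload["args"]}


# Made-up sample outputs for the two styles.
tool_call = {"function": {"name": "click", "arguments": '{"x": 120, "y": 48}'}}
structured = '{"action": "click", "args": {"x": 120, "y": 48}}'

same = action_from_tool_call(tool_call) == action_from_structured_json(structured)
print(same)
```

With both styles reduced to one internal representation, a harness can execute actions identically regardless of which protocol the surrounding agent stack prefers.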

Across OSWorld and our internal benchmark suite covering e-commerce, business software, and collaboration workflows, function-calling and native execution now achieve near-parity performance. Holo3.1 also delivers more than a 25% improvement over Holo3 when evaluated inside our Holotab product harness.

### Smaller Sizes for Cost-Performance Tradeoffs

To further enable local and on-device inference, we are also releasing new model sizes including small models (0.8B, 4B, and 9B) for cost-effective and private deployment, in addition to the larger 35B-A3B model for state-of-the-art performance.

Figure: Performance versus cost for the Holo3.1 and Qwen 3.5 families. Overall performance averages the four H Corporate benchmarks first (so each family is equally weighted), then takes the mean across OSWorld, AndroidWorld, H Corporate, ScreenSpot-Pro, and OSWorld-G.
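The two-stage averaging described in the caption can be sketched as follows. The scores are placeholders chosen for easy arithmetic, not reported results, and the function name is hypothetical.

```python
from statistics import mean


def overall_score(h_corporate: list[float], others: dict[str, float]) -> float:
    """Average the H Corporate benchmarks first, then average across families."""
    families = dict(others)
    # Collapse the four H Corporate benchmarks into a single family score.
    families["H Corporate"] = mean(h_corporate)
    return mean(list(families.values()))


# Placeholder scores, for illustration only.
h_corp = [60.0, 70.0, 80.0, 90.0]
others = {
    "OSWorld": 50.0,
    "AndroidWorld": 70.0,
    "ScreenSpot-Pro": 40.0,
    "OSWorld-G": 65.0,
}

two_stage = overall_score(h_corp, others)
# A flat mean over all eight benchmarks would overweight H Corporate.
flat = mean(h_corp + list(others.values()))
print(two_stage, flat)
```

Averaging H Corporate first keeps its four benchmarks from counting four times in the final mean, which is the equal weighting the caption refers to.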

### Fast & Local Inference

This is our first release to ship quantized weights. We’re starting with 35B-A3B checkpoints, available in FP8, Q4 GGUF, and NVFP4.

For NVFP4, we used NVIDIA's Model Optimizer in a W4A16 configuration. These checkpoints enable fast local inference for Computer Use Agents with little to no degradation in model performance. FP8 and NVFP4 achieve the same OSWorld scores, only about two points below the full-precision BF16 checkpoint.
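As a rough illustration of why lower-precision weights matter for local inference, the sketch below estimates weight storage at different bit widths. It assumes the 35B in the model name is the total parameter count, and it ignores quantization scale factors, any layers kept in higher precision, and runtime memory such as the KV cache, so real checkpoint sizes will differ.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bits per byte / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


PARAMS_BILLION = 35.0  # assumption: total parameters implied by "35B-A3B"

estimates = {
    "BF16 (16-bit)": weight_gb(PARAMS_BILLION, 16),
    "FP8 (8-bit)": weight_gb(PARAMS_BILLION, 8),
    "4-bit weights": weight_gb(PARAMS_BILLION, 4),
}
for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB")
```

In a W4A16 setup only the weights are stored in 4 bits while activations stay in 16 bits, so the savings shown here apply to weight storage and the bandwidth needed to read it.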

The speedups are substantial: on DGX Spark, NVFP4 W4A16 delivers 1.41× the total token throughput of FP8 and 1.74× that of BF16.
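The two quoted ratios also imply how FP8 compares with BF16. The short calculation below derives that figure, assuming both ratios were measured under the same setup; the derived number is not stated in the article.

```python
# Ratios quoted in the text (total token throughput on DGX Spark).
NVFP4_OVER_FP8 = 1.41
NVFP4_OVER_BF16 = 1.74

# If both ratios come from the same setup, FP8 relative to BF16 follows by division.
fp8_over_bf16 = NVFP4_OVER_BF16 / NVFP4_OVER_FP8

# Normalize BF16 to 1.0 to compare all three precisions on one scale.
relative = {"BF16": 1.0, "FP8": fp8_over_bf16, "NVFP4": NVFP4_OVER_BF16}
for name, value in relative.items():
    print(f"{name}: {value:.2f}x")
```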

### Towards Local Agents on Consumer Hardware

We also release Q4 GGUF checkpoints aimed at local deployment of Computer Use Agents on consumer hardware.

The agent itself runs locally on a Windows or Mac machine, while the model can run either on that same machine (we include reference numbers for Apple Silicon) or on a DGX Spark on the same network. In both cases, execution stays fully private and local, with nothing leaving the user's network.
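One way to guard the "nothing leaves the network" property in an agent's configuration is to check that the model endpoint resolves to a loopback or private address. This is a minimal sketch using only the Python standard library; the endpoint URLs, ports, and function name are hypothetical and not part of any Holo tooling.

```python
import ipaddress
from urllib.parse import urlparse


def stays_on_local_network(base_url: str) -> bool:
    """Return True if the endpoint host is loopback or a private IP address."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name cannot be verified without resolving it; treat as unverified.
        return False
    return addr.is_loopback or addr.is_private


# Hypothetical endpoints for the two deployment layouts described above.
ENDPOINTS = {
    "same_machine": "http://127.0.0.1:8080/v1",
    "dgx_spark_on_lan": "http://192.168.1.50:8000/v1",
}

for name, url in ENDPOINTS.items():
    print(name, stays_on_local_network(url))
```

The check is deliberately conservative: any hostname other than `localhost` is rejected rather than resolved, so a misconfigured public endpoint fails closed.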

On Spark, the agent harness optimizations we developed with NVIDIA, combined with the NVFP4 quantization above, deliver a compound ~2× end-to-end speedup over the FP8 baseline, cutting average step time from 6.8 s to 3.3 s.
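The step-time figures translate into task-level savings as follows. The 50-step task length is a hypothetical example, not a number from the article.

```python
BASELINE_STEP_S = 6.8   # FP8 baseline, from the text
OPTIMIZED_STEP_S = 3.3  # NVFP4 plus harness optimizations, from the text
TASK_STEPS = 50         # hypothetical task length

# End-to-end speedup is the ratio of average step times.
speedup = BASELINE_STEP_S / OPTIMIZED_STEP_S

# Wall-clock time saved on a task of the assumed length.
saved_s = (BASELINE_STEP_S - OPTIMIZED_STEP_S) * TASK_STEPS

print(f"speedup: {speedup:.2f}x")
print(f"time saved over {TASK_STEPS} steps: {saved_s:.0f} s")
```

The ratio works out to slightly above 2, consistent with the "~2×" stated above.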

Figure: Agent request rate across platforms and precisions. On DGX Spark, vLLM with NVFP4 achieves the highest request rate in both Default and Fast modes, followed by Q4 GGUF and FP8.

These improvements and more will land in an upcoming desktop agent harness.

### Availability

The Holo3.1 family is available in four sizes:

| Model | Deployment Target |
| --- | --- |
| Holo3.1-0.8B | Ultra-lightweight local agents |
| Holo3.1-4B | Cost-efficient deployment |
| Holo3.1-9B | Balanced performance and latency |
| Holo3.1-35B-A3B | State-of-the-art performance |

We are also releasing optimized FP8, NVFP4, and Q4 GGUF checkpoints for local and edge deployment.

Holo Models API: https://hcompany.ai/holo-models-api

Hugging Face: https://huggingface.co/collections/Hcompany/holo31

We look forward to seeing what developers build with Holo3.1.

