Lite Any Stereo V2 (LAS2): Faster and Stronger Zero-Shot Stereo Matching
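Source: arxiv.org
Lite Any Stereo V2 (LAS2) is a family of ultra-fast models designed for zero-shot stereo matching. It uses a 2D-only cost aggregation framework optimized for real inference latency rather than theoretical MACs. Training follows a three-stage strategy of synthetic supervision, self-distillation, and real-world knowledge distillation, and pseudo-label filtering plus an error-clamping operation make the pseudo-labels more reliable. LAS2 includes several feed-forward variants and one iterative variant. Among them, LAS2-H outperforms the iterative method Fast-FoundationStereo in overall zero-shot performance while running 1.8× faster on H200 and 2.7× faster on Orin. The project page, demos, and code are publicly available.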
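The abstract does not describe LAS2's aggregation network. The sketch below illustrates what "2D-only" usually means in efficient stereo models. It builds a correlation cost volume of shape D×H×W and treats the D disparity planes as channels of a 2D map. It aggregates them with purely spatial 2D operations, with no 3D convolution along the disparity axis, and then regresses disparity as a softmax-weighted average. This rests on several assumptions. The mean-over-channels correlation, the `temperature` parameter, and all function names are hypothetical. A fixed 3×3 box filter stands in for learned 2D convolutions, which in a real network would also mix information across the D channels. None of this is LAS2's code.

```python
import math
from typing import List

Feature = List[List[List[float]]]  # H x W x C feature map
Plane = List[List[float]]          # H x W scalar map


def correlation_volume(left: Feature, right: Feature, max_disp: int) -> List[Plane]:
    """Return a D x H x W volume: cost[d][y][x] = mean_c left[y][x][c] * right[y][x - d][c]."""
    h, w, c = len(left), len(left[0]), len(left[0][0])
    volume = []
    for d in range(max_disp):
        # Columns x < d have no valid match in the right view and keep cost 0.
        plane = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(d, w):
                plane[y][x] = sum(a * b for a, b in zip(left[y][x], right[y][x - d])) / c
        volume.append(plane)
    return volume


def box_filter_3x3(plane: Plane) -> Plane:
    """Spatial 3x3 mean filter on one disparity plane (hypothetical stand-in for learned 2D convs)."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [plane[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) / len(window)
    return out


def soft_argmax_disparity(volume: List[Plane], temperature: float = 1.0) -> Plane:
    """Expected disparity under a softmax over the D axis (higher correlation = more likely)."""
    num_d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    disp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            logits = [volume[d][y][x] / temperature for d in range(num_d)]
            peak = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(v - peak) for v in logits]
            disp[y][x] = sum(d * wt for d, wt in enumerate(weights)) / sum(weights)
    return disp


def estimate_disparity(left: Feature, right: Feature, max_disp: int,
                       temperature: float = 0.1) -> Plane:
    volume = correlation_volume(left, right, max_disp)
    # Every step acts on H x W planes; nothing convolves along the disparity axis.
    aggregated = [box_filter_3x3(plane) for plane in volume]
    return soft_argmax_disparity(aggregated, temperature)
```

When the right view is the left view shifted by two pixels, the interior of the returned map comes out close to 2.0. Lowering the assumed `temperature` pushes the soft estimate toward the hard arg-max.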
Recent advances in stereo matching have achieved remarkable accuracy, but often rely on large models, heavy computation, or additional foundation-model priors, making them difficult to deploy on resource-constrained platforms. In contrast, efficient stereo models offer faster inference but are commonly considered less capable of strong zero-shot generalization. In this paper, we challenge this assumption by introducing Lite Any Stereo V2 (LAS2), an ultra-fast model series designed for efficient zero-shot stereo matching. LAS2 is developed from both architecture and training perspectives. Architecturally, we revisit efficient stereo design under practical deployment settings and propose a 2D-only cost aggregation framework, optimized for real inference latency rather than theoretical MACs alone. For training, we develop a three-stage strategy that combines synthetic supervision, self-distillation, and real-world knowledge distillation. To improve the reliability of real-world pseudo supervision, we further introduce pseudo-label filtering and an error-clamping operation, enabling smoother synthetic-to-real transfer. We instantiate LAS2 as a family of models, including feed-forward variants for different efficiency budgets and an iterative variant for higher accuracy. Extensive experiments show that LAS2 achieves state-of-the-art accuracy among efficient stereo methods while maintaining significantly lower latency. Specifically, LAS2-H achieves stronger overall zero-shot performance than the iterative method Fast-FoundationStereo, with 1.8x and 2.7x faster inference on H200 and Orin, respectively. The project page, demos, and code are available at https://tomtomtommi.github.io/LiteAnyStereoV2/.
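The abstract names pseudo-label filtering and an error-clamping operation but does not define either one. The sketch below shows one plausible reading under two loudly stated assumptions. First, filtering is done with a left-right consistency check, a standard way to discard unreliable disparity labels. Second, each pixel's L1 error is clamped at a hypothetical threshold `max_err`, so a badly wrong pseudo-label cannot dominate the distillation loss. The function names and thresholds are illustrative and are not taken from LAS2.

```python
from typing import List

Plane = List[List[float]]  # H x W disparity map
Mask = List[List[bool]]    # H x W validity mask


def lr_consistency_mask(disp_left: Plane, disp_right: Plane, thresh: float = 1.0) -> Mask:
    """Keep a left-view pseudo-label only if the right-view pseudo-label agrees.

    Pixel (y, x) with left disparity d maps to right pixel round(x - d); the
    label is kept when it lands inside the image and the two disparities
    differ by at most `thresh` pixels.
    """
    h, w = len(disp_left), len(disp_left[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disp_left[y][x]
            xr = round(x - d)
            if 0 <= xr < w and abs(d - disp_right[y][xr]) <= thresh:
                mask[y][x] = True
    return mask


def clamped_l1_loss(pred: Plane, pseudo: Plane, mask: Mask, max_err: float = 3.0) -> float:
    """Mean of min(|pred - pseudo|, max_err) over pixels that passed the filter."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, pseudo, mask):
        for p, t, keep in zip(p_row, t_row, m_row):
            if keep:
                # Clamping caps how much one unreliable label can add to the loss.
                total += min(abs(p - t), max_err)
                count += 1
    return total / count if count else 0.0
```

With a hard `min`, any pixel whose error exceeds `max_err` contributes a constant to the loss and therefore no gradient. The abstract does not say whether LAS2 uses this hard form or a softer variant.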