# TriSplat: A Feed-Forward 3D Scene Reconstruction Network for Simulation

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-25 08:00
- AIHOT score: 66
- AIHOT link: https://aihot.virxact.com/items/cmpm2fnvm0kz0sl01lo2ubx7t
- Original link: https://arxiv.org/abs/2605.26115

## AI Summary

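TriSplat is a feed-forward 3D scene reconstruction network that represents scenes with oriented triangle primitives. From sparse-view images, it produces a simulation-ready mesh scene in a single forward pass. The model predicts local 3D point maps, triangle attributes, and camera poses, and constructs normals from the point maps to stabilize the triangle parameterization. Experiments on RealEstate10K and DL3DV show higher geometric fidelity than Gaussian-based feed-forward baselines, with competitive rendering quality. The output surface triangles can be used directly by physics engines and standard rendering pipelines without additional conversion.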

## Main Text

Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise. This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations. We present TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and optional intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, our approach constructs geometry normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction. Experiments on RealEstate10K and DL3DV show that this representation produces more geometry-faithful reconstructions than Gaussian feed-forward baselines while maintaining competitive novel-view rendering quality. Because the rendering primitives are themselves surface triangles, the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion, making it a practical simulation-ready solution for feed-forward 3D scene reconstruction.
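The abstract describes building geometry normals from predicted point maps and turning them into stable local frames for triangle parameterization. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The function names (`normals_from_point_map`, `local_frames`, `triangle_vertices`), the finite-difference normal estimate, the helper-axis rule for choosing a tangent, and the equilateral triangle layout are all assumptions made for illustration; the learned normal refinement head and the training schedules are omitted entirely.

```python
import numpy as np


def normals_from_point_map(points):
    """Estimate unit normals from an (H, W, 3) point map.

    Assumption: normals are the cross product of finite-difference
    tangents along image columns (u) and rows (v).
    """
    d_v = np.gradient(points, axis=0)  # change along rows
    d_u = np.gradient(points, axis=1)  # change along columns
    n = np.cross(d_u, d_v)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-12, None)


def local_frames(normals):
    """Build a right-handed orthonormal frame (t, b, n) per pixel.

    Assumption: the tangent is derived from a fixed helper axis that is
    never close to parallel with the normal, which keeps the frame
    well-conditioned. Returns (H, W, 3, 3) with columns t, b, n.
    """
    z_axis = np.array([0.0, 0.0, 1.0])
    x_axis = np.array([1.0, 0.0, 0.0])
    # Use z as helper unless the normal is already nearly along z.
    helper = np.where(np.abs(normals[..., 2:3]) < 0.9, z_axis, x_axis)
    t = np.cross(helper, normals)
    t = t / np.linalg.norm(t, axis=-1, keepdims=True)
    b = np.cross(normals, t)
    return np.stack([t, b, normals], axis=-1)


def triangle_vertices(points, frames, scale):
    """Place one equilateral triangle per pixel in the tangent plane.

    Hypothetical parameterization: vertices sit on a circle of radius
    `scale` around each point. Returns (H, W, 3 vertices, 3 coords).
    """
    t = frames[..., :, 0]
    b = frames[..., :, 1]
    angles = 2.0 * np.pi * np.arange(3) / 3.0
    offsets = (np.cos(angles)[:, None] * t[..., None, :]
               + np.sin(angles)[:, None] * b[..., None, :])
    return points[..., None, :] + scale * offsets


# Usage on a synthetic flat point map lying in the z = 0 plane.
vv, uu = np.meshgrid(np.arange(4.0), np.arange(5.0), indexing="ij")
point_map = np.stack([uu, vv, np.zeros_like(uu)], axis=-1)
normals = normals_from_point_map(point_map)
frames = local_frames(normals)
triangles = triangle_vertices(point_map, frames, scale=0.1)
```

Deriving the frame from the normal, rather than regressing a free rotation, means each triangle is tangent to the estimated surface by construction. This mirrors the motivation stated in the abstract, though the actual network may parameterize its triangles differently.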
