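Gaussian splats are an emerging real-time 3D rendering technique that enables free-viewpoint, immersive scene browsing on an iPhone. The technique encodes scene structure and appearance with Gaussian distributions and renders far faster than NeRFs. Recent breakthroughs include single-image generation (Apple ML SHARP), dynamic scene capture (4DV ai), and generative models that fill in regions that were never photographed. Looking ahead, it is expected to become a core entertainment format for VR devices such as Vision Pro, and to combine with world models for city-scale exploration or game-like interaction, though challenges remain in creation efficiency, storage and transmission, and visual realism.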
I've been obsessed with the most exciting software tech today that's not AI: Gaussian splats.
It's the next generation of video, where you can move around in the scene. And the whole thing renders in real time on your iPhone.
I went down a pretty deep rabbit hole on it, so here's some history.
The initial idea was: can we take pictures from different angles and reconstruct a 3D scene? Fun fact: one of the seminal papers in the field ("Photo Tourism") was written by a professor I taught graphics for in college, Noah Snavely! Problem: objects look different at different angles because of lighting, etc.
Then we had NeRFs, which could figure out lighting. Problem: extremely slow.
Gaussian splatting represented a 3D scene with diffuse blobs (Gaussians) that encoded structure and appearance. Now you could take camera shots or drone shots and make a splat in <5s. Problems: a) it still needed many images, b) splats were static and didn't have video in them, c) unseen parts of the video, or holes, were just black or missing.
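To make the "diffuse blobs" idea a bit more concrete, here's a toy sketch of how one pixel might get its color from a handful of splats. It's heavily simplified and not how any production renderer is written: the `Splat2D` class and `shade_pixel` function are hypothetical names, the blobs are assumed to be already projected onto the screen, and each one is a round (isotropic) Gaussian with a single flat color, whereas real splats are stretched 3D ellipsoids with view-dependent color.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat2D:
    """One Gaussian blob, already projected to the screen (simplified, hypothetical)."""
    cx: float       # center x in pixels
    cy: float       # center y in pixels
    sigma: float    # standard deviation in pixels (assumed round, not stretched)
    color: tuple    # (r, g, b), each in [0, 1]
    opacity: float  # peak opacity in [0, 1]
    depth: float    # distance from the camera, used only for sorting


def shade_pixel(splats, px, py):
    """Blend splats front to back at pixel (px, py).

    Returns the blended (r, g, b) and the leftover transmittance,
    i.e. the fraction of the pixel that no splat covered.
    """
    r = g = b = 0.0
    transmittance = 1.0  # how much light still gets through so far
    for s in sorted(splats, key=lambda s: s.depth):  # nearest splat first
        d2 = (px - s.cx) ** 2 + (py - s.cy) ** 2
        # Gaussian falloff: full opacity at the center, fading with distance.
        alpha = s.opacity * math.exp(-0.5 * d2 / (s.sigma ** 2))
        weight = transmittance * alpha
        r += weight * s.color[0]
        g += weight * s.color[1]
        b += weight * s.color[2]
        transmittance *= 1.0 - alpha
    return (r, g, b), transmittance
```

In this sketch, a pixel that no splat reaches comes back as `(0.0, 0.0, 0.0)` with a transmittance of `1.0`, which is one way to picture problem (c): wherever the capture never saw anything, there are simply no blobs to blend, so you get black.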