
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Introduction
- Meshes and points are the most common 3D scene representations because they are a good fit for fast GPU-based rasterization
- Recent NeRF methods build on continuous scene representations
- Stochastic sampling required for rendering NeRF is costly and can result in noise
- Introduce 3D Gaussian representation that allows optimization with SOTA visual quality and competitive training times
- Goal is to allow real-time rendering for scenes captured with multiple photos
- Recent methods achieve fast training but struggle to match the visual quality of the SOTA NeRF method Mip-NeRF360, which takes up to 48 hours to train
- Solution builds on 3 components
- Introduce 3D Gaussians as a flexible and expressive scene representation
- Optimization of the properties of the 3D Gaussians (3D position, opacity $\alpha$, anisotropic covariance, and spherical harmonic (SH) coefficients)
- Real-time rendering solution that uses fast GPU sorting algorithms and is inspired by tile-based rasterization
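The tile-based, sort-driven idea in the third component can be sketched on the CPU as follows. This is a minimal illustration, not the paper's GPU implementation: the function name `bin_and_sort`, the 16-pixel tile size, and the use of a circular screen-space radius per splat are assumptions made for the example. Each projected splat is duplicated once per tile it overlaps, and the duplicates are then sorted by tile first and by depth second, so every tile ends up with a near-to-far list of splats ready for blending.

```python
import numpy as np

TILE = 16  # tile edge in pixels (assumed value for this sketch)


def bin_and_sort(centers_px, radii_px, depths, width, height, tile=TILE):
    """Assign splats to screen tiles, then sort by (tile, depth).

    centers_px: (N, 2) projected splat centers in pixels
    radii_px:   (N,) screen-space radii in pixels (hypothetical bound)
    depths:     (N,) view-space depths
    Returns two arrays of equal length: tile ids and splat ids,
    ordered by tile id and, within a tile, from near to far.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    tile_ids, splat_ids = [], []
    for i, ((cx, cy), r) in enumerate(zip(centers_px, radii_px)):
        # Tiles overlapped by the splat's bounding square, clamped to the image.
        x0 = max(int((cx - r) // tile), 0)
        x1 = min(int((cx + r) // tile), tiles_x - 1)
        y0 = max(int((cy - r) // tile), 0)
        y1 = min(int((cy + r) // tile), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tile_ids.append(ty * tiles_x + tx)
                splat_ids.append(i)
    tile_ids = np.asarray(tile_ids, dtype=np.int64)
    splat_ids = np.asarray(splat_ids, dtype=np.int64)
    # np.lexsort treats the LAST key as primary: tile id first, then depth.
    order = np.lexsort((depths[splat_ids], tile_ids))
    return tile_ids[order], splat_ids[order]
```

A GPU version would replace the Python loops with a parallel key-generation pass and a single sort over the combined keys; the ordering logic stays the same.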
Related Work
- Traditional Scene Reconstruction and Rendering
- Structure-from-Motion (SfM) enabled an entirely new domain for novel-view synthesis
- Multi-View Stereo (MVS) produced impressive full 3D reconstruction algorithms
- These methods cannot completely recover from unreconstructed regions or from over-reconstruction
- Neural Rendering and Radiance Fields
- Neural Radiance Fields (NeRFs) introduced importance sampling and positional encoding to improve quality but used a large MLP, which negatively affects speed
- Mip-NeRF360 has extremely high training and rendering times
- InstantNGP and Plenoxels struggle to represent empty space effectively
- Point-Based Rendering and Radiance Fields
- Point-based methods efficiently render disconnected and unstructured geometry samples but suffer from holes, cause aliasing, and are strictly discontinuous
- Recent differentiable point-based rendering techniques still depend on MVS
- Pulsar achieves fast sphere rasterization, which inspired the tile-based, sorting renderer, but prior methods of this kind use CNNs for rendering, which results in temporal instability
- Diffuse point-based rendering techniques handle only single-object scenes and need masks for initialization, and it is unclear how they can scale to scenes of typical datasets
- A recent approach employs point pruning and densification techniques but uses volumetric ray-marching and cannot achieve real-time display rates
- Prior work using 3D Gaussians to represent captured human bodies focuses on the specific case of reconstructing and rendering a single isolated object
Overview
- Input is a set of images of a static scene with corresponding cameras calibrated by SfM, which produces a sparse point cloud
- Create a set of 3D Gaussians, defined by position (mean), covariance matrix, and opacity $\alpha$, from the point cloud, which results in a reasonably compact representation of the 3D scene
- The directional appearance component (color) of the radiance field is represented via SH
- The algorithm then creates the radiance field representation via a sequence of optimization steps on the 3D Gaussian parameters (position, covariance, $\alpha$, SH coefficients)
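The initialization step described above can be sketched in NumPy. This is a rough illustration under stated assumptions, not the authors' code: the names `init_gaussians` and `dc_color`, the dictionary layout, the starting opacity of 0.1, and the `0.5` color offset are all choices made for this example. Each SfM point becomes an isotropic Gaussian whose radius is the mean distance to its three nearest neighbors, and its RGB color is stored as the degree-0 (view-independent) SH coefficient. Higher-order SH terms and the optimization loop are omitted.

```python
import numpy as np

# Degree-0 real spherical harmonic basis constant, 1 / (2 * sqrt(pi)).
SH_C0 = 0.28209479177387814


def init_gaussians(points, colors, init_opacity=0.1):
    """Build initial Gaussian parameters from a sparse point cloud.

    points: (N, 3) SfM positions, N >= 2
    colors: (N, 3) RGB values in [0, 1]
    init_opacity: starting opacity (assumed placeholder value)
    """
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    if len(points) < 2:
        raise ValueError("need at least two points to estimate a radius")
    # Brute-force pairwise distances; fine for a sketch, O(N^2) memory.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore distance to self
    k = min(3, len(points) - 1)
    radius = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    # Isotropic covariance: radius^2 times the identity, one 3x3 per point.
    cov = radius[:, None, None] ** 2 * np.eye(3)
    # Store color as the DC SH coefficient (offset convention is assumed).
    sh_dc = (colors - 0.5) / SH_C0
    opacity = np.full(len(points), init_opacity)
    return {"mean": points, "cov": cov, "opacity": opacity, "sh_dc": sh_dc}


def dc_color(sh_dc):
    """Recover view-independent RGB from the degree-0 SH coefficient."""
    return np.clip(SH_C0 * sh_dc + 0.5, 0.0, 1.0)
```

During optimization the covariance would become anisotropic and the higher-order SH coefficients would add view dependence; this sketch only covers the starting state.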