3D Gaussian Splatting for Real-Time Radiance Field Rendering

Introduction

Meshes and points are the most common 3D scene representations, since they are a good fit for fast GPU-based rasterization
Recent NeRF methods instead build on continuous scene representations, typically optimizing an MLP with volumetric ray-marching
The stochastic sampling required to render NeRFs is costly and can result in noise
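To make the cost and the noise concrete, here is a minimal sketch of the standard NeRF volume-rendering quadrature for a single ray, with one jittered (stratified) sample per depth bin. `toy_density` and `toy_color` are hypothetical stand-ins for the MLP; in a real NeRF every sample is a full network evaluation, so cost grows with the number of samples per ray, and a different random draw gives a slightly different pixel value.

```python
import math
import random


def toy_density(t: float) -> float:
    """Hypothetical density along the ray: a thin 'surface' near t = 2.0."""
    return 5.0 * math.exp(-((t - 2.0) ** 2) / 0.05)


def toy_color(t: float) -> float:
    """Hypothetical grayscale color emitted at depth t, in [0, 1]."""
    return 0.5 + 0.5 * math.sin(t)


def render_ray(n_samples: int, near: float, far: float, rng: random.Random) -> float:
    """Estimate one pixel with stratified random samples along a single ray."""
    bin_width = (far - near) / n_samples
    # One jittered sample per depth bin (stratified sampling).
    ts = [near + (i + rng.random()) * bin_width for i in range(n_samples)]
    color = 0.0
    transmittance = 1.0  # fraction of light not yet absorbed
    for i, t in enumerate(ts):
        # Spacing to the next sample; the last sample extends to the far plane.
        delta = (ts[i + 1] if i + 1 < n_samples else far) - t
        alpha = 1.0 - math.exp(-toy_density(t) * delta)  # one field query per sample
        color += transmittance * alpha * toy_color(t)
        transmittance *= 1.0 - alpha
    return color


# Same ray, two random draws: the estimates differ (sampling noise).
a = render_ray(16, 0.0, 4.0, random.Random(0))
b = render_ray(16, 0.0, 4.0, random.Random(1))
print(a, b)
```

Taking more samples per ray reduces the spread between such estimates, but multiplies the number of field queries per pixel.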
Introduce a 3D Gaussian representation that allows optimization with SOTA visual quality and competitive training times
The goal is to allow real-time rendering of scenes captured with multiple photos
Recent methods achieve fast training but struggle to match the visual quality of the SOTA NeRF method Mip-NeRF360, which requires up to 48 hours of training
The solution builds on three components:
1. Introduce 3D Gaussians as a flexible and expressive scene representation
2. Optimization of the properties of the 3D Gaussians (3D position, opacity $\alpha$, anisotropic covariance, and spherical harmonic (SH) coefficients)
3. Real-time rendering solution that uses fast GPU sorting algorithms and is inspired by tile-based rasterization
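The sorting idea behind component 3 can be sketched as follows, assuming the Gaussians have already been projected to screen-space circles `(cx, cy, radius, depth)`. Each Gaussian is duplicated once per 16x16 tile it overlaps, every copy gets a 64-bit key with the tile ID in the high bits and the view-space depth in the low bits, and one sort then yields a front-to-back list per tile. Python's `list.sort` stands in for the GPU radix sort; `build_tile_lists` and the circular bounds are illustrative assumptions, not the paper's code.

```python
import struct

TILE = 16  # tile edge length in pixels


def depth_bits(depth: float) -> int:
    """Reinterpret a positive float32 depth as a uint32.

    For positive IEEE-754 floats the bit pattern sorts in the same order
    as the value, so it can be packed into an integer sort key.
    """
    return struct.unpack("<I", struct.pack("<f", depth))[0]


def build_tile_lists(splats, width, height):
    """splats: list of (cx, cy, radius, depth) screen-space circles.

    Returns {tile_id: [splat indices, nearest first]} for non-empty tiles.
    """
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    instances = []
    for idx, (cx, cy, radius, depth) in enumerate(splats):
        if depth <= 0.0:
            continue  # behind the camera: culled
        # Range of tiles touched by the circle's bounding box, clamped to the screen.
        x0 = max(0, int((cx - radius) // TILE))
        x1 = min(tiles_x - 1, int((cx + radius) // TILE))
        y0 = max(0, int((cy - radius) // TILE))
        y1 = min(tiles_y - 1, int((cy + radius) // TILE))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tile_id = ty * tiles_x + tx
                # High 32 bits: tile ID; low 32 bits: depth.
                key = (tile_id << 32) | depth_bits(depth)
                instances.append((key, idx))
    instances.sort()  # stand-in for a single GPU radix sort over all instances
    tiles = {}
    for key, idx in instances:
        tiles.setdefault(key >> 32, []).append(idx)
    return tiles


splats = [(8, 8, 4, 5.0), (8, 8, 4, 2.0), (24, 8, 4, 1.0), (16, 8, 2, 3.0)]
print(build_tile_lists(splats, width=32, height=16))  # {0: [1, 3, 0], 1: [2, 3]}
```

Because the tile ID occupies the high bits, one global sort groups instances by tile and orders them by depth inside each tile, so every tile can then be blended independently.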

Related Work

Traditional Scene Reconstruction and Rendering
- Structure-from-Motion (SfM) enabled an entirely new domain of novel-view synthesis
- Multi-View Stereo (MVS) enabled impressive full 3D reconstruction algorithms
- These MVS-based methods cannot completely recover from unreconstructed regions or from over-reconstruction, where MVS generates nonexistent geometry
Neural Rendering and Radiance Fields
- Neural Radiance Fields (NeRFs) introduced importance sampling and positional encoding to improve quality, but used a large MLP, which hurts speed
- Mip-NeRF360 has extremely high training and rendering times
- InstantNGP and Plenoxels struggle to represent empty space effectively
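Positional encoding, mentioned in the NeRF bullet above, maps each input coordinate to sines and cosines at increasing frequencies so that an MLP can fit high-frequency detail. A minimal sketch for one scalar coordinate; the function name is illustrative:

```python
import math


def positional_encoding(p: float, num_freqs: int) -> list[float]:
    """gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p))."""
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * p))
        features.append(math.cos(freq * p))
    return features


print(positional_encoding(0.0, 3))  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Applied to each coordinate of a 3D point with `num_freqs=10` (the value the original NeRF paper used for positions), this turns 3 inputs into 60 sine/cosine features.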
Point-Based Rendering and Radiance Fields
- Point-based methods efficiently render disconnected and unstructured geometry samples, but they suffer from holes, cause aliasing, and are strictly discontinuous
- Recent differentiable point-based rendering techniques still depend on MVS
- Pulsar achieves fast sphere rasterization, which inspired the paper's tile-based sorting renderer, but it uses CNNs for rendering, which results in temporal instability
- Diffuse point-based rendering techniques only handle scenes of a single object and need masks for initialization, and it is unclear how they scale to scenes from typical datasets
- A recent approach employs point pruning and densification techniques but uses volumetric ray-marching and cannot achieve real-time display rates
- Methods using 3D Gaussians to represent captured human bodies focus on the specific case of reconstructing and rendering a single isolated object
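A common image-formation rule for depth-sorted splats, and the one 3DGS's own renderer uses, is front-to-back $\alpha$-compositing: $C = \sum_i c_i \alpha_i \prod_{j<i} (1 - \alpha_j)$. A minimal sketch for one pixel, assuming each entry is a `(color, alpha)` pair already sorted nearest first (grayscale for brevity); the early exit once the pixel is nearly opaque is one common optimization, and the threshold `eps` is an illustrative choice:

```python
def composite_front_to_back(splats, eps=1e-4):
    """splats: (color, alpha) pairs sorted nearest first (grayscale for brevity).

    Computes C = sum_i c_i * a_i * prod_{j<i} (1 - a_j) and returns
    (color, remaining transmittance). Stops once the pixel is effectively opaque.
    """
    color = 0.0
    transmittance = 1.0
    for c, a in splats:
        color += transmittance * a * c
        transmittance *= 1.0 - a
        if transmittance < eps:
            break  # everything behind this point is hidden
    return color, transmittance


print(composite_front_to_back([(1.0, 0.5), (0.0, 0.5)]))  # (0.5, 0.25)
print(composite_front_to_back([(0.0, 0.5), (1.0, 0.5)]))  # (0.25, 0.25)
```

Swapping the two splats changes the pixel from 0.5 to 0.25, which is why a correct depth order matters for this kind of blending.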

Overview

The input is a set of images of a static scene, together with the corresponding cameras calibrated by SfM, which produces a sparse point cloud as a side effect
From this point cloud, create a set of 3D Gaussians defined by a position (mean), covariance matrix, and opacity $\alpha$, which results in a reasonably compact representation of the 3D scene
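A minimal sketch of turning the SfM points into initial Gaussians. Following the 3DGS paper, each Gaussian starts isotropic, with its scale set to the mean distance to its three nearest neighbors; the `Gaussian` record, the brute-force neighbor search, and the starting opacity of 0.1 are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: tuple       # 3D position
    scale: tuple      # per-axis extent (equal on all axes at init)
    rotation: tuple   # unit quaternion (w, x, y, z); identity at init
    opacity: float    # alpha
    color: tuple      # initial RGB taken from the SfM point


def init_from_points(points, colors, init_opacity=0.1):
    """Create one isotropic Gaussian per SfM point.

    Scale = mean distance to the 3 nearest other points (brute force, O(N^2)).
    """
    if len(points) < 2:
        raise ValueError("need at least two points to measure neighbor distances")
    gaussians = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:3]
        s = sum(nearest) / len(nearest)
        gaussians.append(
            Gaussian(
                mean=tuple(p),
                scale=(s, s, s),
                rotation=(1.0, 0.0, 0.0, 0.0),
                opacity=init_opacity,
                color=tuple(colors[i]),
            )
        )
    return gaussians


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
cols = [(0.5, 0.5, 0.5)] * 4
print(init_from_points(pts, cols)[0].scale)  # (2.0, 2.0, 2.0)
```

A real implementation would replace the quadratic search with a spatial index (for example a k-d tree) for point clouds with many points.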
The directional appearance component (color) of the radiance field is represented via SH
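Evaluating SH color for a view direction is a weighted sum of per-channel coefficients times the SH basis functions evaluated at that direction. The sketch below stops at degree 1 (4 coefficients per channel) and assumes one particular ordering and sign convention (y, z, x for band 1, no Condon-Shortley phase); codebases differ on both, and 3DGS itself goes up to degree 3 (16 coefficients per channel).

```python
import math

C0 = 0.5 * math.sqrt(1.0 / math.pi)     # band-0 basis constant
C1 = math.sqrt(3.0 / (4.0 * math.pi))   # band-1 basis constant


def sh_color(coeffs, direction):
    """coeffs: 4 RGB triples ordered [band 0, band 1 (y), band 1 (z), band 1 (x)].

    direction: viewing direction (normalized here). Returns an RGB tuple.
    """
    dx, dy, dz = direction
    n = math.sqrt(dx * dx + dy * dy + dz * dz)
    x, y, z = dx / n, dy / n, dz / n
    basis = [C0, C1 * y, C1 * z, C1 * x]
    return tuple(
        sum(b * c[ch] for b, c in zip(basis, coeffs)) for ch in range(3)
    )


# Gray base color plus a band-1 term that adds red when viewed along +x.
coeffs = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(sh_color(coeffs, (1.0, 0.0, 0.0)))
print(sh_color(coeffs, (-1.0, 0.0, 0.0)))
```

Only the band-1 coefficient depends on direction here, so red differs between the two opposite views while green and blue stay at the band-0 value.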
The algorithm then creates the radiance field representation via a sequence of optimization steps on the 3D Gaussian parameters (position, covariance, $\alpha$, SH coefficients)
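Optimizing the covariance entries directly can leave the matrix non-positive-semi-definite, which is not a valid covariance. The 3DGS paper avoids this by optimizing a scale vector $s$ and a rotation quaternion $q$ and forming $\Sigma = R S S^T R^T$ with $S = \mathrm{diag}(s)$. A minimal sketch of that construction in plain Python; the function names are illustrative:

```python
import math


def quat_to_rotation(w, x, y, z):
    """3x3 rotation matrix from a quaternion (normalized first)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(scale, quat):
    """Sigma = R S S^T R^T with S = diag(scale): symmetric PSD by construction."""
    r = quat_to_rotation(*quat)
    # M = R S: scale column j of R by scale[j].
    m = [[r[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T.
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


# A 90-degree rotation about z swaps the x and y extents.
h = math.sqrt(0.5)
sigma = covariance((1.0, 2.0, 3.0), (h, 0.0, 0.0, h))
print([[round(v, 6) for v in row] for row in sigma])
```

Since $\Sigma = M M^T$ for any real $M$, every gradient step on $s$ and $q$ still yields a valid covariance; normalizing $q$ keeps $R$ a pure rotation.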