Scene Reconstruction

3D Gaussian Splatting

From 2D to 3D

3D Gaussian Splatting (Kerbl et al., 2023) extends the 2D Gaussian primitive into three dimensions. Each splat is now an anisotropic 3D Gaussian living in world space, defined by a mean position, a full 3×3 covariance matrix, opacity, and color.
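In practice the covariance is not stored as a raw 3×3 matrix. Following Kerbl et al., it is factored as Σ = R S Sᵀ Rᵀ, where R is a rotation (stored as a quaternion) and S a diagonal scale matrix; this guarantees Σ stays symmetric positive semi-definite during optimization. A minimal NumPy sketch (function name is illustrative):

```python
import numpy as np

def covariance_from_params(quat, scales):
    """Build a valid 3x3 covariance as Sigma = R S S^T R^T.

    quat   : (w, x, y, z) quaternion, normalized here -> rotation R
    scales : (sx, sy, sz) per-axis standard deviations -> diagonal S
    """
    w, x, y, z = quat / np.linalg.norm(quat)
    # Standard quaternion-to-rotation-matrix conversion
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(scales)
    M = R @ S
    return M @ M.T  # symmetric PSD by construction

# An axis-aligned Gaussian stretched along x:
Sigma = covariance_from_params(np.array([1.0, 0.0, 0.0, 0.0]),
                               np.array([2.0, 0.5, 0.5]))
```

Optimizing the quaternion and scales separately, rather than the covariance entries directly, is what keeps gradient descent from producing an invalid (non-PSD) matrix.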

The key insight: by placing millions of these 3D Gaussians in a scene and optimizing them to match input photographs, we can reconstruct a scene that can be rendered from any viewpoint in real time.

3D Gaussian Primitive

G(x) = exp(-½ (x - μ)ᵀ Σ⁻¹ (x - μ))

μ ∈ ℝ³, Σ ∈ ℝ³ˣ³ (symmetric positive semi-definite)
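The primitive above can be evaluated directly. A small sketch (unnormalized, as in splatting, where the peak value is 1 at the mean):

```python
import numpy as np

def gaussian_3d(x, mu, Sigma):
    """Evaluate G(x) = exp(-1/2 (x - mu)^T Sigma^-1 (x - mu))."""
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)

mu = np.zeros(3)
Sigma = np.eye(3)
peak = gaussian_3d(mu, mu, Sigma)               # 1.0 at the mean
off = gaussian_3d(np.array([1.0, 0, 0]), mu, Sigma)  # decays away from it
```

Note the Gaussian is left unnormalized here: each splat's overall contribution is controlled by its separate opacity parameter, not by a probability-density normalization constant.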

3D Gaussian Point Cloud

Orbit around a cloud of 3D Gaussians. Each sphere represents a Gaussian with its own position, scale, and color.

The Rasterization Pipeline

Unlike NeRF, which relies on expensive per-ray marching through a neural network, 3DGS uses a tile-based rasterization approach. This is what enables real-time rendering at 100+ FPS:

  1. Project — Transform 3D Gaussians to 2D screen space using the camera's view and projection matrices
  2. Sort — Order Gaussians by depth (front to back)
  3. Tile — Divide the screen into tiles and assign Gaussians to tiles
  4. Rasterize — For each tile, blend sorted Gaussians via alpha compositing
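The core of stages 2 and 4 can be sketched for a single pixel (single-threaded, no tiling; a real implementation runs this per tile on the GPU, and the splat tuple layout here is illustrative):

```python
import numpy as np

def render_pixel(splats, px):
    """Blend depth-sorted 2D splats at pixel `px` via front-to-back alpha compositing.

    Each splat: (depth, mean2d, inv_cov2d, opacity, color).
    """
    color = np.zeros(3)
    T = 1.0                                            # transmittance so far
    for depth, mean, inv_cov, op, col in sorted(splats, key=lambda s: s[0]):
        d = px - mean
        alpha = op * np.exp(-0.5 * d @ inv_cov @ d)    # 2D Gaussian falloff
        color += T * alpha * col                       # front-to-back compositing
        T *= (1.0 - alpha)
        if T < 1e-4:                                   # early termination: pixel saturated
            break
    return color

# One fully opaque red splat centered on the pixel covers it completely:
out = render_pixel(
    [(0.0, np.zeros(2), np.eye(2), 1.0, np.array([1.0, 0.0, 0.0]))],
    np.zeros(2))
```

Because compositing runs front to back, the transmittance T shrinks monotonically, so the loop can stop as soon as later Gaussians could no longer contribute visibly.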

Rasterization Pipeline

Watch the rendering stages animate through the pipeline.

Spherical Harmonics for View-Dependent Color

Real surfaces change appearance depending on viewing angle — think of the sheen on a car or highlights on water. 3DGS models this using Spherical Harmonics (SH), a set of basis functions on the sphere.

Instead of storing a single RGB color, each Gaussian stores SH coefficients. When rendering from a given viewpoint, these coefficients are evaluated to produce a view-dependent color. Higher-degree SH capture more complex view-dependent effects.

Spherical Harmonics Color

c(d) = ∑_{l=0}^{L} ∑_{m=-l}^{l} c_{l,m} · Y_l^m(d)

d = viewing direction, Y_l^m = SH basis functions, L = max degree (typically 3)
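Evaluating the first two SH bands (L = 1) is simple enough to write out by hand; higher bands follow the same pattern. The constants below are the standard real-SH normalization factors, though the sign and ordering conventions vary between implementations:

```python
import numpy as np

# Real spherical harmonics constants for bands 0 and 1
SH_C0 = 0.28209479177387814   # Y_0^0
SH_C1 = 0.4886025119029199    # magnitude of the three Y_1^m terms

def sh_to_color(coeffs, d):
    """Evaluate view-dependent RGB from SH coefficients up to degree 1.

    coeffs : (4, 3) array -- one RGB coefficient per basis function
    d      : viewing direction (x, y, z), normalized here
    """
    x, y, z = d / np.linalg.norm(d)
    basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])
    return basis @ coeffs  # (3,) RGB
```

With only the degree-0 (DC) coefficient set, the color is the same from every direction; the band-1 coefficients add a smooth directional tilt, and bands 2–3 add sharper lobes for effects like specular sheen.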

Spherical Harmonics Explorer

Move your mouse left/right to add SH bands. Watch how higher bands capture more complex directional color variation.

Optimization via Differentiable Rendering

The entire pipeline is differentiable. Starting from a sparse point cloud (e.g., from Structure-from-Motion), the system optimizes each Gaussian's parameters by comparing rendered images to ground-truth photographs.

The loss function combines L1 pixel loss with a D-SSIM term for structural similarity. Adaptive density control periodically splits, clones, or prunes Gaussians to improve coverage and reduce redundancy.
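The densification logic can be sketched as a per-Gaussian decision rule (thresholds and field names are illustrative; the actual values and schedule follow the original implementation):

```python
def densify_and_prune(gaussians, grad_threshold=0.0002,
                      scale_threshold=0.01, min_opacity=0.005):
    """One round of adaptive density control (simplified).

    gaussians: list of dicts with 'grad' (view-space positional gradient
    norm), 'scale' (largest axis scale), and 'opacity'.
    Returns one of 'prune' | 'split' | 'clone' | 'keep' per Gaussian.
    """
    decisions = []
    for g in gaussians:
        if g['opacity'] < min_opacity:
            decisions.append('prune')       # nearly transparent -> remove
        elif g['grad'] > grad_threshold:
            # High reconstruction error here: densify this region.
            if g['scale'] > scale_threshold:
                decisions.append('split')   # large Gaussian -> split in two
            else:
                decisions.append('clone')   # small Gaussian -> duplicate
        else:
            decisions.append('keep')
    return decisions
```

The intuition: a large positional gradient signals a region the current Gaussians cannot explain, so coverage is added there, while near-invisible Gaussians are pruned to keep the splat count in check.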

Training Loss

ℒ = (1 - λ) · ℒ₁ + λ · ℒ_D-SSIM

λ = 0.2 balances the pixel-level and structural similarity terms
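The loss itself is a one-liner once its two terms are available. A sketch that takes the D-SSIM value as an input rather than reimplementing SSIM here:

```python
import numpy as np

def l1_loss(pred, gt):
    """Mean absolute pixel error between rendered and ground-truth images."""
    return np.abs(pred - gt).mean()

def training_loss(pred, gt, dssim, lam=0.2):
    """L = (1 - lam) * L1 + lam * L_D-SSIM.

    `dssim` is the structural dissimilarity term, assumed computed elsewhere.
    """
    return (1 - lam) * l1_loss(pred, gt) + lam * dssim
```

In the actual system this loss is computed on rendered images inside an autodiff framework, so gradients flow back through the rasterizer to every Gaussian's position, covariance, opacity, and SH coefficients.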