Don't Rasterize, But Ray Trace 3D Gaussians


Posted by Hwan Heo on August 23, 2024

TL;DR

This article provides an in-depth review of the paper "3D Gaussian Ray Tracing," which introduces a novel approach to leveraging ray tracing in 3D Gaussian Radiance Fields.

3D Gaussian Splatting is a powerful and fascinating technology, but it inherits several problems from rasterization. The recently presented 3D Gaussian Ray Tracing (3D GRT) resolves many of these shortcomings by introducing a Differentiable Ray Tracer for 3D Gaussians.

Let’s dive deep into 3D GRT!

Gaussian RT
3D Gaussian Ray Tracing

1. Introduction


Challenges in 3D Gaussian Splatting

3D Gaussian Splatting (3D GS) has emerged as a promising approach for high-fidelity novel-view synthesis and real-time rendering, leveraging sophisticated tile-based rasterization. Despite its potential, this area—often referred to as the next frontier in photogrammetry—continues to face several significant challenges.

A major limitation of 3D GS stems from its reliance on rasterization, which introduces several constraints:

  1. Inflexibility with Diverse Camera Models
    One of the primary limitations of 3D Gaussian Splatting is its inflexibility in accommodating various camera models. As highlighted in a previous article, the use of EWA splatting introduces affine projection errors, which make it difficult to achieve high-quality results when modeling non-pinhole camera types.

    Inflexibility of 3D GS with Diverse Camera Models, source: Optimal GS
  2. Sensitivity to Image Quality
    Unlike NeRF, which utilizes MLPs and exhibits a degree of robustness against calibration discrepancies between images, 3D GS relies on explicit geometric primitives. This reliance renders 3D GS highly sensitive to variations in image quality, including issues such as motion blur and rolling shutter effects, which can significantly degrade the final output.

    Left: GT, Right: trained 3D GS on motion blurred scene,
    source: Deblurring 3D GS
  3. Lack of Physically-Based Rendering Capabilities
    Another challenge facing 3D GS is its inability to incorporate physically-based rendering (PBR) effects. Since 3D GS does not adhere to the principles of PBR, accurately modeling lighting and reflection effects within a scene remains problematic. This limitation restricts the realism and applicability of 3D GS in scenarios where accurate light interaction is critical.

    Left: GT, Right: 3D GS, source: 3D GS-DR

To address some of these challenges, RadSplat proposes a two-stage learning process. In this approach, Radiance Fields are first learned using Zip-NeRF, which generates perfectly calibrated pinhole images within the NeRF scene. These images are then used as training data for 3D GS.

However, this method is inefficient due to its two-stage nature and fails to resolve the fundamental limitations posed by rasterization, particularly with physically-based rendering.


2. Background


2.1. Parameterization

The primitive kernel in this method is defined using the covariance matrix in 3D space, consistent with the original 3D GS approach.

$$G(x) = \exp \left( {- \frac{1}{2} x^{\rm T} \Sigma^{-1} x} \right )$$
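As a concrete check of this definition, here is a minimal numpy sketch (with illustrative values, not from the paper) that evaluates the kernel for a point expressed relative to the Gaussian mean:

```python
import numpy as np

# Evaluate the unnormalized 3D Gaussian kernel G(x) = exp(-1/2 x^T Sigma^{-1} x),
# with x expressed relative to the Gaussian mean.
def gaussian_kernel(x, Sigma):
    x = np.asarray(x, dtype=float)
    return float(np.exp(-0.5 * x @ np.linalg.inv(Sigma) @ x))

Sigma = np.diag([0.5, 1.0, 2.0])  # example anisotropic covariance
print(gaussian_kernel([0.0, 0.0, 0.0], Sigma))  # 1.0 at the mean
```

The response is 1 at the mean and decays with the Mahalanobis distance induced by $\Sigma$.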

Due to this shared kernel definition, most calculations remain similar between the two methods. However, there is a notable difference concerning the direction used when calculating Spherical Harmonics to RGB (SH2RGB).

In 3D GS, the direction is derived from the camera position $o$ and the Gaussian means $\mu$, which is then utilized in SH2RGB calculations.

$$ \frac{\mu - \mathbf{o}}{\| \mu - \mathbf{o} \|} $$

This approach, however, results in a direction that slightly deviates from the actual angle projected onto the pixel.

// in preprocessCUDA: the view direction is computed from the camera position
// and each Gaussian's mean, and the resulting RGB is precomputed per Gaussian
glm::vec3 result = computeColorFromSH(idx, D, M, (glm::vec3*)orig_points, *cam_pos, shs, clamped);
rgb[idx * C + 0] = result.x;
rgb[idx * C + 1] = result.y;
rgb[idx * C + 2] = result.z;
  • The precomputed RGB is then passed to the render kernel.

The reason for not using the precise ray direction is that color values are pre-computed and stored for use during tile-wise rasterization. This pre-computed RGB data is subsequently fed into the render kernel, optimizing rendering speed.

While this method enhances rendering performance, it compromises the ability to accurately model illumination effects—one of the inherent weaknesses of 3D GS. To address this issue, the 3D GRT utilizes the actual ray direction in SH2RGB calculations, improving illumination effect modeling.
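The difference between the two direction conventions can be made concrete with a small numpy sketch; the camera origin, Gaussian mean, and pixel ray below are hypothetical values chosen for illustration:

```python
import numpy as np

# 3D GS approximates the view direction per Gaussian as (mu - o)/||mu - o||,
# while a ray tracer can use the exact per-pixel ray direction.
o  = np.array([0.0, 0.0, 0.0])   # camera origin
mu = np.array([0.5, 0.0, 2.0])   # Gaussian mean

approx_dir = (mu - o) / np.linalg.norm(mu - o)

# Hypothetical ray through a pixel that still intersects the (extended)
# Gaussian, but is not aimed exactly at its mean.
d = np.array([0.6, 0.0, 2.0])
ray_dir = d / np.linalg.norm(d)

angle = np.degrees(np.arccos(np.clip(approx_dir @ ray_dir, -1.0, 1.0)))
print(f"angular deviation: {angle:.2f} deg")  # nonzero: the two directions differ
```

For Gaussians far from the optical axis, or under distorted camera models, this deviation grows, which is why evaluating SH with the true ray direction helps.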

2.2. Hardware-Accelerated Ray Tracing

NVIDIA GPUs, particularly those in the RTX series, are equipped with dedicated RT cores designed for ray tracing. These RT cores handle the intersection calculations between rays and particles, while the more computationally demanding tasks, such as shading, are assigned to the Streaming Multiprocessors (SMs), optimizing overall performance.

However, existing ray tracers are typically optimized for rendering opaque particles. This means that during ray traversal, the expected hit count is low, and interaction between the SMs and RT cores is minimized.

Since 3D Gaussian Splatting involves semi-transparent particles, conventional ray tracers are inefficient in this context. The semi-transparency of 3D Gaussian Splatting increases the complexity of ray tracing, requiring more sophisticated handling of ray-particle intersections to achieve efficient and accurate rendering.

3. 3D Gaussian Ray Tracing

Figure 1. Method Overview

To effectively design a ray tracer tailored for 3D Gaussian Splatting, two key elements are essential:

  1. BVH with Appropriate Proxy Primitives
    Use Bounding Volume Hierarchy (BVH) to accelerate hit traversal by defining proxy primitives that encapsulate 3D Gaussians accurately.

  2. Rendering Algorithm
    Develop a rendering algorithm that casts rays and gathers information specific to 3D Gaussian Ray Tracing, optimizing the process for the unique characteristics of Gaussian splats.


3.1. Bounding Primitives

Let's start with BVH (Bounding Volume Hierarchy).

Figure 2. Bounding Volume Hierarchy

BVH is a hierarchical tree structure used to efficiently divide space for rendering and ray tracing. In this structure, parent nodes consist of larger bounding volumes that encompass smaller leaf nodes, facilitating efficient space partitioning and exploration.

The main objective of BVH in this context is to define a proxy primitive that accurately encapsulates 3D Gaussians and to use this proxy geometry to construct a BVH. This hierarchy then guides the ray traversal process by determining which 3D Gaussians should be considered for intersection tests.
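To make the idea concrete, here is a toy median-split BVH over AABBs in Python. This is only a sketch of the principle; production tracers such as OptiX build far more sophisticated hierarchies in hardware-friendly formats:

```python
import numpy as np

# Minimal median-split BVH: interior nodes hold bounding boxes, leaves hold
# primitive indices. A ray only tests primitives whose ancestor boxes it hits.
class BVHNode:
    def __init__(self, lo, hi, left=None, right=None, indices=None):
        self.lo, self.hi = lo, hi          # node bounding box
        self.left, self.right = left, right
        self.indices = indices             # leaf only: primitive indices

def build(boxes, indices, leaf_size=2):
    lo = boxes[indices, 0].min(axis=0)
    hi = boxes[indices, 1].max(axis=0)
    if len(indices) <= leaf_size:
        return BVHNode(lo, hi, indices=indices)
    axis = int(np.argmax(hi - lo))                 # split along the widest axis
    centers = boxes[indices].mean(axis=1)[:, axis]
    order = indices[np.argsort(centers)]
    mid = len(order) // 2
    return BVHNode(lo, hi, build(boxes, order[:mid]), build(boxes, order[mid:]))

def ray_hits_box(o, inv_d, lo, hi):
    # Standard slab test for ray-AABB intersection.
    t0, t1 = (lo - o) * inv_d, (hi - o) * inv_d
    tmin = np.minimum(t0, t1).max()
    tmax = np.maximum(t0, t1).min()
    return tmax >= max(tmin, 0.0)

def candidates(node, o, inv_d, out):
    if not ray_hits_box(o, inv_d, node.lo, node.hi):
        return                                     # prune the whole subtree
    if node.indices is not None:
        out.extend(node.indices.tolist())
        return
    candidates(node.left, o, inv_d, out)
    candidates(node.right, o, inv_d, out)
```

Querying a ray against the root returns only the primitives whose boxes it crosses, which is exactly the narrowing-down role the hierarchy plays for proxy primitives around 3D Gaussians.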

Figure 3. Bounding Primitive

NVIDIA OptiX, a common framework for ray tracing, offers three predefined proxy primitive types: triangles, spheres, and axis-aligned bounding boxes (AABBs). However, none of these are ideal for 3D Gaussians. For instance, using AABBs would simplify calculations but would produce many false-positive proxy hits, since an AABB cannot tightly enclose a stretched Gaussian distribution, making ray tracing inefficient (see Fig. 4).

Stretched Polyhedron Proxy

Figure 4. Proxy Primitives

After experimental evaluations, the authors found that using an icosahedron—a polyhedron with 20 triangular faces—was the most appropriate proxy geometry for 3D Gaussians.

The benefits of using an icosahedron include:

  • Efficient Ray-Face Intersection Calculation: Since the icosahedron consists of triangular faces, the intersection tests between rays and these faces are optimized at the hardware level.

  • Accurate Wrapping: The icosahedron can wrap around a 3D Gaussian distribution effectively, minimizing both false positives and false negatives.

For an icosahedron inscribed in a unit sphere, the proxy geometry is computed by transforming each vertex using the following formula:

$$ v \leftarrow v \sqrt{2 \log (\sigma / \alpha_{\min})} \ {\rm SR^T} + \mu $$

To break down this formula:

  1. Stretching

    The transformation matrix ${\rm SR^{T}}$ and mean vector $\mu$ adjust the icosahedron to fit the local coordinates of the 3D Gaussian. This involves stretching, rotating, and translating the initial icosahedron to properly enclose the 3D Gaussian distribution.

  2. Adaptive Clamping

    The scaling term $\sqrt{2 \log (\sigma / \alpha_{\min})}$ determines how the icosahedron is scaled. Specifically, the parameter $\alpha_{\min}$ (set to 0.01) represents the minimum response value.

    Though the term looks a little tricky, consider the case where the scaling factor equals 1; the relationship then simplifies to:

    $$ \sqrt{2 \log (\sigma / \alpha_{\min})} = 1 \ \Rightarrow \ \sigma \cdot \exp (- 1/2) = \alpha_{\min} $$

    Doesn't the right side look familiar? This expression closely resembles the response function of Gaussian Splatting.

    $$ f_i(p) = \sigma_i \cdot \exp \left( - \frac{1}{2} (\mu_i -p)^{\rm T} \Sigma_i^{-1} (\mu_i - p ) \right ) $$

    The scaling factor expands the icosahedron to the surface where the response of the Gaussian distribution drops to $\alpha_{\min}$, effectively clamping the proxy's scale at that confidence boundary.

    In my opinion, the choice of $\alpha_{\min} = 0.01$ is justified by calculating the Gaussian pdf at a standard deviation of 2.6, which corresponds to approximately 99% confidence, yielding a value close to 0.01.

    $$ \exp\left(-\frac{1}{2}(2.6)^2 \right) \cdot \frac{1}{\sqrt{2 \pi}} \approx 0.01 $$

    Similarly, 3D GS uses a scaling factor equivalent to three times the standard deviation to compute the radius for culling.

    Adaptive Clamping allows for the scaling of the proxy primitive to be small for nearly transparent particles and larger for more opaque ones, improving the accuracy and efficiency of the ray tracing process.

This approach enables a more efficient and accurate ray-tracing mechanism for 3D Gaussians by using an optimized proxy geometry that is both computationally feasible and tightly conforms to the Gaussian distribution.
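The vertex transform above can be sketched directly in numpy. The icosahedron construction and parameter values are illustrative, and the function names are my own, not from the paper's code:

```python
import numpy as np

# Sketch of the proxy-vertex transform (row-vector convention):
#   v <- v * sqrt(2 log(sigma / alpha_min)) S R^T + mu
def unit_icosahedron():
    # 12 vertices from the golden-ratio construction, projected onto the unit sphere.
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    v = np.array([(0, 1, phi), (0, -1, phi), (0, 1, -phi), (0, -1, -phi),
                  (1, phi, 0), (-1, phi, 0), (1, -phi, 0), (-1, -phi, 0),
                  (phi, 0, 1), (-phi, 0, 1), (phi, 0, -1), (-phi, 0, -1)],
                 dtype=float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def proxy_vertices(mu, S, R, sigma, alpha_min=0.01):
    v = unit_icosahedron()
    scale = np.sqrt(2.0 * np.log(sigma / alpha_min))   # adaptive clamping term
    return scale * v @ (S @ R.T) + mu                  # stretch, rotate, translate

mu = np.array([1.0, 2.0, 3.0])
S = np.diag([0.3, 0.1, 0.2])   # per-axis Gaussian scales
R = np.eye(3)                  # no rotation for this example
verts = proxy_vertices(mu, S, R, sigma=0.9)
```

Note how the adaptive clamping behaves: a nearly transparent particle (small $\sigma$) yields a smaller proxy than an opaque one, exactly as described above.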

3.2. Ray Tracing Renderer

Figure 5. Ray Tracing

For differentiable and efficient rendering, 3D GRT processes each ray by repeatedly gathering the next $k$ closest hits. This approach helps manage multiple semi-transparent particles along a single ray path. The rendering process is outlined as follows:

  1. Track Particles Using BVH
    The next $k$ closest particles along a ray path are tracked using the BVH. At this stage, the hit response (i.e., the particle's contribution to the final image) is not yet measured.

  2. Measure Hit Response Iteratively
    Once the $k$ particles are identified, the actual hit response for each particle is measured iteratively within each chunk of the $k$-buffer. This step involves checking all particles that intersect with the ray.

  3. Proxy Hit Verification
    During the response measurement, all proxy-hit particles along the ray are checked to determine their actual contribution based on their proximity and alignment with the ray.

  4. Rendering Termination
    The rendering process continues until a certain threshold is reached, beyond which additional particle contributions are negligible, and rendering can be stopped.

Figure 6 illustrates the ray tracing process of 3D GRT when $k=3$, showing how multiple particles are managed and rendered along a single ray.

Figure 6. Next $k$ closest hit Ray Tracer: on each round of tracing, the next $k$ closest hit particles are collected and sorted into depth order along the ray, the radiance is computed in-order, and the ray is cast again to process the next chunk.
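The four steps above can be sketched as a simple compositing loop. The BVH traversal is abstracted into a precomputed hit list, and the per-particle response is reduced to a (depth, alpha, rgb) tuple, so this is only a schematic of the next-$k$-closest-hit logic, not the paper's implementation:

```python
import numpy as np

# Schematic next-k-closest-hit renderer: gather hits in k-chunks, composite
# front to back, and stop once transmittance drops below a threshold.
def render_ray(hits, k=3, t_min=1e-3):
    """hits: list of (depth, alpha, rgb) tuples produced by BVH traversal."""
    hits = sorted(hits, key=lambda h: h[0])   # stand-in for per-chunk depth sorting
    color = np.zeros(3)
    transmittance = 1.0
    for start in range(0, len(hits), k):      # one k-chunk per ray "cast"
        for depth, alpha, rgb in hits[start:start + k]:
            color += transmittance * alpha * np.asarray(rgb, dtype=float)
            transmittance *= (1.0 - alpha)
        if transmittance < t_min:             # step 4: early termination
            break
    return color, transmittance
```

Each iteration over a $k$-chunk corresponds to one additional ray cast in the actual renderer; the early exit mirrors the termination criterion in step 4.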

3.3. Ray-Gaussian Intersection

To calculate the contribution of each particle during ray tracing, 3D GRT determines the point where the particle's response (or contribution to the final rendered image) is maximized. This is achieved through the following mathematical formulation:

$$ \tau_{\max} = \frac{(\mu - \mathbf{o})^{\rm T} \Sigma^{-1} \mathbf{d}}{\mathbf{d}^{\rm T} \Sigma^{-1} \mathbf{d} } = \frac{-\mathbf{o}_g^{\rm T} \mathbf{d}_g}{\mathbf{d}_g^{\rm T}\mathbf{d}_g} $$

where $\mathbf{o}_g = {\rm S^{-1}R^T}(\mathbf{o} - \mu)$ and $\mathbf{d}_g = {\rm S^{-1}R^T} \mathbf{d}$.

Let's interpret this step by step!

  1. Transformation to Local Coordinates

    The variables $\mathbf{o}_g$ and $\mathbf{d}_g$ represent the ray origin and direction, which are transformed into the local coordinate system of the 3D Gaussian (just as the proxy primitive is defined in these local coordinates).

  2. Maximizing Gaussian Density

    The density of the 3D Gaussian can be expressed as a 1D Gaussian along the ray:

    $$ G(x_g) = \exp\left(-\frac{1}{2} \mathbf{x}_g^{\rm T} \mathbf{x}_g\right) \quad \text{where } \mathbf{x}_g = \mathbf{o}_g + t\mathbf{d}_g $$

    Since $\exp(-x)$ is a decreasing function, the maximum density corresponds to the minimum value of the inner quadratic term $\mathbf{x}_g^{\rm T} \mathbf{x}_g$.

  3. Optimization

    The problem of finding the maximum density is equivalent to solving the following optimization problem:

    $$ \min_t \ (\mathbf{o}_g + t \mathbf{d}_g)^T (\mathbf{o}_g + t \mathbf{d}_g) . $$

    Since this is a convex quadratic in $t$, its minimum (which corresponds to the maximum Gaussian response) can be found by setting the derivative with respect to $t$ to zero:

    $$ \begin{aligned} \nabla_t f(t) &= \frac{d}{dt} \left( (\mathbf{o}_g + t \mathbf{d}_g)^T (\mathbf{o}_g + t \mathbf{d}_g)\right) \\ &= 2 \mathbf{d}_g^T (\mathbf{o}_g + t \mathbf{d}_g). \end{aligned} $$

    Subsequently, the analytic solution can be derived as follows:

    $$ 2 \mathbf{d}_g^T (\mathbf{o}_g + t \mathbf{d}_g) = 0 \\ \rightarrow t = -\frac{\mathbf{o}_g^{\rm T} \mathbf{d}_g}{\mathbf{d}_g^{\rm T}\mathbf{d}_g} $$

    This equation represents the point along the ray where the Gaussian density, and therefore the particle's contribution to the final image, is maximized.

Intuitively, the closer the ray passes to the center of the 3D Gaussian, the higher the response, and thus the contribution, from that particle will be.

Note that, even though particles are processed in the order of their proxy hits rather than the order of their actual maximum responses, this approximation does not noticeably degrade rendering quality.
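The equivalence of the two forms of $\tau_{\max}$, and the fact that it indeed maximizes the response, can be verified numerically (random but fixed example values):

```python
import numpy as np

# Check that -o_g.d_g / d_g.d_g equals the direct Sigma^{-1} form, and that
# the Gaussian response peaks at tau_max along the ray.
rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5, 2.0])
S = np.diag([0.4, 0.2, 0.3])
R = np.linalg.qr(rng.normal(size=(3, 3)))[0]   # random orthogonal rotation
Sigma = R @ S @ S @ R.T                        # Sigma = R S S^T R^T
o = np.zeros(3)
d = np.array([0.3, -0.1, 1.0]); d /= np.linalg.norm(d)

Sigma_inv = np.linalg.inv(Sigma)
tau_direct = ((mu - o) @ Sigma_inv @ d) / (d @ Sigma_inv @ d)

M = np.linalg.inv(S) @ R.T                     # the S^{-1} R^T local transform
o_g, d_g = M @ (o - mu), M @ d
tau_local = -(o_g @ d_g) / (d_g @ d_g)

def response(t):
    # Gaussian density along the ray at parameter t (unnormalized).
    x = o + t * d - mu
    return np.exp(-0.5 * x @ Sigma_inv @ x)
```

Both expressions agree, and nudging $t$ away from $\tau_{\max}$ in either direction lowers the response, confirming the convexity argument above.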


4. Experiments


4.1. Quantitative Results

The quantitative evaluations of 3D GRT indicate almost no significant difference between its metrics and those of other novel view synthesis (NVS) techniques. While its fps is slightly lower in comparison, 3D GRT still achieves real-time performance.

Figure 7. Quantitative Results
Figure 8. Speed Comparison

Ablation Study

The paper also explores the design of the Next $k$-closest Ray Tracer (Fig. 9, top left), validation of the proxy mesh design (Fig. 9, bottom left), and the determination of an optimal $k$ value in the $k$-buffer (Fig. 9, top right).

Experimental results support the design of the Ray Tracer, highlighting the importance of these parameters in achieving efficient and accurate rendering.

Figure 9. Ablation Study

Particle Kernel Design

Since the particle kernel for the designed Ray Tracer does not need to be strictly a 3D Gaussian, the authors experimented with different kernel designs. The four kernels evaluated include:

  • 3D Gaussian

    $$ \hat{p}(x) = \sigma e^{ -(x-\mu)^{\rm T} \Sigma^{-1} (x-\mu) } $$

  • Generalized Gaussian: a generalized Gaussian of degree $n$

    $$ \hat{p}_n(x) = \sigma e^{- \left((x-\mu)^{\rm T} \Sigma^{-1} (x-\mu)\right)^n } $$

  • 2D Gaussian: Gaussian Surfels, suggested in 2D Gaussian Splatting (cf. my previous review)

  • Cosine wave modulation: aims to model a particle with spatially varying radiance

    $$ \hat{p}_c(x) = \hat{p}(x) \left ( 0.5 + 0.5 \cos (\psi {\rm R^T S^{-1}} (x-\mu)) \right ) $$

As shown in Figure 10, the reconstruction performance is similar across all kernels tested. However, when using the Generalized Gaussian (GG) kernel, the frames per second (fps) nearly double compared to the standard 3D Gaussian kernel.

Figure 10. Kernel Design Comparison

This increase in fps is due to the GG kernel’s design, which makes the density more concentrated around the mean. As the density is modeled closer to an opaque particle, the number of ray-particle intersections decreases, thereby improving rendering efficiency.
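A quick numerical comparison illustrates this concentration effect. The kernels follow the paper's notation (without the opacity prefactor $\sigma$), and the sample distances are arbitrary:

```python
import numpy as np

# Radial falloff of a standard Gaussian, exp(-r^2), versus a generalized
# Gaussian of degree n, exp(-(r^2)^n): for n=2 the response stays high inside
# roughly one "sigma" and collapses much faster outside, so each ray records
# fewer significant hits.
def gaussian(r2):
    return np.exp(-r2)

def generalized_gaussian(r2, n=2):
    return np.exp(-(r2 ** n))

inside, outside = 0.25, 2.25   # squared Mahalanobis distances (0.5 and 1.5 "sigma")
print(gaussian(outside), generalized_gaussian(outside))  # GG tail is far smaller
```

The heavier core and lighter tail are what make the GG particle behave closer to an opaque primitive during traversal.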

This effect is also evident in the ray-hit visualization provided by the authors, which shows fewer ray-particle interactions for the GG kernel.

Figure 11. Ray hit count for left: 3D G, right: GG

4.2. Qualitative Results

In addition to the quantitative analysis, the qualitative results demonstrate how this method effectively overcomes the limitations of rasterization.

Specifically, the 3D GRT shows significant improvements in modeling and rendering, particularly in handling complex light effects across various camera models. This ability to accurately represent lighting and reflections, which are often challenging in rasterization-based techniques, demonstrates that the method could be highly effective in realistic rendering scenarios.

Figure 12. 3D GRT w/ various light effect
Figure 13. 3D GRT's reconstruction capability for non-pinhole camera

Overall, the combination of both quantitative and qualitative evaluations highlights the strengths of this new 3D GS approach, especially in terms of performance, memory efficiency, and the ability to handle complex visual effects.


5. Conclusion

This paper presents a comprehensive exploration of the differences and advantages of using a ray tracing-based renderer for 3D Gaussian Splatting compared to traditional rasterization techniques.

While rasterization excels in speed, especially for primary rays from pinhole cameras, 3D GRT offers greater flexibility and generality. It enables advanced rendering effects such as reflections, refractions, depth of field, and complex camera models, which are difficult or impossible to achieve with rasterization.

The ray tracing approach significantly broadens the scope of 3D GS, allowing for more accurate modeling of general lighting, image formation, and sub-pixel behaviors. It also facilitates the exploration of global illumination, inverse lighting, and physically-based surface reflection models, paving the way for new research directions in these areas.

However, the inherent trade-offs between the two methods are evident. While rasterization remains faster in scenarios involving primary rays and static scenes, 3D Gaussian Ray Tracing, despite being carefully optimized for hardware acceleration, still requires more computational resources, particularly when frequent BVH rebuilds are necessary for dynamic scenes.

