TL;DR
In this article, we explore why positional encoding improves NeRF's high-fidelity reconstruction ability by examining the paper Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains.
Leveraging Neural Tangent Kernel (NTK) theory, the authors demonstrate that Fourier features improve the convergence and performance of neural networks on such low-dimensional regression tasks.
1. Introduction
Fourier-featuring is a mapping that embeds a point from coordinate space into frequency space.
A prominent example in deep learning is 'Positional Encoding', which uses sinusoidal functions to embed coordinates into frequency space, thereby injecting positional information that the network cannot otherwise capture.
Building upon NTK theory, this article focuses on a theoretical investigation of how neural networks process coordinate information through Fourier-featuring, especially for coordinate-based MLPs, which map dense, continuous, low-dimensional inputs to high-dimensional outputs (e.g., NeRF).
2. Background
2.1. Kernel Trick
For linearly inseparable data points $x$, let $\phi (x)$ be a non-linear mapping function under which the mapped points $\phi (x)$ become linearly separable.
The kernel trick performs kernel regression without explicitly computing the feature map, by defining the kernel as follows:
$$K(x, \ x') = \phi(x) ^T \phi(x') $$
This approach can be interpreted as implicitly using a feature map $\phi$ with desirable properties through the kernel, rather than explicitly mapping the inputs $x$ and $x'$ and then taking their inner product.
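As a minimal illustration (not from the paper), the sketch below compares an explicit degree-2 polynomial feature map with the equivalent polynomial kernel; the feature map `phi`, the kernel, and the toy inputs are all hypothetical choices for demonstration.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2-D input x = (x1, x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, y):
    """Polynomial kernel K(x, y) = (x^T y)^2, which equals phi(x)^T phi(y)."""
    return (x @ y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Same value either way: the kernel evaluates the feature-space inner product
# without ever constructing phi explicitly.
print(phi(x) @ phi(y))    # 1.0
print(poly_kernel(x, y))  # 1.0
```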
2.2. Neural Tangent Kernel
Neural Tangent Kernel (NTK) theory describes gradient-descent training of infinitely wide neural networks as kernel regression, thereby explaining neural networks through the kernel trick.
Linearization of NN Training & Kernel
A neural network can be represented by its first-order Taylor expansion (linearization) around the initial weights $w_0$:
$$f(w, \ x) \approx f( w_0 , \ x) + \nabla _w f( w_0 , \ x) ^T (w - w_0 )$$
This Taylor expansion has the following properties:
- It is linear with respect to the weights $w$.
- It is non-linear with respect to $x$.
The gradient term $\nabla _w f( w_0 , \ x) ^T (w - w_0 )$ acts as a feature map that maps a non-linear data point $x$ to a useful space.
The corresponding kernel $K$ is defined as follows:
$$K(x, \ x') = \nabla _w f( w_0 , \ x) ^T \ \nabla _w f( w_0 , \ x')$$
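As a hedged sketch of this definition (not code from the paper), the snippet below computes the empirical NTK of a small one-hidden-layer ReLU network at initialization by taking inner products of parameter gradients; the network, its width, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 2048                       # hidden width (the NTK regime assumes m -> infinity)
a = rng.normal(size=m)         # first-layer weights (scalar input for simplicity)
b = rng.normal(size=m)         # first-layer biases
c = rng.normal(size=m)         # second-layer weights

def grad_f(x):
    """Gradient of f(w, x) = (1/sqrt(m)) * sum_k c_k * relu(a_k * x + b_k)
    with respect to all parameters (a, b, c), flattened into one vector."""
    pre = a * x + b
    act = np.maximum(pre, 0.0)
    mask = (pre > 0).astype(float)
    d_a = c * mask * x / np.sqrt(m)
    d_b = c * mask / np.sqrt(m)
    d_c = act / np.sqrt(m)
    return np.concatenate([d_a, d_b, d_c])

def ntk(x1, x2):
    """Empirical NTK: K(x1, x2) = grad_w f(w0, x1)^T grad_w f(w0, x2)."""
    return grad_f(x1) @ grad_f(x2)

print(ntk(0.3, 0.3), ntk(0.3, -0.7))
```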
Gradient-Based Training & Kernel Regression
The NTK arises from gradient-descent training of the neural network. For a timestep $t$, the gradient-descent update is:
$$w_{t+1} = w_t - \eta \ \nabla _w L(w_t)$$
Substituting this update into the linearization, the change of the network output can be derived as follows:
$$f(w_{t+1}, \ x) - f(w_t , \ x) \approx \nabla _w f( w_0 , \ x) ^T (w_{t+1} - w_t ) = - \eta \ \nabla _w f( w_0 , \ x) ^T \ \nabla _w L(w_t)$$
With least squares (MSE) as the loss function,
$$L(w) = \frac{1}{2} \sum_i \big( f(w, \ x_i) - y_i \big)^2 ,$$
the gradient term $\nabla_w L$ with respect to $w$ can be derived as
$$\nabla _w L(w) = \sum_i \big( f(w, \ x_i) - y_i \big) \ \nabla _w f(w, \ x_i) .$$
Therefore, neural network training via gradient-based optimization can be represented by NTK kernel regression (in the continuous-time, gradient-flow limit):
$$\frac{\mathrm d \, \mathbf y(w_t)}{\mathrm d t} \approx - \eta \ \mathbf K \, \big( \mathbf y(w_t) - \mathbf y \big) ,$$
where $\mathbf y(w_t)$ stacks the network outputs on the training points, $\mathbf y$ the targets, and $\mathbf K_{ij} = K(x_i , \ x_j)$.
Let $\mathbf u = \mathbf y(w) - \mathbf y$; then the output residual at training iteration $t$ can be written as:
$$\mathbf u_t \approx e^{- \eta \mathbf K t} \ \mathbf u_0$$
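The closed-form residual decay can be checked numerically. The sketch below (purely illustrative, not from the paper) trains a linear-in-features model with gradient descent and compares its residual to $e^{-\eta \mathbf K t} \mathbf u_0$ computed via the eigendecomposition of $\mathbf K$; the random features and hyperparameters are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 64                        # number of training points, feature dimension
Phi = rng.normal(size=(n, p))       # fixed feature map phi(x_i), stacked row-wise
y = rng.normal(size=n)              # regression targets
K = Phi @ Phi.T                     # kernel matrix K_ij = phi(x_i)^T phi(x_j)

eta, T = 1e-3, 500
w = np.zeros(p)
for _ in range(T):                  # gradient descent on L(w) = 0.5 * ||Phi w - y||^2
    w -= eta * Phi.T @ (Phi @ w - y)
u_gd = Phi @ w - y                  # residual after T steps

# Closed-form prediction u_T ~ exp(-eta * K * T) u_0 via eigendecomposition of K
lam, Q = np.linalg.eigh(K)
u0 = -y                             # residual at initialization (w_0 = 0)
u_ntk = Q @ (np.exp(-eta * lam * T) * (Q.T @ u0))

# The two residuals approximately agree, matching the kernel-regression view.
print(np.max(np.abs(u_gd - u_ntk)))
```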
2.3. Spectral Bias of DNNs
Based on the NTK approximation, the network's prediction after $t$ iterations for test data $\mathbf X_\text{test}$ is:
$$\hat{\mathbf y}^{(t)}_\text{test} \approx \mathbf K_\text{test} \ \mathbf K^{-1} \big( \mathbf I - e^{- \eta \mathbf K t} \big) \ \mathbf y ,$$
where $\mathbf K_\text{test}$ denotes the NTK between the test points and the training points.
For ideal training (evaluating on the training points themselves), $\mathbf K_\text{test} = \mathbf K$, and this reduces to the residual equation at the end of 2.2 (with zero initial output, $\mathbf u_0 = -\mathbf y$).
By eigendecomposing $\mathbf K = \mathbf Q \mathbf \Lambda \mathbf Q^{\rm T}$ (orthonormal eigenvectors $\mathbf Q$, eigenvalues $\mathbf \Lambda = \operatorname{diag}(\lambda_1 , \ \ldots , \ \lambda_n)$), we obtain:
$$\mathbf Q^{\rm T} \big( \hat{\mathbf y}^{(t)}_\text{train} - \mathbf y \big) \approx - e^{- \eta \mathbf \Lambda t} \ \mathbf Q^{\rm T} \mathbf y$$
In the above equation, the exponential decay term decays faster for larger eigenvalues, which means the components associated with larger eigenvalues are learned first.
For example, in the case of an image, the large eigenvalues (in the spectral domain) correspond to low-frequency structure such as overall contours, so without an input embedding, convergence on the high-frequency components is slow; this is exactly the problem positional encoding addresses in NeRF.
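To make the spectral-bias argument concrete, here is a small, purely illustrative sketch: a Gaussian (RBF) kernel on a 1-D grid stands in for a stationary NTK (an assumption, not the paper's actual NTK), and we report how much of each eigencomponent is learned after $t$ steps according to $1 - e^{-\eta \lambda_i t}$.

```python
import numpy as np

# Stand-in for a stationary kernel on a 1-D grid: a Gaussian (RBF) kernel.
x = np.linspace(0.0, 1.0, 128)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.05 ** 2)

lam, Q = np.linalg.eigh(K)
lam, Q = lam[::-1], Q[:, ::-1]            # sort eigenvalues in descending order

eta, t = 1e-3, 200
learned = 1.0 - np.exp(-eta * lam * t)    # fraction of each eigencomponent learned

for i in [0, 1, 5, 20, 60]:
    # Zero crossings of the eigenvector serve as a proxy for its spatial frequency.
    freq = int(np.sum(np.abs(np.diff(np.sign(Q[:, i]))) > 0))
    print(f"eigencomponent {i:3d}: ~{freq:3d} zero crossings, "
          f"learned fraction after {t} steps = {learned[i]:.3f}")
```

The components with large eigenvalues (few zero crossings, i.e., low frequency) are nearly fully learned, while the high-frequency components have barely moved.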
3. Fourier Features for a Tunable Stationary Neural Tangent Kernel
This section explores how embedding the input with Fourier features reshapes the kernel and addresses the slow convergence of high-frequency components.
3.1. Fourier-Featuring
The Fourier-feature mapping function $\gamma$ is defined as follows (a code sketch follows the examples below):
$$\gamma (v) = \left[ \ a_1 \cos (2 \pi b_1^{\rm T} v) , \ a_1 \sin (2 \pi b_1^{\rm T} v) , \ \ldots , \ a_m \cos (2 \pi b_m^{\rm T} v) , \ a_m \sin (2 \pi b_m^{\rm T} v) \ \right]^{\rm T}$$
- Positional Encoding in Transformers: Adds spatial information to features in attention-based architectures, defined as: $a_i =1, \ b_i = 10000^{i / d} , \ d : \text{dimension}$
- Positional Encoding in NeRF: Provides even distribution of low & high-frequency information in the input, defined as: $a_i =1, \ b_i = 2^{i}$
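Here is a minimal NumPy sketch of the mapping $\gamma$ defined above, using the NeRF-style choice $a_j = 1, \ b_j = 2^{j}$ applied per input dimension; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def fourier_features(v, num_freqs=8):
    """Map coordinates v of shape [..., d] into frequency space:
    gamma(v) = [cos(2*pi*b_j*v), sin(2*pi*b_j*v)] with a_j = 1, b_j = 2^j (per dimension)."""
    b = 2.0 ** np.arange(num_freqs)              # NeRF-style frequencies 1, 2, 4, ...
    proj = 2.0 * np.pi * v[..., None] * b        # shape [..., d, num_freqs]
    feats = np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)
    return feats.reshape(*v.shape[:-1], -1)      # shape [..., 2 * d * num_freqs]

v = np.array([[0.25, 0.5]])                      # one 2-D coordinate
print(fourier_features(v).shape)                 # (1, 32)
```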
The kernel induced by this mapping function is:
$$k_\gamma (v_1 , \ v_2) = \gamma (v_1) ^{\rm T} \gamma (v_2) = \sum_{j=1}^{m} a_j^2 \cos \big( 2 \pi b_j^{\rm T} (v_1 - v_2) \big)$$
- remember: $\cos (\alpha - \beta ) = \cos \alpha \cos \beta \ + \ \sin \alpha \sin \beta$
This Fourier-feature kernel is a stationary function, meaning it is translation-invariant:
$$k_\gamma (v_1 , \ v_2) = h_\gamma (v_1 - v_2), \qquad h_\gamma (\Delta) \triangleq \sum_{j=1}^{m} a_j^2 \cos ( 2 \pi b_j^{\rm T} \Delta )$$
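The translation invariance can be verified numerically. In this small sketch (illustrative, not from the paper), shifting both inputs by the same amount leaves $k_\gamma$ unchanged, and the value matches $h_\gamma(v_1 - v_2)$.

```python
import numpy as np

def gamma(v, b):
    """1-D Fourier features with a_j = 1: [cos(2*pi*b_j*v), sin(2*pi*b_j*v)]."""
    return np.concatenate([np.cos(2 * np.pi * b * v), np.sin(2 * np.pi * b * v)])

b = 2.0 ** np.arange(4)       # frequencies b_j = 2^j
v1, v2, shift = 0.10, 0.25, 0.37

k_plain   = gamma(v1, b) @ gamma(v2, b)                  # k_gamma(v1, v2)
k_shifted = gamma(v1 + shift, b) @ gamma(v2 + shift, b)  # both inputs translated
h_delta   = np.sum(np.cos(2 * np.pi * b * (v1 - v2)))    # h_gamma(v1 - v2)

# All three values agree: the kernel depends only on the offset v1 - v2.
print(k_plain, k_shifted, h_delta)
```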
Coordinate-based MLPs take dense, uniformly distributed coordinates as input. To perform well over the whole domain, the kernel should be isotropic, extracting features in all directions rather than favoring specific ones.
This is why a stationary, i.e., location-invariant, kernel can improve performance: positional encoding treats all pairs of coordinates at the same offset uniformly, enabling effective reconstruction in the high-dimensional output space.
3.2. NTK Kernel with Fourier-Featuring
The NTK of a network whose input is first Fourier-featured corresponds to composing the NTK feature map $\phi$ with the mapping $\gamma$:
$$K_\text{NTK} \big( \gamma (x) , \ \gamma (y) \big) = ( \phi \circ \gamma ) (x) ^{\rm T} \ ( \phi \circ \gamma ) (y)$$
Because this composed kernel is stationary, kernel regression with it is equivalent to reconstruction by convolutional filtering: the trained network approximates a convolution of the composed kernel built from $K_\text{NTK}$ and $K_\gamma$ with the training points $v_i$ and weights $w_i$.
Thus, under NTK theory, the Fourier-featured network can be written as:
$$\hat f = ( h_\text{NTK} \circ h_\gamma ) \ * \ \sum_i w_i \ \delta_{v_i}$$
where $\delta_{v_i}$ denotes a Dirac delta located at training point $v_i$.
This expression indicates:
- A stationary filter $h_\gamma$ extracts information in a location-invariant manner.
- Since convolution is the inverse Fourier transform of multiplication in frequency space, the frequencies embedded directly in $h_\gamma$ determine which frequency components the filter can pass, allowing features across different frequencies to be extracted in a multifaceted (yet location-invariant) way; a small numerical sketch of this effect follows this list.
- A neural network receiving Fourier-featured input is therefore equivalent to performing kernel regression with the composition of the NTK and a stationary kernel.
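To make the filtering picture concrete, here is a purely illustrative sketch: `h_gamma` follows the definition above, while `h_ntk` is a hypothetical monotone scalar function standing in for the true NTK nonlinearity (an assumption, not the paper's NTK). The point is only that adding higher frequencies to $\gamma$ makes the composed stationary filter sharper, i.e., able to pass higher-frequency content.

```python
import numpy as np

def h_gamma(delta, b):
    """Stationary Fourier-feature kernel h_gamma(delta) = sum_j cos(2*pi*b_j*delta)."""
    return np.sum(np.cos(2 * np.pi * b[:, None] * delta), axis=0)

def h_ntk(s):
    """Hypothetical monotone scalar function standing in for h_NTK (illustration only)."""
    return np.exp(s / 10.0)

deltas = np.linspace(-0.5, 0.5, 1001)
for num_freqs in [2, 5]:                     # low- vs. high-frequency embeddings
    b = 2.0 ** np.arange(num_freqs)
    composed = h_ntk(h_gamma(deltas, b))     # (h_NTK o h_gamma)(delta): a stationary filter
    # Fraction of offsets where the filter exceeds the midpoint of its range:
    # a rough width measure, smaller when higher frequencies are embedded.
    width = np.mean(composed >= 0.5 * (composed.max() + composed.min()))
    print(f"frequencies up to 2^{num_freqs - 1}: relative filter width ~ {width:.3f}")
```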