Neural network architectures

Neural Radiance Field (NeRF)

A Neural Radiance Field (NeRF) is a neural network that represents a 3D scene as a continuous function mapping a 3D location (x, y, z) and a viewing direction (θ, φ) to a color and a volume density. Given a set of 2D images taken from different angles, NeRF learns to synthesize novel views by querying the network along camera rays. For operators, NeRF is relevant as a specialized model type (not an LLM) that needs significant VRAM and compute for training: from minutes with fast variants to a day or more with the original method on a single GPU. Fast variants can be rendered interactively once optimized. NeRF is not typically run via llama.cpp or Ollama; instead, it uses frameworks like PyTorch, often with custom CUDA kernels.
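As a rough illustration of the mapping described above, the sketch below wires a 5D input (position plus two viewing angles) through a tiny untrained MLP in plain Python. Everything in it is an assumption made for illustration: the names TinyField and direction_from_angles, the layer sizes, the random weights, and the angle convention (θ as polar angle from +z, φ as azimuth). It shows only the input/output shape of the function, not a working NeRF, which would also use an input encoding and trained weights.

```python
import math
import random


def direction_from_angles(theta, phi):
    """Turn viewing angles into a unit vector (assumed convention:
    theta is the polar angle from +z, phi is the azimuth)."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def make_layer(n_in, n_out, rng):
    """Random, untrained weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyField:
    """Hypothetical toy field: (x, y, z, theta, phi) -> (rgb, density)."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.position_layer = make_layer(3, hidden, rng)
        self.density_layer = make_layer(hidden, 1, rng)
        # Color sees the position features plus the viewing direction.
        self.color_layer = make_layer(hidden + 3, 3, rng)

    def query(self, position, theta, phi):
        features = relu(dense(self.position_layer, list(position)))
        # Density depends on position only and is clamped to be non-negative.
        sigma = max(0.0, dense(self.density_layer, features)[0])
        direction = direction_from_angles(theta, phi)
        raw_rgb = dense(self.color_layer, features + list(direction))
        rgb = tuple(sigmoid(c) for c in raw_rgb)  # each channel in [0, 1]
        return rgb, sigma


field = TinyField()
rgb, sigma = field.query((0.1, 0.2, 0.3), theta=0.5, phi=1.0)
print(rgb, sigma)
```

Keeping density independent of the viewing direction, while letting color depend on it, is the usual way such a field is structured so that geometry stays consistent across views.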

Deeper dive

NeRF works by overfitting a small MLP to a single scene. The input is a 5D coordinate (position + direction), and the output is an RGB color and a volume density. To render a new view, the algorithm casts a ray from the camera through each pixel, samples points along the ray, queries the network at each point, and composites the results front to back: each sample's density is converted to an opacity, and the colors are alpha-blended using those opacities. Each training iteration evaluates the network at hundreds of thousands of sample points (thousands of rays, each with many samples), and the original method often takes 1-2 days on a single GPU such as an RTX 3090. Variants like Instant NGP use a multiresolution hash-grid encoding to cut training to minutes. NeRF is not a generative model; it memorizes one scene. Operators encounter NeRF in computer vision pipelines, not in text generation, and it runs on PyTorch or JAX rather than on inference engines designed for transformers.
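The render loop described above (sample points along a ray, query the field, composite by density) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: toy_field is a hypothetical stand-in for a trained network (an opaque red sphere of radius 0.5 at the origin), samples are evenly spaced, and the function names, near/far bounds, and sample count are illustrative choices rather than values from any particular implementation.

```python
import math


def sample_depths(near, far, n_samples):
    """Evenly spaced sample depths (bin midpoints) between near and far."""
    step = (far - near) / n_samples
    return [near + (i + 0.5) * step for i in range(n_samples)], step


def composite(colors, sigmas, delta):
    """Front-to-back alpha compositing of samples along one ray."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for rgb, sigma in zip(colors, sigmas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


def toy_field(point):
    """Hypothetical stand-in for the trained network: a dense red sphere."""
    inside = math.sqrt(sum(c * c for c in point)) < 0.5
    return (1.0, 0.0, 0.0), (50.0 if inside else 0.0)


def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Sample the field along one ray and composite the result."""
    depths, delta = sample_depths(near, far, n_samples)
    colors, sigmas = [], []
    for t in depths:
        point = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, sigma = toy_field(point)
        colors.append(rgb)
        sigmas.append(sigma)
    return composite(colors, sigmas, delta)


# A ray through the sphere comes back red; a ray that misses stays empty.
hit_pixel, hit_remaining = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
miss_pixel, miss_remaining = render_ray((2.0, 2.0, -2.0), (0.0, 0.0, 1.0))
print(hit_pixel, miss_pixel)
```

The leftover transmittance returned alongside the pixel is what a renderer would use to blend in a background color for rays that pass through empty space.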

Practical example

An operator captures 50 photos of a statue with a smartphone and runs a NeRF implementation such as Instant NGP. Training on an RTX 4090 takes ~5 minutes and produces a model that occupies ~100 MB of VRAM. The operator can then render a 1080p fly-around video at 30 fps. Without a high-end GPU, training may take hours and rendering may drop to a few frames per second.
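To get a feel for why rendering speed depends so heavily on the GPU, the short calculation below estimates the workload of 1080p rendering at 30 fps. The resolution and frame rate come from the example above; the samples_per_ray value is an assumption chosen for illustration, since real implementations vary the sample count and skip empty space.

```python
def render_workload(width, height, fps, samples_per_ray):
    """Back-of-the-envelope ray and network-query counts for real-time rendering."""
    rays_per_frame = width * height  # one ray per pixel
    rays_per_second = rays_per_frame * fps
    return {
        "rays_per_frame": rays_per_frame,
        "rays_per_second": rays_per_second,
        "queries_per_second": rays_per_second * samples_per_ray,
        "frame_budget_ms": 1000.0 / fps,
    }


# samples_per_ray=32 is an illustrative assumption, not a measured value.
workload = render_workload(width=1920, height=1080, fps=30, samples_per_ray=32)
for name, value in workload.items():
    print(f"{name}: {value:,.1f}" if isinstance(value, float) else f"{name}: {value:,}")
```

Under these assumptions the renderer casts about 2.07 million rays per frame and roughly 62 million rays per second, with about 33 ms available per frame, which is why a slower GPU quickly falls to a few frames per second.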

Workflow example

To train a NeRF, an operator typically uses a PyTorch-based codebase such as nerfstudio. Running ns-train nerfacto --data /path/to/images loads the images, trains for ~30 minutes on an RTX 3080, and outputs a model. To render, the operator runs ns-render --load-config config.yml --traj spline --output-path video.mp4 (ns-render subcommands and flags have changed between nerfstudio releases, so check ns-render --help for the installed version). Unlike LLM workflows, there is typically no quantization or offload step; the model stays in VRAM during training and rendering.

Reviewed by Fredoline Eruo. See our editorial policy.
