RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Computer vision

Depth Estimation

Depth estimation is a computer vision task that predicts a depth value for each pixel in an image, producing a depth map in which closer objects appear brighter or darker depending on the convention. Operators encounter it in local AI when running monocular depth models such as MiDaS or Depth Anything, which take a single RGB image and output a single-channel depth map, usually rendered in grayscale. These models are small enough to run on consumer GPUs (Depth Anything V2 Small has roughly 25M parameters) and are used for 3D reconstruction, AR effects, or as preprocessing for other models. Inference speed depends on resolution and model size; a 518×518 image on an RTX 3060 takes roughly 30-50 ms per frame.
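
The brighter-or-darker convention comes down to how raw model values are mapped to pixel intensities. The following is a minimal sketch, assuming the model output is available as a 2D grid of floats. The function name `to_grayscale` and its two flags are hypothetical, and plain Python lists stand in for the array type a real script would use.

```python
def to_grayscale(depth_rows, closer_is_brighter=True, values_are_inverse_depth=True):
    """Min-max normalize a 2D map of floats to integers in 0..255.

    values_are_inverse_depth: True if larger values mean nearer pixels
    (disparity-style output), False if larger values mean farther pixels.
    """
    flat = [v for row in depth_rows for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo

    out = []
    for row in depth_rows:
        out_row = []
        for v in row:
            # Position of this value within the observed range, 0..1.
            t = (v - lo) / span if span > 0 else 0.0
            # Convert to a "nearness" score: 1.0 is the nearest pixel.
            nearness = t if values_are_inverse_depth else 1.0 - t
            # Apply the chosen display convention.
            shade = nearness if closer_is_brighter else 1.0 - nearness
            out_row.append(round(shade * 255))
        out.append(out_row)
    return out


# Tiny 2x2 example: the largest inverse-depth value is the nearest pixel.
print(to_grayscale([[0.0, 1.0], [2.0, 5.0]]))
```

Because the normalization uses each image's own minimum and maximum, the resulting shades are comparable within one image but not across images.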

Deeper dive

Depth estimation models are typically convolutional or transformer-based networks trained on large datasets of RGB-D images. Monocular depth estimation (from a single image) is an ill-posed problem because many different 3D scenes project to the same 2D image, so models learn statistical cues such as perspective, texture gradients, and object size. Two widely used families are MiDaS (trained on a mix of multiple datasets) and Depth Anything (large-scale synthetic data plus fine-tuning). Both output relative inverse depth (disparity) by default, which is defined only up to an unknown scale and shift. Recovering metric depth requires either a metric fine-tuned checkpoint or reference points at known distances to fit that scale and shift; camera intrinsics are then needed to back-project pixels into 3D. Operators can run these models via Hugging Face Transformers or ONNX Runtime. For real-time applications, smaller variants (Depth Anything V2 Small, roughly 25M parameters) achieve ~30 FPS on an RTX 3060, while larger variants (Depth Anything V2 Large, roughly 335M parameters) provide better accuracy at ~10 FPS. Depth maps are often used as input for 3D point cloud generation or as a conditioning signal for image-to-3D models.
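
One common way to turn relative inverse depth into metric depth is to fit a scale and a shift against a few pixels whose true distance is known. The sketch below assumes the relationship 1/z = s · d + t, where d is the model's relative inverse depth and z is distance in meters. The function names and the reference values are hypothetical, and this is an illustration of the least-squares fit, not the method any particular model ships with.

```python
def fit_scale_shift(rel_inv_depth, metric_depth_m):
    """Least-squares fit of 1/z = s * d + t from reference pixels.

    rel_inv_depth: model outputs d at the reference pixels.
    metric_depth_m: measured distances z (meters) at the same pixels.
    """
    n = len(rel_inv_depth)
    if n < 2 or n != len(metric_depth_m):
        raise ValueError("need at least two matched reference points")

    targets = [1.0 / z for z in metric_depth_m]  # fit in inverse-depth space
    mean_d = sum(rel_inv_depth) / n
    mean_y = sum(targets) / n
    var_d = sum((d - mean_d) ** 2 for d in rel_inv_depth)
    if var_d == 0:
        raise ValueError("reference points must have distinct model outputs")

    cov = sum((d - mean_d) * (y - mean_y) for d, y in zip(rel_inv_depth, targets))
    s = cov / var_d
    t = mean_y - s * mean_d
    return s, t


def to_metric_depth(d, s, t, eps=1e-6):
    """Apply the fitted scale and shift, guarding against division by zero."""
    return 1.0 / max(s * d + t, eps)


# Hypothetical reference pixels: model outputs and tape-measured distances.
s, t = fit_scale_shift([10.0, 20.0, 45.0], [6.0, 4.0, 2.0])
print(round(to_metric_depth(30.0, s, t), 2), "m")
```

The fit is only as good as the reference points: they should span near and far regions of the scene, and the result applies to that image (or a fixed camera setup), not to the model in general.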

Practical example

A rig with an RTX 3060 (12 GB VRAM) runs Depth Anything V2 Small (roughly 25M parameters) on a 518×518 image in ~30 ms, producing a 518×518 depth map. The same model on an Apple M1 Max (32 GB unified memory) via MLX takes ~40 ms. For higher accuracy, Depth Anything V2 Large (roughly 335M parameters) takes ~100 ms on the RTX 3060. VRAM usage for the Small variant is minimal (<1 GB at batch size 1); the Large variant needs more, since its weights alone exceed 1 GB in FP32.
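
Per-frame latency and frames per second are two views of the same number, and it helps to measure them the same way on every rig. The sketch below converts the latencies quoted above to FPS and includes a small timing helper. The helper names are hypothetical, and `fn` stands for whatever callable wraps one inference on your setup.

```python
import statistics
import time


def fps_from_latency_ms(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


def measure_latency_ms(fn, warmup=3, runs=20):
    """Median wall-clock latency of fn() in milliseconds.

    Warm-up calls are discarded so one-time costs (model load, kernel
    compilation) do not skew the result. For GPU inference, fn should
    block until the result is ready, otherwise only the launch is timed.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Latencies quoted in the text, converted to FPS.
for label, ms in [("Small, RTX 3060", 30.0),
                  ("Small, M1 Max", 40.0),
                  ("Large, RTX 3060", 100.0)]:
    print(f"{label}: {ms:.0f} ms -> {fps_from_latency_ms(ms):.1f} FPS")
```

The median is used instead of the mean so that an occasional slow frame does not dominate the reported figure.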

Workflow example

In a local AI pipeline, an operator might run a command such as python run_depth.py --model depth_anything_v2_small --input image.jpg, where run_depth.py is a script built on Hugging Face Transformers. The output depth map can be saved as a PNG and fed into a 3D processing library such as Open3D to generate a point cloud. LM Studio does not natively support depth estimation models, so operators run them in a separate Python environment using the transformers library.
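
The step from depth map to point cloud is a pinhole back-projection: each pixel (u, v) with depth z maps to a 3D point using the camera's focal lengths and principal point. The sketch below assumes the depth values are already metric and that the intrinsics `fx`, `fy`, `cx`, `cy` are known. The function name and the sample values are hypothetical, and plain lists are used to keep the example dependency-free.

```python
def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a 2D grid of metric depths into (x, y, z) points.

    depth_m: rows of depth values in meters, indexed as depth_m[v][u].
    fx, fy: focal lengths in pixels. cx, cy: principal point in pixels.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# 2x2 toy depth map with one invalid pixel.
cloud = depth_to_points([[2.0, 2.0], [0.0, 4.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
for p in cloud:
    print(p)
```

The resulting list of (x, y, z) tuples is the kind of N×3 data a point cloud tool such as Open3D works with. Feeding relative (non-metric) depth through the same formula still yields a point cloud, but its proportions along the viewing direction will be distorted.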

Reviewed by Fredoline Eruo. See our editorial policy.
