Self-Driving Cars
Self-driving cars, also known as autonomous vehicles, use AI to perceive their environment and navigate with little or no human input. They rely on sensor fusion (cameras, LiDAR, radar) and deep learning models to detect objects, predict trajectories, and plan routes. Operators encounter this term when working with autonomous driving datasets (e.g., nuScenes, Waymo Open) or running perception models like YOLO or PointPillars on edge hardware (e.g., NVIDIA Jetson). The core challenge is real-time inference at low latency: models must process sensor data within milliseconds to support safe driving decisions.
Deeper dive
Self-driving cars are typically classified into six levels (0–5) defined by SAE International in its J3016 standard. Level 0 has no automation; Level 2 (e.g., Tesla Autopilot) handles steering and acceleration but requires constant driver supervision; Level 4 can operate without human input inside a defined operating domain, such as a geofenced area; Level 5 is full automation everywhere. The AI stack includes perception (object detection, semantic segmentation), prediction (motion forecasting), planning (path optimization), and control (steering, throttle). Deep learning models such as ResNet backbones, EfficientDet detectors, and Transformer-based architectures are common. Operators training or deploying these models must consider latency constraints: at 100 km/h a vehicle travels about 2.8 m in 100 ms, so a delay of that size can decide whether a collision is avoided. Reduced-precision inference (FP16), quantization (INT8), and model pruning are used to fit inference on embedded GPUs such as the Jetson Orin family, which ranges from roughly 40 to 275 TOPS depending on the module. Simulators like CARLA or AirSim are used for testing before real-world deployment.
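The latency constraint is easier to reason about as distance. The sketch below converts a processing delay into metres travelled at constant speed; the function name and the sample speeds are illustrative assumptions, and it ignores braking and actuation time.

```python
def distance_during_delay(speed_kmh: float, delay_ms: float) -> float:
    """Return metres travelled at constant speed during a processing delay."""
    speed_m_per_s = speed_kmh / 3.6           # km/h -> m/s
    return speed_m_per_s * (delay_ms / 1000.0)


# Illustrative speeds (assumed, not from any dataset or standard).
for speed_kmh in (50, 100, 130):
    metres = distance_during_delay(speed_kmh, 100)
    print(f"{speed_kmh} km/h with a 100 ms delay: {metres:.2f} m")
```

The same function can be used to compare candidate latency budgets, for example 20 ms versus 50 ms at a given speed.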
Practical example
An operator training a YOLOv5 object detector on the BDD100K driving dataset might use an RTX 4090 (24 GB VRAM) for training. For deployment on a Jetson Orin NX (16 GB RAM, up to 100 TOPS), they quantize the model to INT8 using TensorRT, calibrating on a representative sample of images, which in this example reduces inference time from 30 ms to 8 ms per frame. The trade-off costs roughly 1% mAP but meets a 20 ms latency budget for highway driving.
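To show where the small accuracy loss comes from, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is not TensorRT's implementation (TensorRT chooses ranges from calibration data and works per layer); the weight values and function names are made up for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


weights = [0.82, -0.41, 0.05, -1.27, 0.33]    # hypothetical weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per value is bounded by about half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print("scale:", scale)
print("max round-trip error:", max_err)
```

Each weight is stored in 8 bits instead of 32, at the cost of a rounding error no larger than about half the scale, which is one reason a quantized detector can lose a little mAP.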
Workflow example
In a typical workflow, an operator downloads the nuScenes dataset (1,000 driving scenes with about 1.4 million camera images plus LiDAR and radar sweeps) and trains a PointPillars LiDAR detection model using PyTorch. They export to ONNX, then convert to TensorRT for the Jetson AGX Orin. During inference, the model processes 10 LiDAR sweeps per second, outputting bounding boxes for vehicles and pedestrians. The operator watches GPU utilization, memory, and power with tegrastats, measures per-frame latency with a tool such as trtexec or timers in the application, and adjusts batch size or precision to stay under 50 ms per frame.
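Per-frame latency against the 50 ms budget can be checked with a small timing harness. The sketch below is a rough illustration under stated assumptions: `fake_infer` is a hypothetical stand-in for the real engine call, the frames are placeholder integers, and a real GPU pipeline would also need to synchronize the device before stopping the timer.

```python
import statistics
import time


def measure_latency_ms(infer, frames, warmup=3):
    """Time infer(frame) for every frame; return per-frame latencies in ms."""
    for frame in frames[:warmup]:
        infer(frame)                          # warm-up runs are not timed
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies, budget_ms=50.0):
    """Report mean, approximate 99th percentile, and frames over budget."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "mean_ms": statistics.mean(ordered),
        "p99_ms": ordered[p99_index],
        "over_budget": sum(1 for x in ordered if x > budget_ms),
    }


def fake_infer(frame):
    """Hypothetical stand-in for a detector call; sleeps to simulate work."""
    time.sleep(0.002)


stats = summarize(measure_latency_ms(fake_infer, list(range(20))))
print(stats)
```

Tail latency (here the 99th percentile) is often more informative than the mean, because a single slow frame can exceed the budget even when the average looks comfortable.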
Reviewed by Fredoline Eruo.