Specialized domains

Game AI

Game AI refers to the algorithms and systems that control non-player characters (NPCs), opponents, and procedural content in video games. Unlike general AI (e.g., LLMs), game AI is optimized for real-time performance, determinism, and low latency, often using finite state machines, behavior trees, or pathfinding (A*). Operators running local AI may encounter game AI when using LLMs to generate dialogue or narratives, but core game AI remains separate—it runs on CPU/GPU with strict frame-time budgets (e.g., <1 ms per frame).

Deeper dive

Game AI has evolved from simple rule-based systems (e.g., Pac-Man ghosts) to complex behavior trees (e.g., Halo's Elites) and reinforcement learning (e.g., AlphaStar for StarCraft II). However, most commercial games still rely on deterministic, lightweight techniques because they must run at 30-60 FPS on diverse hardware. Modern trends include using LLMs for dynamic dialogue (e.g., in-game NPCs powered by local models like Llama 3.1), but this is distinct from traditional game AI. Operators running local AI for gaming should note that LLM inference (even quantized) adds 100-500 ms latency, which is too slow for real-time combat but acceptable for turn-based or narrative-driven interactions.

Practical example

An operator running a local LLM (e.g., Llama 3.1 8B Q4 on an RTX 4090) to generate NPC dialogue in a Skyrim mod will see ~30-50 tok/s, translating to 2-3 seconds per response. This is acceptable for dialogue but not for real-time enemy behavior, which still uses the game's built-in AI (e.g., behavior trees). The operator must manage VRAM: the LLM uses ~5 GB, leaving room for the game itself.

Workflow example

When integrating a local LLM into a game via LM Studio, the operator sets up an HTTP server (e.g., lm studio serve --port 1234) and the game mod sends dialogue prompts. The runtime loads the model into VRAM, and each inference call blocks the game thread until a response is received. Operators must monitor VRAM usage to avoid crashes—e.g., an RTX 3060 12 GB can run a 7B Q4 model alongside most games, but a 70B Q4 (~40 GB) requires offloading to system RAM, causing multi-second delays.

Reviewed by Fredoline Eruo. See our editorial policy.

Buyer guides

When it doesn't work