Specialized domains

AlphaGo

AlphaGo is a computer program developed by DeepMind that plays the board game Go at a superhuman level. It combines deep neural networks with Monte Carlo tree search (MCTS). The policy network proposes promising moves, the value network evaluates board positions, and MCTS simulates many game trajectories to choose the best move. AlphaGo's 2016 victory over Lee Sedol demonstrated that deep networks guided by search, and refined through self-play reinforcement learning, could master a game previously considered too complex for AI. For local AI operators, AlphaGo is a landmark example of how neural networks combined with search can solve hard decision problems, though running a full AlphaGo-scale system requires far more compute than a single consumer GPU.

Deeper dive

AlphaGo's architecture consists of two main neural networks: a policy network, which outputs a probability distribution over moves, and a value network, which estimates the win probability from a given board state. The original system trained them in two phases: supervised learning from human expert games, then reinforcement learning via self-play. During play, MCTS uses the policy network's suggestions to decide which moves to explore and the value network's estimates to help score the positions it reaches. The version that beat Lee Sedol (AlphaGo Lee) used 48 TPUs for inference. Later versions (AlphaGo Zero, AlphaZero) removed human data entirely, learning solely from self-play. For operators, the key takeaway is that the same recipe of deep networks plus tree search has since been applied outside board games (e.g., AlphaTensor's search for faster matrix multiplication algorithms), but the compute behind the flagship systems (dozens of TPUs for match play in AlphaGo Lee's case, and large accelerator clusters for training) is far beyond a local rig. However, smaller-scale MCTS + neural network implementations can run on consumer hardware for simpler games or optimization tasks.
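To make the interplay between policy, value, and search concrete, here is a minimal sketch of AlphaGo Zero-style MCTS on a toy game rather than Go. Everything in it is an illustrative assumption: the game is single-pile Nim (take 1 to 3 stones, whoever takes the last stone wins), policy_stub and value_stub are hypothetical stand-ins for trained networks, and the exploration constant c_puct is an arbitrary choice. It is not DeepMind's implementation.

```python
import math


class Node:
    """One move in the search tree, with the statistics MCTS keeps for it."""

    def __init__(self, prior):
        self.prior = prior      # P(s, a): prior from the policy stand-in
        self.visits = 0         # N(s, a): how often search passed through
        self.value_sum = 0.0    # W(s, a): total value, seen by the player who made the move
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]


def policy_stub(stones):
    """Stand-in for a policy network: uniform prior over legal moves."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}


def value_stub(stones):
    """Stand-in for a value network: 0.0 means 'no opinion' for the player to move."""
    return 0.0


def select_child(node, c_puct=1.5):
    """Pick the move maximizing Q + U; U favors high-prior, rarely visited moves."""
    sqrt_total = math.sqrt(node.visits)

    def score(item):
        _, child = item
        u = c_puct * child.prior * sqrt_total / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)


def simulate(node, stones):
    """Run one simulation; return the value for the player to move at this node."""
    if stones == 0:
        value = -1.0  # the opponent took the last stone, so the player to move lost
    elif not node.children:
        for move, prior in policy_stub(stones).items():  # expand the leaf
            node.children[move] = Node(prior)
        value = value_stub(stones)
    else:
        move, child = select_child(node)
        value = -simulate(child, stones - move)  # flip sign: players alternate
    node.visits += 1
    node.value_sum -= value  # stored from the view of the player who moved here
    return value


def best_move(stones, simulations=2000):
    """Search from the root and return the most-visited move."""
    root = Node(prior=1.0)
    for _ in range(simulations):
        simulate(root, stones)
    return max(root.children, key=lambda m: root.children[m].visits)
```

Calling best_move(5) runs the search from a five-stone pile. The winning idea in this game is to leave the opponent a multiple of four stones, so the visit counts should concentrate on taking one stone. In a real engine the two stubs would be replaced by network forward passes, which is where nearly all of the GPU time goes.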

Practical example

An operator wanting to experiment with AlphaGo-like methods on a single RTX 4090 could use the open-source Leela Zero project, a reimplementation of the approach described for AlphaGo Zero. Leela Zero uses a residual neural network (similar to AlphaGo Zero) and MCTS. Training a strong Go model from scratch is not realistic on one card: Leela Zero's own networks came from a distributed community effort that pooled self-play games from many volunteers' machines. Pre-trained networks (e.g., Leela Zero's 40-block network), however, can be used for analysis at ~1000 playouts per move on a 4090, giving near-superhuman strength.

Workflow example

To run Leela Zero on a local rig, an operator would download the Leela Zero binary and a pre-trained weights file (e.g., leelaz-network-weights.txt), then run: ./leelaz -g -w leelaz-network-weights.txt. The -w flag points at the weights file, and -g starts the engine in GTP (Go Text Protocol) mode, where it waits for commands from a Go GUI or a script. The engine uses the GPU via OpenCL; a CPU-only build is also available. LM Studio and Ollama have no direct equivalent, but the idea of combining a neural network with search appears in some LLM inference approaches (e.g., tree-of-thought prompting).
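Because the -g flag in the command above puts the engine in GTP (Go Text Protocol) mode, it can be driven from a script as well as from a GUI. The following is a minimal sketch under stated assumptions: the binary path and weights filename are the example names from the text, the helper names (start_engine, send_command, parse_gtp_response) are hypothetical, and it assumes responses arrive on stdout while diagnostics go to stderr. boardsize, clear_board, genmove, and quit are standard GTP commands.

```python
import subprocess


def parse_gtp_response(raw):
    """Split one GTP response into (ok, payload).

    A successful response starts with '=', a failed one with '?'.
    """
    text = raw.strip()
    if not text or text[0] not in "=?":
        raise ValueError(f"not a GTP response: {raw!r}")
    return text[0] == "=", text[1:].strip()


def send_command(proc, command):
    """Send one command and read lines until the blank line that ends the reply."""
    proc.stdin.write(command + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if line == "":            # engine closed its output
            break
        if not line.strip():
            if lines:             # blank line after content: reply is complete
                break
            continue              # skip any leading blank lines
        lines.append(line)
    return parse_gtp_response("".join(lines))


def start_engine(binary="./leelaz", weights="leelaz-network-weights.txt"):
    """Launch the engine in GTP mode (paths are the example names from the text)."""
    return subprocess.Popen(
        [binary, "-g", "-w", weights],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # assumption: engine diagnostics go to stderr
        text=True,
    )
```

A session might then look like: engine = start_engine(), followed by send_command(engine, "boardsize 19"), send_command(engine, "clear_board"), and send_command(engine, "genmove b") to ask for Black's first move, ending with send_command(engine, "quit"). Each call returns a success flag and the engine's reply text.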

Reviewed by Fredoline Eruo. See our editorial policy.