How Does Leela Work?24 Jul 2018
How Does Leela Work?
Leela works differently from most chess engines. Like AlphaGo, it uses Monte-Carlo Tree Search (MCTS), guided by a neural network. But how does MCTS work, and why do we need a neural network?
Leela’s neural network takes a board position and calculates 2 kinds of values:
- A policy for each move, which estimates how plausible it is. Important moves like checks and captures should have high policy, and poor moves like dropping piece should have low policy.
- An evaluation (commonly called “value”) for the position, which estimates your chances of winning the game.
For example, in this position, Leela’s network gives ~100% policy to capturing the queen, because it’s clearly the best move. The value estimate is ~100% for Black, because Black will be up a queen and is almost certainly going to win.
Monte-Carlo tree search
On its own, Leela’s neural network is quite good at chess. But to make the best possible move, we need to consider many different ways that the game could play out. For example:
- (Bad!) 1. f3 e5 2. g4?? Qh4#
- (OK) 1. e4 e5
What moves should we look at first? We can use Leela’s policy estimate and explore the most promising moves first.
We also want to explore moves that aren’t explored much yet. If we’ve already thought a lot about a move, thinking a little more won’t help much, but if we’ve hardly thought about it at all, we could learn a lot!
At this point, we’ve decided what move to explore. We play out that move, and if we’ve already done that, we can play out another move. This goes on until we reach a position we haven’t seen before — a leaf node. We use the neural network’s value estimate to guess how good this position is.
Now, we’ve considered one way that the game could play out. Leela repeats this hundreds or thousands of times to find the best move:
- For each move, we keep track of the average value estimate at the leaf nodes. A move with good leaf nodes is probably a good move — if we consistently reach good positions, we can be pretty sure that the move is good.
- After we’ve done some thinking and we have a good idea of which moves are good or bad, we change what moves we explore. We prefer to explore moves with a high average value, and avoid exploring moves if they’re clearly bad.
Finally, we pick the move that we’ve thought about the most, the one with the most visits. Why? Since we try to explore good moves, the best move is usually the one we thought about the most!
This search algorithm works even with poor policy and values estimates, but Leela is much stronger when its neural network is accurate.
- If the policy is accurate, we’ll waste less time exploring bad moves
- If the estimated value is accurate, we’ll immediately know what positions are good, and we won’t have to explore more deeply.
Applied Data Science also has an amazing diagram explaining AlphaGo (Go, not chess):
Thanks to Alain Bellon (@TheDreamMaster) for the new Leela graphics. ]