Choosing positions for Texel's Tuning Method
Abstract
Texel's Tuning Method (TTM) is a method developed by Peter Österlund to tune chess evaluation parameters. TTM requires playing many hyper-bullet games and extracting a (random) set of positions from them. Next, the evaluation parameters are tuned so that the selected positions evaluate closer to the actual game result (i.e. positions from games White won should get a high positive score, positions from drawn games a score near 0, and positions from games Black won a low negative score). While TTM has been used successfully by several engines, including Zurichess, Texel, Gaviota and Pedone, it is still unknown which positions produce the best tuned values. In this article we explore the best ways to select tuning positions.
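For reference, the objective that TTM minimizes can be sketched as below, following Österlund's published formulation: game results are encoded as 1, 0.5 and 0, and the evaluation is squashed through a logistic curve before being compared with them. The scaling constant `k` is a free parameter fitted separately, and the function names are illustrative, not taken from any engine.

```python
from typing import Sequence


def expected_score(eval_cp: float, k: float = 1.0) -> float:
    """Map a White-relative evaluation in centipawns to an expected result
    in [0, 1] using the logistic curve from Texel's Tuning Method."""
    return 1.0 / (1.0 + 10.0 ** (-k * eval_cp / 400.0))


def ttm_error(evals: Sequence[float], results: Sequence[float],
              k: float = 1.0) -> float:
    """Mean squared error between game results (1 = White won, 0.5 = draw,
    0 = Black won) and the expected scores predicted from the evaluations."""
    return sum((r - expected_score(q, k)) ** 2
               for q, r in zip(evals, results)) / len(evals)


# Three positions taken from a White win, a draw and a Black win.
print(ttm_error([150.0, 0.0, -200.0], [1.0, 0.5, 0.0]))
```

Tuning then means adjusting the evaluation weights so that this error goes down over the whole position set.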
Setup
Using the following cutechess-cli command line and one of the latest versions of Zurichess (as of 08.Jan.2016), we generated 350.000 hyper-bullet games.
```bash
$HOME/cutechess/projects/cli/cutechess-cli \
    -srand $RANDOM \
    -pgnout games.pgn \
    -repeat \
    -recover \
    -tournament gauntlet \
    -rounds 500000 \
    -concurrency 16 \
    -ratinginterval 50 \
    -draw movenumber=50 movecount=5 score=20 \
    -openings file=$HOME/2moves_v1.pgn format=pgn order=random \
    -engine cmd=$HOME/src/games/zurichess name=zurichess1 tc=40/2+0.05 \
    -engine cmd=$HOME/src/games/zurichess name=zurichess2 tc=40/2+0.05 \
    -each timemargin=60000 option.Hash=512 proto=uci
```
We used a hyper-bullet time control (40 moves in 2 seconds plus a 0.05 s increment), as recommended by the inventor of TTM. A very large time margin (60 seconds) was used to eliminate losses on time. We used automatic draw adjudication only to cut long, boring games that could skew the set of positions. We didn't use any endgame tablebases because Zurichess does not support them. The opening book is 2moves_v1.pgn, which is used by the Stockfish and Zurichess testing frameworks and contains all balanced positions that are 2 moves away from the starting position.
From the generated PGN we extracted on the order of 10.000.000 positions, out of which we randomly selected 2.000.000 for tuning.
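The article does not say how the random subset was drawn. One way to pick a uniform random subset from a stream of about 10.000.000 positions without holding them all in memory is reservoir sampling (Algorithm R), sketched below; this is a swapped-in technique, not necessarily what was used, and the file name in the usage comment is hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown
    length (Algorithm R), holding at most k items in memory."""
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # evicting a uniformly chosen current entry.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# E.g. pick 2.000.000 lines from a hypothetical file with one FEN per line:
# with open("positions.fen") as f:
#     chosen = reservoir_sample(f, 2_000_000, random.Random(42))
```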
For each selected position we searched with Zurichess at depth 1 (plus quiescence search) and used the position at the end of the returned principal variation to evaluate and tune the weights. Zurichess' evaluation function is a simple neural network with no hidden layer and one output, as described here. We tuned the weights with the Adam optimizer provided by the TensorFlow open-source library. After tuning, we tested the new parameters in self-play against commit c46a425 of Zurichess, stopping after more than 30.000 games or when an SPRT with goal posts (-3 Elo, 1 Elo) concluded.
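A network with no hidden layer and one output is a linear model followed by a squashing function. The sketch below fits such a model with a hand-written, full-batch Adam update instead of TensorFlow, to stay dependency-free. The logistic output, the squared-error loss against the game result, the absence of a bias term and the single toy feature are all assumptions for illustration, not Zurichess' actual setup.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_adam(features: Sequence[Sequence[float]], results: Sequence[float],
               steps: int = 500, lr: float = 0.05, beta1: float = 0.9,
               beta2: float = 0.999, eps: float = 1e-8) -> List[float]:
    """Fit w so that sigmoid(w . x) approximates each game result (1, 0.5, 0),
    minimizing the mean squared error with full-batch Adam."""
    n, dim = len(features), len(features[0])
    w = [0.0] * dim
    m = [0.0] * dim  # running mean of gradients (first moment)
    v = [0.0] * dim  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        grad = [0.0] * dim
        for x, r in zip(features, results):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Derivative of (p - r)^2 / n with respect to the pre-activation.
            d = 2.0 * (p - r) * p * (1.0 - p) / n
            for j in range(dim):
                grad[j] += d * x[j]
        for j in range(dim):
            m[j] = beta1 * m[j] + (1.0 - beta1) * grad[j]
            v[j] = beta2 * v[j] + (1.0 - beta2) * grad[j] ** 2
            m_hat = m[j] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[j] / (1.0 - beta2 ** t)
            w[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy data: one hypothetical feature, the material balance in pawns for White.
weights = train_adam([[1.0], [1.0], [-1.0], [-1.0], [0.0]],
                     [1.0, 1.0, 0.0, 0.0, 0.5])
```

Being a pawn up is associated with White winning in this toy data, so the learned weight comes out positive.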
In each test we varied which positions were extracted and repeated the process three times.
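The SPRT stopping rule can be approximated as below. The sketch uses a normal approximation of the per-game score (swapped in for illustration; cutechess-cli and other testing tools implement their own SPRT models). It takes elo0 = -3 and elo1 = 1 from the goal posts above and assumes error rates alpha = beta = 0.05, which the article does not state.

```python
import math
from typing import Tuple


def elo_to_score(elo: float) -> float:
    """Expected score for an Elo difference under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(wins: int, losses: int, draws: int, elo0: float = -3.0,
         elo1: float = 1.0, alpha: float = 0.05,
         beta: float = 0.05) -> Tuple[float, str]:
    """Normal-approximation SPRT of H0 (Elo = elo0) against H1 (Elo = elo1).
    Returns the log-likelihood ratio and 'H0', 'H1' or 'continue'."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Sample variance of the per-game score (1, 0.5 or 0).
    var = (wins * (1.0 - mean) ** 2 + losses * mean ** 2
           + draws * (0.5 - mean) ** 2) / n
    if var == 0.0:
        return 0.0, "continue"  # not enough information yet
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    llr = n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return llr, "H1"
    if llr <= lower:
        return llr, "H0"
    return llr, "continue"
```

A test stops as soon as the ratio leaves the interval between the two bounds; otherwise games keep being played until the game limit is reached.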
Results
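We ran several tests, changing the position selection method in each. In each results table, a column is one run and lists the number of games played @ time control, the wins - losses - draws of the tuned version, and its Elo difference with the error margin. The piece-value tables list the tuned midgame and endgame weights from the 1st run.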
capt: duplicates, captures, no checks, no mates (default)
In this test we included duplicates and captures, but excluded checks and mates along with the 5 positions before each of them.
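The exclusion rule can be sketched as follows. We read "5 positions before and including" as: a flagged position is dropped together with the 5 positions that precede it in the same game. The `Ply` record and its flags are hypothetical; a real extractor would derive them from the PGN. Setting `drop_captures=True` gives the quiet variant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ply:
    """One position of a game plus hypothetical flags describing it."""
    fen: str
    is_capture: bool = False
    is_check: bool = False
    is_mate: bool = False


def select_positions(game: List[Ply], window: int = 5,
                     drop_captures: bool = False) -> List[str]:
    """Drop every flagged position together with the `window` positions
    before it; return the FENs that survive, in game order."""
    excluded = [False] * len(game)
    for i, ply in enumerate(game):
        flagged = ply.is_check or ply.is_mate or (drop_captures and ply.is_capture)
        if flagged:
            for j in range(max(0, i - window), i + 1):
                excluded[j] = True
    return [p.fen for p, bad in zip(game, excluded) if not bad]
```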
1st run | 2nd run | 3rd run |
---|---|---|
7100 @ 40/15+0.05 | 13200 @ 40/15+0.05 | 30800 @ 40/15+0.05 |
2518 - 2308 - 2274 | 4427 - 4585 - 4188 | 10459 - 10500 - 9841 |
ELO 10.28±6.66 | ELO -4.16±4.90 | ELO -0.46±3.20 |
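Each wins - losses - draws triple maps to the Elo column through the logistic Elo model. For example, 2518 - 2308 - 2274 is a score of (2518 + 0.5 × 2274) / 7100 ≈ 0.5148, i.e. about +10.28 Elo. The margin below uses a normal approximation at 95% confidence, which we assume (but have not confirmed) is how the ± figures were produced; it lands within a few hundredths of the reported ±6.66.

```python
import math
from statistics import NormalDist


def elo_diff(score: float) -> float:
    """Elo difference corresponding to an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int,
                    confidence: float = 0.95):
    """Return (elo, margin) for a W-L-D result, using a normal
    approximation of the mean score."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mu) ** 2 + losses * mu ** 2
           + draws * (0.5 - mu) ** 2) / n
    stdev = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lo, hi = mu - z * stdev, mu + z * stdev
    return elo_diff(mu), (elo_diff(hi) - elo_diff(lo)) / 2.0


elo, margin = elo_with_margin(2518, 2308, 2274)
```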
1st run | Pawn | Knight | Bishop | Rook | Queen |
---|---|---|---|---|---|
MidGame | 5675 | 52253 | 55122 | 76980 | 187221 |
EndGame | 6749 | 28777 | 32599 | 62554 | 100550 |
quiet: duplicates, no captures, no checks, no mates
In this test we included duplicates, but excluded captures, checks and mates along with the 5 positions before each of them. It is clear that excluding captures results in a huge Elo drop.
1st run | 2nd run | 3rd run |
---|---|---|
2000 @ 40/15+0.05 | 1400 @ 40/15+0.05 | 900 @ 40/15+0.05 |
631 - 765 - 604 | 411 - 556 - 433 | 234 - 361 - 305 |
ELO -23.31±12.74 | ELO -36.11±15.18 | ELO -49.36±18.57 |
The midgame and endgame piece values diverge much more than in capt.
1st run | Pawn | Knight | Bishop | Rook | Queen |
---|---|---|---|---|---|
MidGame | 9054 | 62849 | 64422 | 90246 | 226425 |
EndGame | 6944 | 21486 | 24785 | 49673 | 68974 |
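The MidGame and EndGame rows are two weights for the same feature. The article does not describe how they are combined; a common scheme, tapered evaluation, interpolates between them by a game phase computed from the remaining material. The sketch below assumes that scheme, and the phase weights in it are illustrative only.

```python
# Hypothetical phase weights: knight 1, bishop 1, rook 2, queen 4.
PHASE_WEIGHTS = {"n": 1, "b": 1, "r": 2, "q": 4}
MAX_PHASE = 24  # value of all pieces in the starting position


def game_phase(fen: str) -> float:
    """1.0 with all pieces on the board, 0.0 with none (pawns and kings ignored)."""
    placement = fen.split()[0]
    total = sum(PHASE_WEIGHTS.get(c.lower(), 0) for c in placement)
    return min(total, MAX_PHASE) / MAX_PHASE


def tapered(mg: float, eg: float, phase: float) -> float:
    """Interpolate linearly between a midgame and an endgame weight."""
    return phase * mg + (1.0 - phase) * eg


# Queen weights from the quiet 1st run, halfway between midgame and endgame.
print(tapered(226425, 68974, 0.5))  # 147699.5
```

Under such a scheme, a large gap between the two rows means a piece's value swings strongly as material comes off the board.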
unique: no duplicates, captures, no checks, no mates
In this test we included captures, but excluded duplicates, as well as checks and mates along with the 5 positions before each of them. Dropping duplicate positions gives more consistent results across the three runs.
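Deduplication can be sketched by keying each position on the first four FEN fields (board, side to move, castling rights, en passant square), so that transpositions reached at different move numbers collapse. Which copy survives when duplicates come from games with different results is not stated; this sketch assumes the first occurrence is kept, and the `(fen, result)` input format is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def position_key(fen: str) -> str:
    """Key a position by its first four FEN fields, ignoring the
    halfmove clock and fullmove number."""
    return " ".join(fen.split()[:4])


def unique_positions(
        samples: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Yield (fen, result) pairs, keeping only the first occurrence
    of each position."""
    seen = set()
    for fen, result in samples:
        key = position_key(fen)
        if key not in seen:
            seen.add(key)
            yield fen, result
```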
1st run | 2nd run | 3rd run |
---|---|---|
9200 @ 40/15+0.05 | 19300 @ 40/15+0.05 | 29000 @ 40/15+0.05 |
3248 - 3048 - 2904 | 6606 - 6431 - 6263 | 9883 - 9729 - 9388 |
ELO 7.55±5.87 | ELO 3.15±4.03 | ELO 1.85±3.29 |
1st run | Pawn | Knight | Bishop | Rook | Queen |
---|---|---|---|---|---|
MidGame | 7075 | 51312 | 53813 | 75629 | 183804 |
EndGame | 6106 | 31556 | 35639 | 65955 | 108108 |
nodraw: no draws, no duplicates, captures, no checks, no mates
This test is similar to unique, but we dropped positions from drawn games. The results suggest that including positions from drawn games is beneficial.
1st run | 2nd run | 3rd run |
---|---|---|
47700 @ 40/15+0.05 | 25500 @ 40/15+0.05 | 15700 @ 40/15+0.05 |
17409 - 17404 - 12887 | 9249 - 9426 - 6825 | 5637 - 5820 - 4243 |
ELO 0.04±2.66 | ELO -2.41±3.65 | ELO -4.05±4.64 |
Dropping draws probably removed many positions from long, drawn late endgames, which shows in the much higher endgame piece values.
1st run | Pawn | Knight | Bishop | Rook | Queen |
---|---|---|---|---|---|
MidGame | 7091 | 56023 | 60280 | 80018 | 238436 |
EndGame | 28519 | 117904 | 126203 | 219046 | 369596 |
drop4: more than 4 non-pawns, no duplicates, captures, checks, no mates
This test is similar to unique, but we dropped all positions with at most 4 non-pawn pieces on the board, which removes classical endgames such as rook vs. rook or rook vs. minor piece. Surprisingly, the results were positive.
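The drop4 filter can be sketched by counting non-pawn pieces in the piece-placement field of a FEN. Whether kings count as non-pawns is not stated; this sketch assumes they do not. With kings counted, rook vs. rook and rook vs. minor endgames would still be at or below the threshold of 4. The function names are illustrative.

```python
def non_pawn_count(fen: str) -> int:
    """Count knights, bishops, rooks and queens of both colours
    (pawns and kings are not counted)."""
    return sum(1 for c in fen.split()[0] if c in "NBRQnbrq")


def keep_for_drop4(fen: str, threshold: int = 4) -> bool:
    """Keep a position only if it has more than `threshold` non-pawns."""
    return non_pawn_count(fen) > threshold
```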
1st run | 2nd run | 3rd run |
---|---|---|
7900 @ 40/15+0.05 | 9700 @ 40/15+0.05 | 11900 @ 40/15+0.05 |
2898 - 2665 - 2337 | 3475 - 3272 - 2953 | 4262 - 4058 - 3580 |
ELO 10.25±6.43 | ELO 7.27±5.77 | ELO 5.96±5.22 |
Again, dropping endgame positions tends to increase the endgame piece values.
1st run | Pawn | Knight | Bishop | Rook | Queen |
---|---|---|---|---|---|
MidGame | 5363 | 39652 | 42277 | 57185 | 140637 |
EndGame | 9844 | 54153 | 58849 | 103947 | 189061 |
drop30: at most 30 pieces, no duplicates, captures, checks, no mates
This test is similar to unique, but we dropped all positions with more than 30 pieces. These positions occur mostly in the opening. Early results suggest a slight improvement.
1st run | 2nd run | 3rd run |
---|---|---|
x | x | x |
1st run | Pawn | Knight | Bishop | Rook | Queen |
---|---|---|---|---|---|
MidGame | 5379 | 52514 | 55842 | 79687 | 190321 |
EndGame | 6840 | 29658 | 33026 | 61593 | 105525 |