zurichess / Choosing positions for Texel's Tuning Method

Abstract

Texel's Tuning Method (TTM) is a method developed by Peter Österlund to tune chess evaluation parameters. TTM requires one to play many hyper-bullet games and extract a (random) set of positions from them. Next, the evaluation parameters are tuned such that the selected positions evaluate closer to the actual game result (i.e. a high positive score indicates that white won, 0 means a draw, and a low negative score indicates that black won). While TTM has been used successfully by several engines, including Zurichess, Texel, Gaviota, and Pedone, it is still unknown which selection of positions yields the best tuned values. In this article we explore the best way to select tuning positions.

Setup

Using the following cutechess-cli command line and one of the latest versions of Zurichess (as of 08.Jan.2016), we generated 350.000 hyper-bullet games.

#!bash

$HOME/cutechess/projects/cli/cutechess-cli \
        -srand $RANDOM \
        -pgnout games.pgn \
        -repeat \
        -recover \
        -tournament gauntlet \
        -rounds 500000 \
        -concurrency 16 \
        -ratinginterval 50 \
        -draw movenumber=50 movecount=5 score=20 \
        -openings file=$HOME/2moves_v1.pgn format=pgn order=random \
        -engine cmd=$HOME/src/games/zurichess name=zurichess1 tc=40/2+0.05 \
        -engine cmd=$HOME/src/games/zurichess name=zurichess2 tc=40/2+0.05 \
        -each timemargin=60000 option.Hash=512 proto=uci

We used a hyper-bullet time control (40 moves in 2 seconds plus a 0.05s increment), as recommended by the inventor of TTM. A very high time margin was used in order to eliminate timeouts. We used automatic draw adjudication only to eliminate long, boring games that could skew the set of positions. We didn't use any endgame tablebases because Zurichess lacks such support. The opening book is 2moves_v1.pgn, which is used by the Stockfish and Zurichess testing frameworks and contains all balanced positions that are 2 moves away from the starting position.

From the generated PGN we extracted on the order of 10.000.000 positions, out of which we randomly selected 2.000.000 positions for tuning.
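The article does not say how the random selection was implemented. One way to draw a fixed-size uniform sample from a stream of positions that is too large to hold comfortably in memory is reservoir sampling; the sketch below is only an illustration, and the function name and `seed` parameter are hypothetical.

```python
import random

def sample_positions(positions, k, seed=None):
    """Return k items chosen uniformly at random from an iterable (reservoir sampling).

    `positions` can be any iterable, e.g. a generator yielding FEN strings.
    If the iterable holds fewer than k items, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, pos in enumerate(positions):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(pos)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = pos
    return reservoir
```

With the numbers from this article, `k` would be 2.000.000 and the iterable would yield the roughly 10.000.000 extracted positions.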

For each selected position we searched with Zurichess at depth 1 (plus quiescence search). We picked the position at the end of the returned principal variation to evaluate and tune the weights. Zurichess' evaluation function is a simple neural network with no hidden layer and one output, as described here. We tuned the weights with the Adam optimizer provided by the open-source TensorFlow library. After the weight tuning completed, we tested the parameters using self-play between the new version and commit c46a425 of Zurichess, with the stopping criteria: more than 30.000 games, or SPRT with goal posts (-3 Elo, 1 Elo).
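To make the tuning objective concrete, here is a minimal sketch of the error function as TTM is commonly described: the evaluation score is mapped to an expected game result with a logistic curve, and the mean squared difference to the actual result is minimized. This is not Zurichess' TensorFlow code. The scaling constant `k`, the feature vectors, and all function names are assumptions made for illustration.

```python
import math

def expected_result(score, k=1.0):
    """Map a score from white's point of view to an expected result in [0, 1].

    Equivalent to 1 / (1 + 10 ** (-k * score / 400)), written so that it
    cannot overflow for very large positive or negative scores.
    """
    x = k * score * math.log(10.0) / 400.0
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def evaluate(weights, features):
    """Linear evaluation: no hidden layer, one output."""
    return sum(w * f for w, f in zip(weights, features))

def mean_squared_error(weights, positions, k=1.0):
    """Average squared gap between predicted and actual game results.

    positions: list of (features, result) pairs, where result is
    1.0 for a white win, 0.5 for a draw, and 0.0 for a black win.
    """
    total = 0.0
    for features, result in positions:
        predicted = expected_result(evaluate(weights, features), k)
        total += (result - predicted) ** 2
    return total / len(positions)
```

An optimizer such as Adam then adjusts `weights` to reduce this error over the selected positions.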

In each test we varied which positions were extracted and repeated the process three times.

Results

We ran several tests, changing the position selection method in each.

capt: duplicates, captures, no checks, no mates (default)

In this test we included duplicates and captures, but excluded the 5 positions leading up to and including each check or mate.
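The exclusion rule can be sketched as a mask over the positions of one game. This is an illustration only: it assumes the rule means "the flagged position plus the 5 positions before it", and the function name and the `flagged` input are hypothetical.

```python
def keep_mask(flagged, window=5):
    """Return a list of booleans telling which positions of a game to keep.

    flagged[i] is True when the position at ply i is of an excluded kind
    (a check or a mate here). Assumed interpretation: each flagged position
    is dropped together with the `window` positions that precede it.
    """
    keep = [True] * len(flagged)
    for i, bad in enumerate(flagged):
        if bad:
            for j in range(max(0, i - window), i + 1):
                keep[j] = False
    return keep
```

The same helper covers the other tests by changing what counts as flagged, for example captures in the quiet test.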

| 1st run | 2nd run | 3rd run |
|---|---|---|
| 7100 @ 40/15+0.05 | 13200 @ 40/15+0.05 | 30800 @ 40/15+0.05 |
| 2518 - 2308 - 2274 | 4427 - 4585 - 4188 | 10459 - 10500 - 9841 |
| ELO 10.28±6.66 | ELO -4.16±4.90 | ELO -0.46±3.20 |

| 1st run | Pawn | Knight | Bishop | Rook | Queen |
|---|---|---|---|---|---|
| MidGame | 5675 | 52253 | 55122 | 76980 | 187221 |
| EndGame | 6749 | 28777 | 32599 | 62554 | 100550 |
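Each run in the results lists the number of games, the wins - losses - draws of the new version, and the resulting Elo difference with an error margin. The sketch below shows how such a figure can be derived from the three counts. The logistic Elo formula is standard; the margin computation (a 95% normal interval on the mean score, converted to Elo) is an assumption that happens to reproduce the reported numbers, not a formula taken from the article. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def elo_diff(score):
    """Convert a mean score strictly between 0 and 1 into an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, losses, draws, confidence=0.95):
    """Return (elo, margin) for a match result."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Variance of the per-game score, which is 1, 0.5 or 0.
    var = (wins * (1.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2
           + losses * (0.0 - mu) ** 2) / n
    stderr = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lo, hi = mu - z * stderr, mu + z * stderr
    # Half the width of the interval, expressed in Elo.
    return elo_diff(mu), (elo_diff(hi) - elo_diff(lo)) / 2.0
```

For the first capt run, `elo_with_margin(2518, 2308, 2274)` gives approximately 10.28 with a margin of about 6.66, in line with the table. The helper does not handle matches where one side scores 0% or 100%.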

quiet: duplicates, no captures, no checks, no mates

In this test we included duplicates, but excluded the 5 positions leading up to and including each capture, check, or mate. Excluding captures clearly results in a large Elo drop.

| 1st run | 2nd run | 3rd run |
|---|---|---|
| 2000 @ 40/15+0.05 | 1400 @ 40/15+0.05 | 900 @ 40/15+0.05 |
| 631 - 765 - 604 | 411 - 556 - 433 | 234 - 361 - 305 |
| ELO -23.31±12.74 | ELO -36.11±15.18 | ELO -49.36±18.57 |

There is a large gap between the midgame and endgame piece scores.

| 1st run | Pawn | Knight | Bishop | Rook | Queen |
|---|---|---|---|---|---|
| MidGame | 9054 | 62849 | 64422 | 90246 | 226425 |
| EndGame | 6944 | 21486 | 24785 | 49673 | 68974 |

unique: no duplicates, captures, no checks, no mates

In this test we included captures, but excluded duplicates and the 5 positions leading up to and including each check or mate. Dropping duplicate positions gives more consistent results.

| 1st run | 2nd run | 3rd run |
|---|---|---|
| 9200 @ 40/15+0.05 | 19300 @ 40/15+0.05 | 29000 @ 40/15+0.05 |
| 3248 - 3048 - 2904 | 6606 - 6431 - 6263 | 9883 - 9729 - 9388 |
| ELO 7.55±5.87 | ELO 3.15±4.03 | ELO 1.85±3.29 |

| 1st run | Pawn | Knight | Bishop | Rook | Queen |
|---|---|---|---|---|---|
| MidGame | 7075 | 51312 | 53813 | 75629 | 183804 |
| EndGame | 6106 | 31556 | 35639 | 65955 | 108108 |
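Dropping duplicates, as done in this test, can be sketched as follows. The article does not say how positions were stored or compared; this example assumes FEN strings and treats two positions as duplicates when the board, side to move, castling rights, and en passant square match, ignoring the move counters. It keeps the first occurrence, so a position that appears in games with different results keeps only the first result. All names are hypothetical.

```python
def position_key(fen):
    """Identify a position by the first four FEN fields, ignoring move counters."""
    return " ".join(fen.split()[:4])

def drop_duplicates(fens):
    """Keep the first occurrence of every distinct position, preserving order."""
    seen = set()
    unique = []
    for fen in fens:
        key = position_key(fen)
        if key not in seen:
            seen.add(key)
            unique.append(fen)
    return unique
```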

nodraw: no draws, no duplicates, captures, no checks, no mates

This test is similar to unique, but we dropped positions from drawn games. The results suggest that including the draws is beneficial.

| 1st run | 2nd run | 3rd run |
|---|---|---|
| 47700 @ 40/15+0.05 | 25500 @ 40/15+0.05 | 15700 @ 40/15+0.05 |
| 17409 - 17404 - 12887 | 9249 - 9426 - 6825 | 5637 - 5820 - 4243 |
| ELO 0.04±2.66 | ELO -2.41±3.65 | ELO -4.05±4.64 |

Dropping draws probably removed many positions from long, drawn late endgames, which would explain the much higher endgame piece scores.

| 1st run | Pawn | Knight | Bishop | Rook | Queen |
|---|---|---|---|---|---|
| MidGame | 7091 | 56023 | 60280 | 80018 | 238436 |
| EndGame | 28519 | 117904 | 126203 | 219046 | 369596 |

drop4: more than 4 non-pawns, no duplicates, captures, checks, no mates

This test is similar to unique, but we dropped all positions that had at most 4 non-pawns on the board. This removes all classical endgames such as rook vs. rook or rook vs. minor piece. Surprisingly, the results were positive.

| 1st run | 2nd run | 3rd run |
|---|---|---|
| 7900 @ 40/15+0.05 | 9700 @ 40/15+0.05 | 11900 @ 40/15+0.05 |
| 2898 - 2665 - 2337 | 3475 - 3272 - 2953 | 4262 - 4058 - 3580 |
| ELO 10.25±6.43 | ELO 7.27±5.77 | ELO 5.96±5.22 |

Again, dropping positions from endgames tends to increase the piece scores in the endgame.

| 1st run | Pawn | Knight | Bishop | Rook | Queen |
|---|---|---|---|---|---|
| MidGame | 5363 | 39652 | 42277 | 57185 | 140637 |
| EndGame | 9844 | 54153 | 58849 | 103947 | 189061 |

drop30: at most 30 pieces, no duplicates, captures, checks, no mates

This test is similar to unique, but we dropped all positions that had more than 30 pieces on the board. These positions occur mostly in the opening. Early results suggest a slight improvement.

| 1st run | 2nd run | 3rd run |
|---|---|---|
| x | x | x |

| 1st run | Pawn | Knight | Bishop | Rook | Queen |
|---|---|---|---|---|---|
| MidGame | 5379 | 52514 | 55842 | 79687 | 190321 |
| EndGame | 6840 | 29658 | 33026 | 61593 | 105525 |
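The two material-based filters, drop4 and drop30, only need piece counts, which can be read directly from the piece placement field of a FEN string. The sketch below is an illustration under two assumptions: positions are available as FEN, and kings count both as pieces and as non-pawns (which is consistent with rook vs. rook having 4 non-pawns). The function names are hypothetical.

```python
def piece_counts(fen):
    """Return (total pieces, non-pawn pieces) for a FEN string, kings included."""
    placement = fen.split()[0]
    total = sum(1 for c in placement if c.isalpha())
    pawns = sum(1 for c in placement if c in "pP")
    return total, total - pawns

def keep_drop4(fen):
    """drop4: keep only positions with more than 4 non-pawns."""
    _, non_pawns = piece_counts(fen)
    return non_pawns > 4

def keep_drop30(fen):
    """drop30: keep only positions with at most 30 pieces."""
    total, _ = piece_counts(fen)
    return total <= 30
```

Under these assumptions the starting position (32 pieces) is rejected by `keep_drop30`, and a bare rook vs. rook ending is rejected by `keep_drop4`.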
