Abstract

Texel's Tuning Method (TTM) is a method developed by Peter Österlund to tune chess evaluation parameters. TTM requires playing many hyper-bullet games and extracting a (random) set of positions from them. The evaluation parameters are then tuned so that the selected positions evaluate closer to the actual game result (i.e. a high positive score indicates that white won, 0 indicates a draw, and a low negative score indicates that black won). While TTM has been used successfully by several engines, including Zurichess, Texel, Gaviota and Pedone, it is still unknown which selection of positions produces the best tuned values. In this article we explore the best way to select tuning positions.

Setup

Using the following cutechess command line and one of the latest versions of Zurichess (as of 08.Jan.2016) we generated 350.000 hyper-bullet games.

#!bash

$HOME/cutechess/projects/cli/cutechess-cli \
        -srand $RANDOM \
        -pgnout games.pgn \
        -repeat \
        -recover \
        -tournament gauntlet \
        -rounds 500000 \
        -concurrency 16 \
        -ratinginterval 50 \
        -draw movenumber=50 movecount=5 score=20 \
        -openings file=$HOME/2moves_v1.pgn format=pgn order=random \
        -engine cmd=$HOME/src/games/zurichess name=zurichess1 tc=40/2+0.05 \
        -engine cmd=$HOME/src/games/zurichess name=zurichess2 tc=40/2+0.05 \
        -each timemargin=60000 option.Hash=512 proto=uci

We used a hyper-bullet time control (40 moves in 2 seconds plus a 0.05s increment) as recommended by the inventor of TTM. A very high time margin was used in order to eliminate timeouts. We used automatic draw adjudication only to eliminate long, boring games that could skew the set of positions. We didn't use any endgame tablebases because Zurichess lacks such support. The opening book is 2moves_v1.pgn, which is used by the Stockfish and Zurichess testing frameworks and contains all balanced positions that are 2 moves away from the starting position.

From the generated PGN we extracted on the order of 10.000.000 positions, out of which we randomly selected 2.000.000 positions for tuning.
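A fixed-size random subset can be drawn from a large stream of positions in a single pass with reservoir sampling. The following is only an illustrative sketch, not the script used for this article; `sample_positions` and its parameters are hypothetical names.

```python
import random


def sample_positions(positions, k, seed=None):
    """Return k items chosen uniformly at random from an iterable (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, pos in enumerate(positions):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(pos)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = pos
    return reservoir
```

The advantage of this approach is that the full set of extracted positions never has to be held in memory; only the selected subset is kept.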

For each selected position we searched with Zurichess at depth 1 (plus quiescence search). We picked the position at the end of the returned principal variation to evaluate and tune the weights. Zurichess' evaluation function is a simple neural network with no hidden layer and one output as described here. We tuned the weights with the ADAM optimizer provided by the TensorFlow open-source library. After the weight tuning was completed, we tested the parameters using self-play between the new version and commit c46a425 of Zurichess, with the following stopping criteria: more than 30.000 games, or SPRT with goal posts (-3 Elo, 1 Elo).
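To make the tuning objective concrete, here is a minimal sketch of fitting a linear evaluation (no hidden layer, one output) to game results. It follows the general TTM formulation: a sigmoid maps the score to an expected result for white, and the mean squared error against the actual result is minimized. It is not the Zurichess code. Plain gradient descent stands in for the ADAM optimizer, and the feature vectors, the `scale` constant and all function names are assumptions made for illustration.

```python
import math


def predict(weights, features, scale=1.0):
    """Map a linear evaluation to an expected result in (0, 1) for white."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-scale * score))


def mean_squared_error(weights, samples, scale=1.0):
    """samples: list of (features, result); result is 1.0, 0.5 or 0.0 for white."""
    total = sum((r - predict(weights, x, scale)) ** 2 for x, r in samples)
    return total / len(samples)


def gradient_step(weights, samples, lr=0.1, scale=1.0):
    """One step of plain gradient descent on the mean squared error."""
    grad = [0.0] * len(weights)
    for x, r in samples:
        p = predict(weights, x, scale)
        # d/dw (r - p)^2 = -2 (r - p) * p (1 - p) * scale * x
        common = -2.0 * (r - p) * p * (1.0 - p) * scale
        for i, f in enumerate(x):
            grad[i] += common * f
    n = len(samples)
    return [w - lr * g / n for w, g in zip(weights, grad)]
```

With a single hypothetical feature such as material difference, repeated calls to `gradient_step` push the weight upwards whenever positions with more white material tend to be white wins.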

In each test we varied which positions were extracted and repeated the process three times.

Results

We ran several tests, changing the position selection method each time.

capt: duplicates, captures, no checks, no mates (default)

In this test we included duplicates and captures, but excluded the 5 positions before and including checks and mates.
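The exclusion rule can be sketched as a window filter over the positions of one game. This is an illustration under assumptions: `flags` is a hypothetical per-position list marking checks and mates, and "5 positions before and including" is read here as the flagged position plus the 5 positions preceding it.

```python
def keep_indices(flags, window=5):
    """Return the indices of the positions to keep.

    flags[i] is True when position i is a check or mate (hypothetical input).
    Each flagged position is dropped together with the `window` positions
    immediately before it.
    """
    n = len(flags)
    drop = [False] * n
    for i, flagged in enumerate(flags):
        if flagged:
            for j in range(max(0, i - window), i + 1):
                drop[j] = True
    return [i for i in range(n) if not drop[i]]
```

The same function covers the other selection methods by changing what sets a flag, for example also flagging captures.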

1st run	2nd run	3rd run
7100 @ 40/15+0.05	13200 @ 40/15+0.05	30800 @ 40/15+0.05
2518 - 2308 - 2274	4427 - 4585 - 4188	10459 - 10500 - 9841
ELO 10.28±6.66	ELO -4.16±4.90	ELO -0.46±3.20
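
The three numbers in each results row sum to the number of games, and they reproduce the reported Elo when read as wins - losses - draws. The sketch below shows the usual logistic conversion from score to Elo. The error margin uses a normal approximation on the per-game score, which is an assumption about how the margin was obtained; the function names are hypothetical.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, margin) from a win/loss/draw record."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    half_width = z * math.sqrt(var / n)
    lo = elo_from_score(score - half_width)
    hi = elo_from_score(score + half_width)
    return elo_from_score(score), (hi - lo) / 2.0
```

For the first run above, `elo_with_margin(2518, 2308, 2274)` gives an Elo of about 10.28 and a margin close to the reported ±6.66.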

1st run	Pawn	Knight	Bishop	Rook	Queen
MidGame	5675	52253	55122	76980	187221
EndGame	6749	28777	32599	62554	100550

quiet: duplicates, no captures, no checks, no mates

In this test we included duplicates, but excluded the 5 positions before and including captures, checks and mates. Not including captures clearly results in a large Elo drop.

1st run	2nd run	3rd run
2000 @ 40/15+0.05	1400 @ 40/15+0.05	900 @ 40/15+0.05
631 - 765 - 604	411 - 556 - 433	234 - 361 - 305
ELO -23.31±12.74	ELO -36.11±15.18	ELO -49.36±18.57

The piece scores vary greatly between midgame and endgame.

1st run	Pawn	Knight	Bishop	Rook	Queen
MidGame	9054	62849	64422	90246	226425
EndGame	6944	21486	24785	49673	68974

unique: no duplicates, captures, no checks, no mates

In this test we included captures, but excluded duplicates and the 5 positions before and including checks and mates. Dropping duplicate positions gives more consistent results.
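Removing duplicates can be sketched as follows. The article does not say how positions were stored or compared, so this assumes FEN strings and treats two positions as duplicates when piece placement, side to move, castling rights and en passant square match, ignoring the move counters. Both function names are hypothetical.

```python
def position_key(fen):
    """Key a FEN by placement, side to move, castling and en passant only."""
    return " ".join(fen.split()[:4])


def drop_duplicates(fens):
    """Keep the first occurrence of each position, preserving order."""
    seen = set()
    unique = []
    for fen in fens:
        key = position_key(fen)
        if key not in seen:
            seen.add(key)
            unique.append(fen)
    return unique
```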

1st run	2nd run	3rd run
9200 @ 40/15+0.05	19300 @ 40/15+0.05	29000 @ 40/15+0.05
3248 - 3048 - 2904	6606 - 6431 - 6263	9883 - 9729 - 9388
ELO 7.55±5.87	ELO 3.15±4.03	ELO 1.85±3.29

1st run	Pawn	Knight	Bishop	Rook	Queen
MidGame	7075	51312	53813	75629	183804
EndGame	6106	31556	35639	65955	108108

nodraw: no draws, no duplicates, captures, no checks, no mates

This test is similar to unique, but we dropped positions from drawn games. The results suggest that including draws is beneficial.

1st run	2nd run	3rd run
47700 @ 40/15+0.05	25500 @ 40/15+0.05	15700 @ 40/15+0.05
17409 - 17404 - 12887	9249 - 9426 - 6825	5637 - 5820 - 4243
ELO 0.04±2.66	ELO -2.41±3.65	ELO -4.05±4.64

Dropping draws probably removed many positions from long, drawn late endgames.

1st run	Pawn	Knight	Bishop	Rook	Queen
MidGame	7091	56023	60280	80018	238436
EndGame	28519	117904	126203	219046	369596

drop4: more than 4 non-pawns, no duplicates, captures, checks, no mates

This test is similar to unique, but we dropped all positions that had at most 4 non-pawns on the board. This removes all classical endgames such as rook vs rook or rook vs minor piece. Surprisingly, the results were positive.
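A filter of this kind can be sketched by counting pieces in the placement field of a FEN string. Whether kings count as non-pawns is not stated; the sketch counts them by default because rook vs rook then has exactly 4 non-pawns, which matches the examples given, but this is an assumption. The function names and the FEN input format are hypothetical.

```python
def count_non_pawns(fen, include_kings=True):
    """Count non-pawn pieces of both sides in the FEN placement field."""
    placement = fen.split()[0]
    pieces = "nbrqk" if include_kings else "nbrq"
    return sum(1 for ch in placement.lower() if ch in pieces)


def keep_for_drop4(fen, threshold=4):
    """Keep only positions with more than `threshold` non-pawns on the board."""
    return count_non_pawns(fen) > threshold
```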

1st run	2nd run	3rd run
7900 @ 40/15+0.05	9700 @ 40/15+0.05	11900 @ 40/15+0.05
2898 - 2665 - 2337	3475 - 3272 - 2953	4262 - 4058 - 3580
ELO 10.25±6.43	ELO 7.27±5.77	ELO 5.96±5.22

Again, dropping endgame positions tends to increase the endgame piece scores.

1st run	Pawn	Knight	Bishop	Rook	Queen
MidGame	5363	39652	42277	57185	140637
EndGame	9844	54153	58849	103947	189061

drop30: at most 30 pieces, no duplicates, captures, checks, no mates

This test is similar to unique, but we dropped all positions that had more than 30 pieces. These positions occur mostly in the opening. Early results suggest a slight improvement.

1st run	2nd run	3rd run
x	x	x

1st run	Pawn	Knight	Bishop	Rook	Queen
MidGame	5379	52514	55842	79687	190321
EndGame	6840	29658	33026	61593	105525
