in mind I think ~200 games per engine is a good compromise between test duration and accuracy. Therefore I let each tested book play 220 games with Glaurung 2 epsilon/4 in gauntlets. Glaurung (whichever book it used) did not play against itself.
The new book by Salvo Spitaleri (Sal) was used as Glaurung's own book. All other books were used as Polyglot books.
The books were:
- Dann Corbit's large book (DCl), downloaded from Marc Lacrosse's site,
- Performance.bin (perf) by Marc himself,
- Guenther Simon's "GS_medium" (GSm), which he sent to me by mail and is dated 07/June/13,
- Salvo's book (Sal), downloaded from Tord's site,
- one of my own creations (VP1).
3.5% of the games were lost on time (TimeControl: 60); these were replayed until no game was lost on time anymore.
The other engines in the test group were: Ruffian 2.1.0, Fruit (Toga) 1.2.1a, Scorpio 1.91, Spike 1.2 Turin, Shredder Classic 1.3, Jonny 2.83, Yace Paderborn, Crafty-21.5, Zappa 1.1, Arasan 9.5, Hermann 2.0. Ponder and all learning were off.
The results for the different books are:
Rank Name               Elo   +   - games score oppo. draws
   1 Glaurung2 e4 VP1   116  42  41   220   69%   -39   16%
   2 Glaurung2 e4 GSm   109  41  40   220   69%   -39   18%
   3 Glaurung2 e4 perf   91  40  39   220   67%   -39   23%
   4 Glaurung2 e4 DCl    79  41  40   220   65%   -39   17%
   5 Glaurung2 e4 Sal    35  39  39   220   60%   -39   19%
I have not seen a bad line in Guenther's book. Dann's book got a result similar to earlier (unpublished) tests. I don't trust the outcome of my own creation, so I'm going to run another test with slower time controls. This will take approx. 10 days. I also don't trust the result of Salvo's book, but I noticed that it is out of book as White after 1.e4 d5. A Norwegian engine with no idea of the Scandinavian opening? Maybe there is a difference between the book usage of Glaurung and Polyglot, although they use the same book format.
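As a rough cross-check of how much 220 games can tell us, here is a small Python sketch that turns a score percentage and draw ratio into an Elo difference with an approximate 95% interval. It assumes the plain logistic Elo formula and a normal approximation, and it uses the rounded percentages from the table, so the numbers will not match the table exactly (the rating tool that produced the table may use a different model). The function names are only for illustration.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(score, draw_ratio, games, z=1.96):
    """Return (elo, low, high) using a normal approximation of the score."""
    wins = score - draw_ratio / 2.0
    # Per-game variance of a result that is 1 (win), 0.5 (draw) or 0 (loss).
    variance = wins + 0.25 * draw_ratio - score ** 2
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), low, high

# Rounded score and draw percentages taken from the table above.
books = {
    "VP1": (0.69, 0.16),
    "GSm": (0.69, 0.18),
    "perf": (0.67, 0.23),
    "DCl": (0.65, 0.17),
    "Sal": (0.60, 0.19),
}

for name, (score, draws) in books.items():
    elo, low, high = elo_interval(score, draws, 220)
    # Elo here is relative to the average opponent, not an absolute rating.
    print(f"{name:5s} {elo:+6.0f}  [{low:+.0f}, {high:+.0f}]")
```

With these inputs the interval is roughly 40 to 50 Elo wide on each side, about the same size as the + and - columns in the table, so neighbouring books in the ranking are not clearly separated after 220 games.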
Comments appreciated