For the following I have used a slightly extended database (with some recent games by K. Utzinger added and duplicate games removed); I refer to this database as Eigenmann+.
By comparing the results and statistics of my Blitz rating list YABRL (50000 games) and Eigenmann+ (140000 games at longer time controls) I want to shed some light on the following questions:
1) How much do Blitz ratings differ from ratings based on longer time controls? Is Blitz just CPU-intensive gambling or is there more value to it?
2) To what extent do game statistics differ between Blitz and longer games?
3) Which engines get stronger at long time controls, which ones are balanced and which ones lose strength with increasing time controls? What order of magnitude of rating difference are we talking about here?
To do so I have calculated ratings for YABRL and the Eigenmann+ database with EloStat 1.3 - both with Ruffian 1.0.1 set to 2650 as reference point - and compared the results. Ruffian 1.0.1 was chosen because it is - as we will see - a rather balanced engine.
What I found is:
1) Distributions of draws, white and black performance and also length of games (see percentages indicated in the lines "Games :" below) are very similar. I interpret this as a sign that there is a correlation between the two and that Blitz results are not just random. If Blitz results were more random than results at long time controls I would expect a higher percentage of shorter games in Blitz and distributions and percentages that differ more.
2) Generally ratings from YABRL and Eigenmann+ match very well (for exact figures see the engine comparison table below). For the majority of engines the ratings differ by only 30 points or fewer. For comparison: with 300 games, error margins in a rating list are around +/- 30 points.
3) There are of course some engines where the rating differences are larger; the maximum observed is +/- 56 points. However, when looking at these engines we see some well-known cases: SoS, Aristarch, Rebel and Comet, for example, are known to be stronger at long time controls. Likewise Pepito, Fruit and Crafty 17.xx, for example, are known to be stronger in Blitz. The fact that Patriot gets weaker at longer time controls can also be observed in recent games in AEGT. Therefore I am pretty much convinced that these results are not just a statistical effect.
4) Which specific engine came out more as a Blitz expert or a long timer can be seen in the comparison list below (positive values mean stronger at long time controls).
5) A striking observation is that MTD(f) based engines (two SoS versions, three AnMon versions and PostModernist) are over-represented amongst the long timers. Possibly this is a property of MTD(f).
6) Yace and Green Light Chess deserve specific mention: in both cases recent changes in the engines apparently work only for short time controls, essentially turning the former long timers into Blitz experts without an overall increase in playing strength.
7) Generally my conclusion is that a Blitz rating gives a surprisingly good estimate of an engine's strength, in particular when some more focused extra tests (e.g. matches against a set of balanced reference engines at different time controls) are used to determine whether an engine is a Blitz expert, balanced or a long time expert. In any case Blitz results are definitely not random results.
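The +/- 30 point error margin quoted in point 2 for 300 games can be cross-checked with a back-of-the-envelope calculation. The following is only a rough sketch under stated assumptions: two equally strong opponents (expected score 50%), a draw rate of about 27% (roughly what the statistics show), a 95% confidence level and the usual logistic Elo curve. EloStat may compute its margins differently; the function name `elo_margin` is made up for this illustration.

```python
import math

def elo_margin(games, draw_rate=0.27, score=0.5, z=1.96):
    """Approximate +/- Elo error margin for a match of `games` games.

    Assumptions (not taken from EloStat): independent games, a fixed
    draw rate, and a logistic Elo curve linearised around `score`.
    """
    win_rate = score - draw_rate / 2.0
    # Variance of a single game result, scored as 1, 0.5 or 0.
    variance = win_rate * 1.0 + draw_rate * 0.25 - score ** 2
    # Standard error of the mean score over all games.
    std_err = math.sqrt(variance / games)
    # Slope of the Elo difference as a function of the score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * std_err * slope

margin_300 = elo_margin(300)
print(f"approx. margin for 300 games: +/- {margin_300:.0f} Elo")
```

Under these assumptions the result lands in the low thirties, i.e. in the same range as the figure quoted above, which is why a difference of 30 points or fewer between the two lists should not be over-interpreted for a single engine.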
Statistics of YABRL (5 min + 2 sec):
>= 20 moves:
Games : 49195 (finished) (100%)
White Wins : 20164 (41.0 %)
Black Wins : 16656 (33.9 %)
Draws : 12375 (25.2 %)
Unfinished : 0
White Perf. : 53.6 %
Black Perf. : 46.4 %
>= 30 moves:
Games : 48157 (finished) (97.9%)
White Wins : 19651 (40.8 %)
Black Wins : 16443 (34.1 %)
Draws : 12063 (25.0 %)
Unfinished : 0
White Perf. : 53.3 %
Black Perf. : 46.7 %
>= 40 moves:
Games : 45070 (finished) (91.6%)
White Wins : 17984 (39.9 %)
Black Wins : 15479 (34.3 %)
Draws : 11607 (25.8 %)
Unfinished : 0
White Perf. : 52.8 %
Black Perf. : 47.2 %
>= 50 moves:
Games : 39042 (finished) (79.4%)
White Wins : 15008 (38.4 %)
Black Wins : 13277 (34.0 %)
Draws : 10757 (27.6 %)
Unfinished : 0
White Perf. : 52.2 %
Black Perf. : 47.8 %
>= 60 moves:
Games : 30532 (finished) (62.1%)
White Wins : 11016 (36.1 %)
Black Wins : 9971 (32.7 %)
Draws : 9545 (31.3 %)
Unfinished : 0
White Perf. : 51.7 %
Black Perf. : 48.3 %
>= 70 moves:
Games : 21344 (finished) (43.4%)
White Wins : 6940 (32.5 %)
Black Wins : 6340 (29.7 %)
Draws : 8064 (37.8 %)
Unfinished : 0
White Perf. : 51.4 %
Black Perf. : 48.6 %
>= 80 moves:
Games : 14151 (finished) (28.8%)
White Wins : 3952 (27.9 %)
Black Wins : 3589 (25.4 %)
Draws : 6610 (46.7 %)
Unfinished : 0
White Perf. : 51.3 %
Black Perf. : 48.7 %
>= 90 moves:
Games : 9410 (finished) (19.1%)
White Wins : 2092 (22.2 %)
Black Wins : 1894 (20.1 %)
Draws : 5424 (57.6 %)
Unfinished : 0
White Perf. : 51.1 %
Black Perf. : 48.9 %
>= 100 moves:
Games : 6501 (finished) (13.2%)
White Wins : 1083 (16.7 %)
Black Wins : 1004 (15.4 %)
Draws : 4414 (67.9 %)
Unfinished : 0
White Perf. : 50.6 %
Black Perf. : 49.4 %
Statistics of Eigenmann+ (>30 min):
>= 20 moves:
Games : 117335 (finished) (100%)
White Wins : 47181 (40.2 %)
Black Wins : 37596 (32.0 %)
Draws : 32558 (27.7 %)
Unfinished : 0
White Perf. : 54.1 %
Black Perf. : 45.9 %
>= 30 moves:
Games : 113895 (finished) (97.1%)
White Wins : 45465 (39.9 %)
Black Wins : 36628 (32.2 %)
Draws : 31802 (27.9 %)
Unfinished : 0
White Perf. : 53.9 %
Black Perf. : 46.1 %
>= 40 moves:
Games : 104506 (finished) (89.1%)
White Wins : 40650 (38.9 %)
Black Wins : 33465 (32.0 %)
Draws : 30391 (29.1 %)
Unfinished : 0
White Perf. : 53.4 %
Black Perf. : 46.6 %
>= 50 moves:
Games : 87758 (finished) (74.8%)
White Wins : 32460 (37.0 %)
Black Wins : 27436 (31.3 %)
Draws : 27862 (31.7 %)
Unfinished : 0
White Perf. : 52.9 %
Black Perf. : 47.1 %
>= 60 moves:
Games : 66066 (finished) (56.3%)
White Wins : 22541 (34.1 %)
Black Wins : 19600 (29.7 %)
Draws : 23925 (36.2 %)
Unfinished : 0
White Perf. : 52.2 %
Black Perf. : 47.8 %
>= 70 moves:
Games : 45520 (finished) (38.8%)
White Wins : 13800 (30.3 %)
Black Wins : 12271 (27.0 %)
Draws : 19449 (42.7 %)
Unfinished : 0
White Perf. : 51.7 %
Black Perf. : 48.3 %
>= 80 moves:
Games : 30636 (finished) (26.1%)
White Wins : 7897 (25.8 %)
Black Wins : 7098 (23.2 %)
Draws : 15641 (51.1 %)
Unfinished : 0
White Perf. : 51.3 %
Black Perf. : 48.7 %
>= 90 moves:
Games : 20846 (finished) (17.8%)
White Wins : 4379 (21.0 %)
Black Wins : 3949 (18.9 %)
Draws : 12518 (60.0 %)
Unfinished : 0
White Perf. : 51.0 %
Black Perf. : 49.0 %
>= 100 moves:
Games : 14841 (finished) (12.6%)
White Wins : 2498 (16.8 %)
Black Wins : 2245 (15.1 %)
Draws : 10098 (68.0 %)
Unfinished : 0
White Perf. : 50.9 %
Black Perf. : 49.1 %
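The percentages in the two statistics blocks above can be recomputed from the raw counts, which also makes it easy to put both lists side by side. This is a minimal sketch, not EloStat output: the counts are copied from three of the blocks above (>= 20, >= 60 and >= 100 moves), while the dictionary layout and the helper name `summarize` are assumptions made for this illustration.

```python
# (white wins, black wins, draws) keyed by minimum game length in moves,
# copied from the statistics above.
YABRL = {
    20: (20164, 16656, 12375),
    60: (11016, 9971, 9545),
    100: (1083, 1004, 4414),
}
EIGENMANN_PLUS = {
    20: (47181, 37596, 32558),
    60: (22541, 19600, 23925),
    100: (2498, 2245, 10098),
}

def summarize(white_wins, black_wins, draws):
    """Return (white performance %, draw rate %) for one block."""
    games = white_wins + black_wins + draws
    white_perf = 100.0 * (white_wins + draws / 2.0) / games
    draw_rate = 100.0 * draws / games
    return white_perf, draw_rate

for min_moves in sorted(YABRL):
    blitz_perf, blitz_draws = summarize(*YABRL[min_moves])
    long_perf, long_draws = summarize(*EIGENMANN_PLUS[min_moves])
    print(f">= {min_moves:3d} moves: "
          f"White perf {blitz_perf:.1f} % vs {long_perf:.1f} %, "
          f"draws {blitz_draws:.1f} % vs {long_draws:.1f} %")
```

White performance here is the usual score percentage, with a draw counted as half a point.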
Rating differences between YABRL and Eigenmann+:
Engine YABRL Eigenmann+ Delta
Arasan v7.4 2396 2452 56
Comet B60 2436 2491 55
SoS 4 2559 2610 51
Aristarch v4.21 2582 2631 49
SoS 3 2564 2603 39
Anmon v5.51 2546 2584 38
Leila v0.53h 2425 2463 38
Quark v2.35 2501 2536 35
PostModernist v1.007 2442 2477 35
Aristarch v4.50 2614 2644 30
Tao v5.6 2523 2552 29
Anmon v5.30 2530 2558 28
Dragon v4.4.3 2466 2494 28
Tcb v0045 2412 2439 27
Pharaon v2.62 2511 2537 26
Rebel v12.00.01 2619 2641 22
Yace v0.99.56 2544 2566 22
Green Light Chess v3.00 2536 2554 18
Anmon v5.22 2485 2503 18
Little Goliath 2000 v3.5 2539 2556 17
SlowChess v2.89b 2488 2504 16
DeepSjeng v1.6 2623 2638 15
Tao v5.4 2477 2492 15
Ruffian v2.0.2 2674 2687 13
Delfi v4.5 2600 2613 13
SmarThink v0.16b++ 2560 2573 13
LambChop v10.99 2497 2507 10
Gromit v3.8.2 2499 2508 9
Gandalf v6.0WB 2692 2700 8
Crafty v18.15DC 2558 2566 8
Francesca M.0.0.9 2453 2459 6
Ruffian v2.0.0 2675 2680 5
SmarThink v0.17a 2603 2608 5
El Chinito v3.25 2568 2572 4
Gothmog v1.0 beta 10 2561 2563 2
Yace Paderborn 2555 2557 2
Ktulu v4.2 2587 2588 1
Ruffian v2.1.0 2679 2679 0
Ruffian v1.0.1 2650 2650 0
Amy v0.8.3 2475 2472 -3
Exchess v4.03 2336 2333 -3
SoS v11-99 2479 2468 -11
Resp v0.19 2402 2391 -11
Yace v0.99.87 2579 2565 -14
Little Goliath 2000 v3.9 2560 2546 -14
KnightDreamer v3.2 2475 2461 -14
List v5.12 2668 2652 -16
Pepito v1.59 profile 2551 2530 -21
Thinker v4.6b 2612 2586 -26
Wildcat v4.0 2565 2538 -27
Amyan v1.59 2512 2485 -27
Fruit v2.0 2668 2640 -28
Crafty v17.14DC 2586 2557 -29
Chess Tiger 2004 normal 2716 2685 -31
Fruit v1.5 2551 2519 -32
Green Light Chess v3.0.3.4 2544 2512 -32
Chess Tiger 15.0 normal 2723 2687 -36
Patriot v1.2.3 2595 2539 -56
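To see how many engines fall inside the error margin, the Delta column can be split into three groups. This is a minimal sketch: the deltas are copied row by row from the table above, but the cut-off of 30 points (borrowed from the error margin quoted for 300 games) and the three group labels are assumptions for illustration, not a rule used to build the list.

```python
# Delta column (Eigenmann+ rating minus YABRL rating), in table order.
DELTAS = [
    56, 55, 51, 49, 39, 38, 38, 35, 35, 30, 29, 28, 28, 27, 26,
    22, 22, 18, 18, 17, 16, 15, 15, 13, 13, 13, 10, 9, 8, 8,
    6, 5, 5, 4, 2, 2, 1, 0, 0,
    -3, -3, -11, -11, -14, -14, -14, -16, -21, -26, -27, -27,
    -28, -29, -31, -32, -32, -36, -56,
]

MARGIN = 30  # assumed cut-off in Elo points, see the note above

def split_by_margin(deltas, margin):
    """Group deltas into long timers, balanced engines and Blitz experts."""
    long_timers = [d for d in deltas if d > margin]
    blitz_experts = [d for d in deltas if d < -margin]
    balanced = [d for d in deltas if abs(d) <= margin]
    return long_timers, balanced, blitz_experts

long_timers, balanced, blitz_experts = split_by_margin(DELTAS, MARGIN)
print(f"engines in table : {len(DELTAS)}")
print(f"long timers      : {len(long_timers)}")
print(f"balanced         : {len(balanced)}")
print(f"Blitz experts    : {len(blitz_experts)}")
```

With a different cut-off the group sizes shift accordingly, so the split should be read as a rough orientation only.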
Robert