My interest in doing this comparison is twofold:
1) How do Blitz ratings generally correlate with ratings at longer time controls? Is testing at Blitz controls just "gambling" or are such results a meaningful measurement of playing strength?
2) With the hardware of YABRL becoming increasingly outdated (Athlon 1.1GHz), are YABRL results still useful?
The reference point I chose is Shredder 9 (2770 in YABRL, 2750 in CEGT), as it is a quite balanced engine. In the list below, a positive "Delta" value means that the engine gains strength at longer time controls relative to Shredder 9, while a negative value indicates a loss of playing strength. Hence, the more negative the value, the more an engine is a "Blitzer", while large positive values indicate an expert for long time controls.
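As a minimal sketch of how the Delta column can be reproduced from the two ratings and the Shredder 9 reference point (the function and constant names are illustrative and not part of either rating list):

```python
# Reference engine ratings as quoted in the post.
REF_YABRL = 2770  # Shredder 9 in YABRL
REF_CEGT = 2750   # Shredder 9 in CEGT


def delta(yabrl: int, cegt: int) -> int:
    """Rating change from Blitz (YABRL) to longer controls (CEGT),
    measured relative to Shredder 9."""
    return (cegt - yabrl) - (REF_CEGT - REF_YABRL)


# Three rows taken from the list in this post.
print(delta(2878, 2939))  # Rybka v1.2f -> 81
print(delta(2770, 2750))  # Shredder 9  -> 0
print(delta(2665, 2614))  # Fruit v2.0  -> -31
```

Subtracting the reference difference removes the general offset between the two lists, so only the engine-specific shift remains.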
There are some uncertainties in these results:
1) Error margins still apply: some engines (Crafty 20.01, Chess Tiger 2004, Tao 5.6, SmarThink 0.17a ...) have fewer than 100 games in CEGT.
2) CEGT has a wider variation of average opponent ELO than YABRL, which introduces an additional systematic error.
3) CEGT sometimes combines similar engine versions into a single rating (e.g. Fruit 2.2.x, Loop List 6.00 with and without EGTBs, Jonny and others).
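To give a rough feel for point 1, the following sketch estimates the 95% error margin of a rating based on a given number of games. It assumes roughly equal opponents, independent games, a normal approximation and the standard logistic Elo curve. Neither YABRL nor CEGT necessarily computes its margins this way, and the draw ratio of 0.3 is purely illustrative.

```python
import math


def elo_margin(n_games: int, draw_ratio: float = 0.0, z: float = 1.96) -> float:
    """Approximate Elo error margin (z = 1.96 for about 95%) near a 50% score."""
    # Standard deviation of a single game's score (1, 0.5 or 0) at equal strength.
    per_game_sd = 0.5 * math.sqrt(1.0 - draw_ratio)
    # Standard error of the average score over n_games.
    score_se = per_game_sd / math.sqrt(n_games)
    # Slope of the logistic Elo curve at a 50% score, in Elo per unit of score.
    elo_per_score = 4.0 * 400.0 / math.log(10)
    return z * score_se * elo_per_score


print(round(elo_margin(100)))       # no draws: 68
print(round(elo_margin(100, 0.3)))  # 30% draws (assumed): 57
```

Under these assumptions a rating based on 100 games carries a margin of several tens of ELO points, which is larger than most of the Delta values in the list.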
Nevertheless, the results correlate remarkably well: the bulk of the engines are within, let's say, +/- 25 ELO points.
I therefore conclude:
1) Blitz testing obviously has a strong correlation with "real" playing strength at longer time controls.
2) YABRL results - even though Blitz on weak hardware - are meaningful.
3) Comparisons like the one below make it possible to identify "Blitz-experts" and "Long timers" among the engines.
Blitzers include Fruit 2.0, Pepito, Hiarcs X50, Yace, ProDeo 1.1 and Chess Tiger 15 (SmarThink 0.17a, Crafty 20.01 and Chess Tiger 2004 do not have enough games in CEGT, so their results below are most probably inaccurate).
Long timers include Shredder 10, Ktulu 7.0, SoS and Pharaon.
Rybka (again) is the odd one out and appears to be significantly better at long time controls, much more so than any other engine.
Robert
Program                       YABRL  CEGT  Delta
Rybka v1.2f                    2878  2939     81
Rybka v1.1                     2871  2900     49
Ktulu v7.0                     2639  2661     42
Shredder 10 UCI sb345          2809  2822     33
SoS 5.1                        2602  2613     31
Pharaon v3.3                   2585  2595     30
Ufim v7.00                     2510  2515     25
Chess Tiger 2004 normal        2716  2720     24
Delfi v4.6                     2600  2603     23
Spike v1.1                     2720  2722     22
Toga II v1.2.1                 2810  2811     21
Amyan v1.595                   2509  2509     20
Scorpio v1.7                   2642  2641     19
SlowChess Blitz WV2.1          2660  2658     18
Pseudo v0.7c                   2609  2605     16
Aristarch v4.50                2611  2606     15
Zappa v1.1                     2607  2602     15
SmarThink v1.00WB              2706  2700     14
Ktulu v7.5                     2702  2696     14
SlowChess Blitz WV2            2643  2636     13
DeepSjeng v1.6ntb              2612  2605     13
Green Light Chess v3.01.2.2    2550  2543     13
Fruit v2.1                     2725  2717     12
Anmon v5.60                    2559  2550     11
Spike v1.0a Mainz              2681  2671     10
Zappa v1.0                     2584  2573      9
Glaurung v1.0.1                2671  2658      7
SlowChess Blitz WV             2613  2600      7
Naum v1.91                     2653  2636      3
Rybka v1.0 Beta                2838  2820      2
Ruffian v1.0.1                 2649  2631      2
Wildcat v6.0                   2633  2614      1
Shredder 9 UCI dcbk            2770  2750      0
Loop List v6.00TB              2704  2684      0
Fruit v2.2.1                   2799  2778     -1
Wildcat v5.0                   2585  2563     -2
Jonny v2.82                    2618  2595     -3
Ruffian v2.1.0                 2675  2648     -7
Glaurung Mainz                 2620  2592     -8
Delfi v4.5                     2595  2567     -8
Toga II v1.1a                  2795  2766     -9
Gandalf v6.0WB                 2691  2661    -10
Baron v1.6.1                   2557  2527    -10
Tao v5.6                       2519  2488    -11
List v5.12                     2666  2634    -12
Ktulu v5.1                     2615  2581    -14
Spike v0.9                     2648  2613    -15
Toga II v1.0                   2772  2734    -18
Pro Deo v1.1                   2690  2643    -27
Fruit v2.0                     2665  2614    -31
Chess Tiger 15.0 normal        2722  2670    -32
Yace v0.99.87                  2575  2518    -37
Hiarcs X50 UCI                 2844  2784    -40
SmarThink v0.17a               2609  2546    -43
Pepito v1.59 profile           2549  2486    -43
Crafty v20.01BH                2507  2372   -115
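The grouping into "Blitzers" and "Long timers" can be sketched as a simple threshold on Delta. The cut-off of 25 is an assumption borrowed from the "+/- 25 ELO points" band mentioned in the post, and the labels and function name are illustrative only:

```python
THRESHOLD = 25  # assumed cut-off, taken from the "+/- 25 ELO" band in the post


def classify(delta: int, threshold: int = THRESHOLD) -> str:
    """Label an engine by its Delta relative to Shredder 9."""
    if delta > threshold:
        return "long timer"
    if delta < -threshold:
        return "blitzer"
    return "balanced"


# (engine, Delta) pairs copied from the list above.
sample = [
    ("Ktulu v7.0", 42),
    ("Shredder 10 UCI sb345", 33),
    ("Ufim v7.00", 25),
    ("Shredder 9 UCI dcbk", 0),
    ("Pro Deo v1.1", -27),
    ("Hiarcs X50 UCI", -40),
]

for name, d in sample:
    print(f"{name}: {classify(d)}")
```

With this cut-off the engines named as Blitzers and Long timers in the conclusions fall into the expected groups. Engines with few CEGT games should still be treated with caution, whatever label they receive.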