Comparison of Blitz and 40/40 performance (YABRL and CEGT)

Postby Robert Allgeuer » 03 Jul 2006, 22:47

This is a comparison of ratings as measured in CEGT (40/40min) and YABRL (5min+2sec).

My interest in doing this comparison is twofold:

1) How do Blitz ratings generally correlate with ratings at longer time controls? Is testing at Blitz controls just "gambling" or are such results a meaningful measurement of playing strength?

2) With the hardware of YABRL becoming increasingly outdated (Athlon 1.1GHz), are YABRL results still useful?

The reference point I chose is Shredder 9 (2770 in YABRL, 2750 in CEGT), as it is quite a balanced engine. In the list below, a positive "Delta" for a given engine means that the engine gains strength at longer time controls compared to Shredder 9, while a negative value indicates a loss of playing strength. Hence, the more negative the value, the more an engine is a "Blitzer", while large positive values indicate a specialist for long time controls.

There are some uncertainties in these results:

1) Error margins still apply: some engines (Crafty 20.01, Chess Tiger 2004, Tao 5.6, SmarThink 0.17a ...) have fewer than 100 games in CEGT.

2) CEGT has a wider variation of average opponent ELO than YABRL, which introduces an additional systematic error.

3) CEGT sometimes combines similar engine versions into a single rating (e.g. Fruit 2.2.x, Loop List 600 with and without EGTBs, Jonny and others).
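A rough Python sketch of how wide those error margins can be. It uses the common normal approximation for a rating derived from N games, assuming an average score near 50%, equal win and loss rates, and a draw ratio of 30%; the draw ratio in particular is an illustrative assumption, not a figure taken from CEGT.

```python
import math

# Slope of the logistic Elo curve at a 50% score:
# Elo(p) = 400 * log10(p / (1 - p))  =>  dElo/dp = 400 / (ln(10) * p * (1 - p)),
# which equals 1600 / ln(10) (about 695 Elo per unit of score) at p = 0.5.
ELO_PER_SCORE_AT_50 = 1600 / math.log(10)


def elo_margin_95(games: int, draw_ratio: float = 0.3) -> float:
    """Approximate 95% error bar (in Elo) for a rating derived from `games` games.

    Assumes an expected score of 50% with wins and losses equally likely, so the
    per-game score variance is (1 - draw_ratio) / 4. The default draw_ratio of 0.3
    is an illustrative assumption, not a CEGT statistic.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    # Standard error of the mean score over `games` independent games.
    score_se = math.sqrt((1.0 - draw_ratio) / 4.0 / games)
    # Convert score uncertainty to Elo and widen to a ~95% interval.
    return 1.96 * ELO_PER_SCORE_AT_50 * score_se


if __name__ == "__main__":
    for n in (50, 100, 400, 1000):
        print(f"{n:>5} games: +/- {elo_margin_95(n):.0f} Elo")
```

Under these assumptions 100 games give roughly +/- 57 Elo, wider than almost every Delta in the table below, and quadrupling the number of games halves the margin.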

Nevertheless the results correlate remarkably well: the bulk of the engines are within, let's say, +/- 25 ELO points.
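The "+/- 25" statement can be checked directly against the Delta column of the table below. A minimal Python sketch; the band of 25 is simply the figure quoted above, and the list is the Delta column copied top to bottom.

```python
# Delta column of the table below, copied top to bottom (56 engines).
DELTAS = [
    81, 49, 42, 33, 31, 30, 25, 24, 23, 22, 21, 20, 19, 18, 16, 15, 15, 14, 14,
    13, 13, 13, 12, 11, 10, 9, 7, 7, 3, 2, 2, 1, 0, 0, -1, -2, -3, -7, -8, -8,
    -9, -10, -10, -11, -12, -14, -15, -18, -27, -31, -32, -37, -40, -43, -43, -115,
]


def share_within(deltas: list[int], band: int = 25) -> tuple[int, int]:
    """Return (number of engines with |Delta| <= band, total number of engines)."""
    inside = sum(1 for d in deltas if abs(d) <= band)
    return inside, len(deltas)


if __name__ == "__main__":
    inside, total = share_within(DELTAS)
    print(f"{inside} of {total} engines ({inside / total:.0%}) lie within +/- 25 Elo")
```

Run as-is it reports 42 of the 56 engines (75%) inside the band.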

I therefore conclude:

1) Blitz testing obviously has a strong correlation with "real" playing strength at longer time controls.

2) YABRL results are meaningful, even though they come from Blitz games on weak hardware.

3) Comparisons like the one below allow us to identify "Blitz experts" and "long timers" among the engines.

Blitzers include Fruit 2.0, Pepito, Hiarcs X50, Yace, ProDeo 1.1 and Chess Tiger 15 (SmarThink 0.17a, Crafty 20.01 and Chess Tiger 2004 do not have enough games in CEGT, so their results below are probably inaccurate).

Long timers include Shredder 10, Ktulu 7.0, SoS and Pharaon.

Rybka (again) is the odd one out and appears to be significantly better at long time controls, much more so than any other engine.
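The grouping into "Blitzers" and "long timers" above is a judgment call; the post does not state a numeric cut-off. As an illustrative assumption, a hypothetical threshold of 25 Elo (the same band as above) can be applied to the Delta values, sketched here in Python on a few rows of the table.

```python
# A few (engine, Delta) rows from the table below. Delta > 0: the engine gains
# relative to Shredder 9 at 40/40; Delta < 0: it loses relative to Shredder 9.
SAMPLE = {
    "Rybka v1.2f": 81,
    "Ktulu v7.0": 42,
    "Shredder 10 UCI sb345": 33,
    "Toga II v1.2.1": 21,
    "Shredder 9 UCI dcbk": 0,
    "Fruit v2.0": -31,
    "Hiarcs X50 UCI": -40,
}

# Hypothetical cut-off: not given in the post, chosen to mirror the +/- 25 band.
THRESHOLD = 25


def classify(delta: int, threshold: int = THRESHOLD) -> str:
    """Label an engine by how its Delta compares with the (assumed) threshold."""
    if delta > threshold:
        return "long timer"
    if delta < -threshold:
        return "blitzer"
    return "balanced"


if __name__ == "__main__":
    for name, d in SAMPLE.items():
        print(f"{name:<22} {d:>5}  {classify(d)}")
```

For the engines in this sample, the rule puts Ktulu v7.0 and Shredder 10 among the long timers and Fruit v2.0 and Hiarcs X50 among the Blitzers, consistent with the lists above; Rybka v1.2f lies far beyond the cut-off.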

Robert



Program                       YABRL     CEGT      Delta
                                                           
Rybka v1.2f                   2878      2939      81       
Rybka v1.1                    2871      2900      49       
Ktulu v7.0                    2639      2661      42       
Shredder 10 UCI sb345         2809      2822      33       
SoS 5.1                       2602      2613      31       
Pharaon v3.3                  2585      2595      30       
Ufim v7.00                    2510      2515      25       
Chess Tiger 2004 normal       2716      2720      24       
Delfi v4.6                    2600      2603      23       
Spike v1.1                    2720      2722      22       
Toga II v1.2.1                2810      2811      21       
Amyan v1.595                  2509      2509      20       
Scorpio v1.7                  2642      2641      19       
SlowChess Blitz WV2.1         2660      2658      18       
Pseudo v0.7c                  2609      2605      16       
Aristarch v4.50               2611      2606      15       
Zappa v1.1                    2607      2602      15       
SmarThink v1.00WB             2706      2700      14       
Ktulu v7.5                    2702      2696      14       
SlowChess Blitz WV2           2643      2636      13       
DeepSjeng v1.6ntb             2612      2605      13       
Green Light Chess v3.01.2.2   2550      2543      13       
Fruit v2.1                    2725      2717      12       
Anmon v5.60                   2559      2550      11       
Spike v1.0a Mainz             2681      2671      10       
Zappa v1.0                    2584      2573      9         
Glaurung v1.0.1               2671      2658      7         
SlowChess Blitz WV            2613      2600      7         
Naum v1.91                    2653      2636      3         
Rybka v1.0 Beta               2838      2820      2         
Ruffian v1.0.1                2649      2631      2         
Wildcat v6.0                  2633      2614      1         
Shredder 9 UCI dcbk           2770      2750      0         
Loop List v6.00TB             2704      2684      0         
Fruit v2.2.1                  2799      2778      -1       
Wildcat v5.0                  2585      2563      -2       
Jonny v2.82                   2618      2595      -3       
Ruffian v2.1.0                2675      2648      -7       
Glaurung Mainz                2620      2592      -8       
Delfi v4.5                    2595      2567      -8       
Toga II v1.1a                 2795      2766      -9       
Gandalf v6.0WB                2691      2661      -10       
Baron v1.6.1                  2557      2527      -10       
Tao v5.6                      2519      2488      -11       
List v5.12                    2666      2634      -12       
Ktulu v5.1                    2615      2581      -14       
Spike v0.9                    2648      2613      -15       
Toga II v1.0                  2772      2734      -18       
Pro Deo v1.1                  2690      2643      -27       
Fruit v2.0                    2665      2614      -31       
Chess Tiger 15.0 normal       2722      2670      -32       
Yace v0.99.87                 2575      2518      -37       
Hiarcs X50 UCI                2844      2784      -40       
SmarThink v0.17a              2609      2546      -43       
Pepito v1.59 profile          2549      2486      -43       
Crafty v20.01BH               2507      2372      -115     
Robert Allgeuer
 
Posts: 124
Joined: 28 Sep 2004, 19:09
Location: Konz / Germany

Re: Comparison of Blitz and 40/40 performance (YABRL and CEGT)

Postby Heinz van Kempen » 05 Jul 2006, 19:33

Hi Robert :) ,

thanks for this interesting comparison.

Up to now I thought there would be considerable improvements for Zap!Chess, Junior and Gandalf with more time.

With these numbers provided by you, we now have more clues, especially regarding Rybka and Shredder.

Hiarcs is really a good Blitzer.
Heinz van Kempen
 
Posts: 160
Joined: 27 Sep 2004, 07:35
Location: Leverkusen, Germany

Re: Comparison of Blitz and 40/40 performance (YABRL and CEGT)

Postby GenoM » 06 Jul 2006, 10:26

in this connection -- my question is: are the 30+5 games on my Celeron 433 meaningful?

I am really interested in your answer.

thanks,
Geno
we live in a beautiful world
GenoM
 
Posts: 36
Joined: 27 Dec 2004, 02:49
Location: Bulgaria

Re: Comparison of Blitz and 40/40 performance (YABRL and CEGT)

Postby Robert Allgeuer » 06 Jul 2006, 19:33

GenoM wrote: in this connection -- my question is: are the 30+5 games on my Celeron 433 meaningful?

I am really interested in your answer.

thanks,
Geno


In my view, yes they are.
This Celeron probably has about a third of the speed of, for example, the YABRL machine, so your 30+5 would be roughly equivalent to 10+1.6 on that machine, and hence even less "blitzy" than YABRL. Compared with more recent hardware, which is again about 3 times faster, your effective time control would come down to about 3+0.5. That is certainly still meaningful, as there is a strong correlation between test results at long and short time controls.
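The time-control arithmetic above, as a small Python sketch. It assumes a purely linear trade-off between CPU speed and thinking time (ignoring memory, cache and hash-size effects), and the factor of 3 between machines is the rough estimate from the post, not a measurement.

```python
def equivalent_time_control(base_min: float, inc_sec: float, speedup: float) -> tuple[float, float]:
    """Time control on a machine `speedup` times faster that gives the same thinking budget.

    Assumes strength depends only on (CPU speed x thinking time), i.e. a linear
    trade-off; both the base time and the increment are scaled by 1 / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return base_min / speedup, inc_sec / speedup


if __name__ == "__main__":
    celeron = (30.0, 5.0)                                  # 30 min + 5 s on the Celeron 433
    yabrl = equivalent_time_control(*celeron, speedup=3)   # YABRL Athlon, assumed ~3x faster
    modern = equivalent_time_control(*yabrl, speedup=3)    # recent hardware, assumed ~3x faster again
    print(f"YABRL machine: {yabrl[0]:.1f} min + {yabrl[1]:.2f} s")
    print(f"Recent PC:     {modern[0]:.1f} min + {modern[1]:.2f} s")
```

It prints about 10.0 min + 1.67 s for the YABRL machine and 3.3 min + 0.56 s for more recent hardware, which the post rounds to 10+1.6 and 3+0.5.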
This of course does not protect you from comments saying "your testing on such hardware is meaningless", just as there are posts that consider Blitz testing useless or gambling, without any factual foundation.
That said, the best approach is of course to test with large numbers of games at long time controls, but such testing is a major effort (CEGT, CCRL etc.) and not everybody has the means to do it. The good news is that smaller efforts also give good results. In addition, knowing the performance at both ends of the scale gives us a more complete picture of the overall behaviour of an engine.

Robert

Re: Comparison of Blitz and 40/40 performance (YABRL and CEGT)

Postby GenoM » 07 Jul 2006, 00:04

thanks for the answer, Robert

your post confirms exactly what I was thinking about testing on old hardware

regards,
Genom

ps: btw, Rybka (free beta) is not so strong on my PC; ToGa is the strongest engine there. That is why I agree with your observation that some engines profit more from longer time controls than others.

Re: Comparison of Blitz and 40/40 performance (YABRL and CEGT)

Postby Dann Corbit » 07 Jul 2006, 00:38

I don't understand your delta. By straight subtraction, I get this:
Program                       YABRL     CEGT      Delta
                                                     
Rybka v1.2f                   2878      2939      61 
Rybka v1.1                    2871      2900      29 
Ktulu v7.0                    2639      2661      22 
Shredder 10 UCI sb345         2809      2822      13 
SoS 5.1                       2602      2613      11 
Pharaon v3.3                  2585      2595      10 
Ufim v7.00                    2510      2515      5   
Chess Tiger 2004 normal       2716      2720      4   
Delfi v4.6                    2600      2603      3   
Spike v1.1                    2720      2722      2   
Toga II v1.2.1                2810      2811      1   
Amyan v1.595                  2509      2509      0   
Scorpio v1.7                  2642      2641      -1 
SlowChess Blitz WV2.1         2660      2658      -2 
Pseudo v0.7c                  2609      2605      -4 
Aristarch v4.50               2611      2606      -5 
Zappa v1.1                    2607      2602      -5 
SmarThink v1.00WB             2706      2700      -6 
Ktulu v7.5                    2702      2696      -6 
SlowChess Blitz WV2           2643      2636      -7 
DeepSjeng v1.6ntb             2612      2605      -7 
Green Light Chess v3.01.2.2   2550      2543      -7 
Fruit v2.1                    2725      2717      -8 
Anmon v5.60                   2559      2550      -9 
Spike v1.0a Mainz             2681      2671      -10
Zappa v1.0                    2584      2573      -11
Glaurung v1.0.1               2671      2658      -13
SlowChess Blitz WV            2613      2600      -13
Naum v1.91                    2653      2636      -17
Rybka v1.0 Beta               2838      2820      -18
Ruffian v1.0.1                2649      2631      -18
Wildcat v6.0                  2633      2614      -19
Shredder 9 UCI dcbk           2770      2750      -20
Loop List v6.00TB             2704      2684      -20
Fruit v2.2.1                  2799      2778      -21
Wildcat v5.0                  2585      2563      -22
Jonny v2.82                   2618      2595      -23
Ruffian v2.1.0                2675      2648      -27
Glaurung Mainz                2620      2592      -28
Delfi v4.5                    2595      2567      -28
Toga II v1.1a                 2795      2766      -29
Gandalf v6.0WB                2691      2661      -30
Baron v1.6.1                  2557      2527      -30
Tao v5.6                      2519      2488      -31
List v5.12                    2666      2634      -32
Ktulu v5.1                    2615      2581      -34
Spike v0.9                    2648      2613      -35
Toga II v1.0                  2772      2734      -38
Pro Deo v1.1                  2690      2643      -47
Fruit v2.0                    2665      2614      -51
Chess Tiger 15.0 normal       2722      2670      -52
Yace v0.99.87                 2575      2518      -57
Hiarcs X50 UCI                2844      2784      -60
SmarThink v0.17a              2609      2546      -63
Pepito v1.59 profile          2549      2486      -63
Crafty v20.01BH               2507      2372      -135


What is the computation for your delta number?
Dann Corbit
 

Re: Comparison of Blitz and 40/40 performance (YABRL and CEGT)

Postby Robert Allgeuer » 07 Jul 2006, 07:47

Shredder 9 is the reference point, so I compensate for Shredder's different rating values in the two lists.

The computation is:
Delta = CEGT - YABRL + 20.

This sets Delta for Shredder 9 to 0 and shifts all delta values up by 20. I chose Shredder 9 because, based on my observations in various rating lists, I think it is a balanced engine. This is also confirmed by the fact that it comes out pretty much in the middle of the pack in the YABRL-CEGT comparison.
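The same normalisation as a small Python sketch (function and variable names are illustrative). Rather than hard-coding +20, it subtracts the reference engine's own CEGT - YABRL difference, which for Shredder 9 is 2750 - 2770 = -20; the result is identical here, but this form also works if a different reference engine is chosen.

```python
# (YABRL 5min+2sec, CEGT 40/40) rating pairs for a few rows of the first table.
RATINGS = {
    "Rybka v1.2f": (2878, 2939),
    "Shredder 9 UCI dcbk": (2770, 2750),
    "Hiarcs X50 UCI": (2844, 2784),
    "Crafty v20.01BH": (2507, 2372),
}

REFERENCE = "Shredder 9 UCI dcbk"


def delta(engine: str, ratings: dict[str, tuple[int, int]], reference: str = REFERENCE) -> int:
    """Delta = (CEGT - YABRL) - (CEGT_ref - YABRL_ref).

    With Shredder 9 as reference, CEGT_ref - YABRL_ref = -20, so this reduces
    to Delta = CEGT - YABRL + 20 and the reference engine itself gets 0.
    """
    yabrl, cegt = ratings[engine]
    ref_yabrl, ref_cegt = ratings[reference]
    return (cegt - yabrl) - (ref_cegt - ref_yabrl)


if __name__ == "__main__":
    for name in RATINGS:
        print(f"{name:<22} {delta(name, RATINGS):>5}")
```

For the four rows included it reproduces the first table: 81, 0, -40 and -115.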

Robert