How do Blitz ratings compare to longer time controls?


Postby Robert Allgeuer » 22 Feb 2005, 22:31

Certainly some things could be improved in W. Eigenmann's database, such as eliminating games played on obsolete hardware. Nevertheless, this database is a great and unique tool for obtaining information that would otherwise be difficult to get.

For the following I have used a slightly extended database (with some recent games by K. Utzinger added) with duplicate games removed; I refer to this database as Eigenmann+.

By comparing the results and statistics of my Blitz rating list YABRL (50000 games) and Eigenmann+ (140000 games at longer time controls) I want to shed some light on the following questions:

1) How much do Blitz ratings differ from ratings based on longer time controls? Is Blitz just CPU-intensive gambling or is there more value to it?

2) To what extent do game statistics differ between Blitz and longer games?

3) Which engines get stronger at long time controls, which ones are balanced and which ones lose strength with increasing time controls? What order of magnitude of rating difference are we talking about here?


To do so I have calculated ratings for YABRL and the Eigenmann+ database with EloStat 1.3 - both with Ruffian 1.0.1 set to 2650 as the reference point - and compared the results. Ruffian 1.0.1 was chosen because it is - as we will see - a rather balanced engine.
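
To illustrate what fixing a reference point means: Elo ratings only define differences, so a whole list can be shifted by one constant until the reference engine sits at the chosen value. The following is only a minimal sketch of that idea, not EloStat's actual procedure; the function name and the raw ratings are made up for illustration.

```python
def anchor_ratings(ratings, reference, target=2650):
    """Shift every rating by the same offset so `reference` lands on `target`.

    Rating differences between engines are unchanged by the shift.
    """
    offset = target - ratings[reference]
    return {name: rating + offset for name, rating in ratings.items()}


# Hypothetical raw ratings on an arbitrary scale (not taken from either list).
raw = {"Ruffian v1.0.1": 2500, "Engine A": 2450, "Engine B": 2575}
anchored = anchor_ratings(raw, "Ruffian v1.0.1")
```

Because both lists are pinned to the same engine at the same value, their ratings become directly comparable, provided the reference engine itself behaves similarly at both time controls.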


What I found is:

1) Distributions of draws, white and black performance and also length of games (see percentages indicated in the lines "Games :" below) are very similar. I interpret this as a sign that there is a correlation between the two and that Blitz results are not just random. If Blitz results were more random than results at long time controls I would expect a higher percentage of shorter games in Blitz and distributions and percentages that differ more.

2) Generally ratings from YABRL and Eigenmann+ match very well (for exact figures see the engine comparison table below). For the majority of engines the ratings differ by only 30 points or less. For comparison: with 300 games, error margins in a rating list are around +/- 30 points.

3) There are of course some engines where the differences of ratings are larger; the maximum observed is +/- 56 points. However, when looking at these engines we see some well known cases: it is known that SoS, Aristarch, Rebel and Comet for example are indeed engines that are stronger at long time controls. Likewise it is also known e.g. for Pepito, Fruit and Crafty 17.xx that they are stronger in Blitz. Also the fact that Patriot gets weaker at longer time controls can be observed in recent games in AEGT. Therefore I am pretty much convinced that these results are not just a statistical effect.

4) Which specific engine came out more as a Blitz expert or a long timer can be seen in the comparison list below (positive values mean stronger at long time controls).

5) Striking is the observation that MTD(f) based engines (two SoS versions, three AnMon versions and PostModernist) are over-represented amongst the long timers. Possibly this is a property of MTD(f).

6) Yace and Green Light Chess deserve specific mention: in both cases recent changes in the engines apparently work only at short time controls, essentially turning the former long timers into Blitz experts without an overall increase in playing strength.

7) Generally my conclusion is that a Blitz rating gives a surprisingly good estimate of an engine's strength, in particular when some more focused extra tests (e.g. matches against a set of balanced reference engines at different time controls) also determine whether an engine is a Blitz expert, balanced or a long time expert. In any case Blitz results are definitely not random results.
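
The error margin quoted in point 2 can be sanity-checked with a rough calculation. This is only a sketch under stated assumptions: opponents of equal average strength (expected score 50%), a draw rate of 25% as in the YABRL statistics, a 95% interval, and a first-order conversion from score to Elo. EloStat may well compute its margins differently.

```python
import math


def elo_margin(games, draw_ratio=0.25, score=0.5, z=1.96):
    """Approximate +/- Elo margin for a result over `games` games."""
    win_ratio = score - draw_ratio / 2
    # Variance of a single game result (win = 1, draw = 0.5, loss = 0).
    variance = win_ratio * 1.0 + draw_ratio * 0.25 - score ** 2
    std_error = math.sqrt(variance / games)
    # Derivative of the Elo difference -400*log10(1/s - 1) with respect to s.
    elo_per_score = 400 / (math.log(10) * score * (1 - score))
    return z * std_error * elo_per_score


margin_300 = elo_margin(300)
```

Under these assumptions 300 games give a margin of roughly 34 points, which is the same order as the +/- 30 mentioned above, and quadrupling the number of games halves the margin.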


Statistics of YABRL (5 min + 2 sec):

Code:
>= 20 moves:
Games        :  49195 (finished) (100%)

White Wins   :  20164 (41.0 %)
Black Wins   :  16656 (33.9 %)
Draws        :  12375 (25.2 %)
Unfinished   :      0

White Perf.  : 53.6 %
Black Perf.  : 46.4 %

>= 30 moves:
Games        :  48157 (finished) (97.9%)

White Wins   :  19651 (40.8 %)
Black Wins   :  16443 (34.1 %)
Draws        :  12063 (25.0 %)
Unfinished   :      0

White Perf.  : 53.3 %
Black Perf.  : 46.7 %

>= 40 moves:
Games        :  45070 (finished) (91.6%)

White Wins   :  17984 (39.9 %)
Black Wins   :  15479 (34.3 %)
Draws        :  11607 (25.8 %)
Unfinished   :      0

White Perf.  : 52.8 %
Black Perf.  : 47.2 %

>= 50 moves:
Games        :  39042 (finished) (79.4%)

White Wins   :  15008 (38.4 %)
Black Wins   :  13277 (34.0 %)
Draws        :  10757 (27.6 %)
Unfinished   :      0

White Perf.  : 52.2 %
Black Perf.  : 47.8 %

>= 60 moves:
Games        :  30532 (finished) (62.1%)

White Wins   :  11016 (36.1 %)
Black Wins   :   9971 (32.7 %)
Draws        :   9545 (31.3 %)
Unfinished   :      0

White Perf.  : 51.7 %
Black Perf.  : 48.3 %

>= 70 moves:
Games        :  21344 (finished) (43.4%)

White Wins   :   6940 (32.5 %)
Black Wins   :   6340 (29.7 %)
Draws        :   8064 (37.8 %)
Unfinished   :      0

White Perf.  : 51.4 %
Black Perf.  : 48.6 %

>= 80 moves:
Games        :  14151 (finished) (28.8%)

White Wins   :   3952 (27.9 %)
Black Wins   :   3589 (25.4 %)
Draws        :   6610 (46.7 %)
Unfinished   :      0

White Perf.  : 51.3 %
Black Perf.  : 48.7 %

>= 90 moves:
Games        :   9410 (finished) (19.1%)

White Wins   :   2092 (22.2 %)
Black Wins   :   1894 (20.1 %)
Draws        :   5424 (57.6 %)
Unfinished   :      0

White Perf.  : 51.1 %
Black Perf.  : 48.9 %

>= 100 moves:
Games        :   6501 (finished) (13.2%)

White Wins   :   1083 (16.7 %)
Black Wins   :   1004 (15.4 %)
Draws        :   4414 (67.9 %)
Unfinished   :      0

White Perf.  : 50.6 %
Black Perf.  : 49.4 %



Statistics of Eigenmann+ (>30 min):

Code:
>= 20 moves:
Games        : 117335 (finished) (100%)

White Wins   :  47181 (40.2 %)
Black Wins   :  37596 (32.0 %)
Draws        :  32558 (27.7 %)
Unfinished   :      0

White Perf.  : 54.1 %
Black Perf.  : 45.9 %

>= 30 moves:
Games        : 113895 (finished) (97.1%)

White Wins   :  45465 (39.9 %)
Black Wins   :  36628 (32.2 %)
Draws        :  31802 (27.9 %)
Unfinished   :      0

White Perf.  : 53.9 %
Black Perf.  : 46.1 %

>= 40 moves:
Games        : 104506 (finished) (89.1%)

White Wins   :  40650 (38.9 %)
Black Wins   :  33465 (32.0 %)
Draws        :  30391 (29.1 %)
Unfinished   :      0

White Perf.  : 53.4 %
Black Perf.  : 46.6 %

>= 50 moves:
Games        :  87758 (finished) (74.8%)

White Wins   :  32460 (37.0 %)
Black Wins   :  27436 (31.3 %)
Draws        :  27862 (31.7 %)
Unfinished   :      0

White Perf.  : 52.9 %
Black Perf.  : 47.1 %

>= 60 moves:
Games        :  66066 (finished) (56.3%)

White Wins   :  22541 (34.1 %)
Black Wins   :  19600 (29.7 %)
Draws        :  23925 (36.2 %)
Unfinished   :      0

White Perf.  : 52.2 %
Black Perf.  : 47.8 %

>= 70 moves:
Games        :  45520 (finished) (38.8%)

White Wins   :  13800 (30.3 %)
Black Wins   :  12271 (27.0 %)
Draws        :  19449 (42.7 %)
Unfinished   :      0

White Perf.  : 51.7 %
Black Perf.  : 48.3 %

>= 80 moves:
Games        :  30636 (finished) (26.1%)

White Wins   :   7897 (25.8 %)
Black Wins   :   7098 (23.2 %)
Draws        :  15641 (51.1 %)
Unfinished   :      0

White Perf.  : 51.3 %
Black Perf.  : 48.7 %

>= 90 moves:
Games        :  20846 (finished) (17.8%)

White Wins   :   4379 (21.0 %)
Black Wins   :   3949 (18.9 %)
Draws        :  12518 (60.0 %)
Unfinished   :      0

White Perf.  : 51.0 %
Black Perf.  : 49.0 %

>= 100 moves:
Games        :  14841 (finished) (12.6%)

White Wins   :   2498 (16.8 %)
Black Wins   :   2245 (15.1 %)
Draws        :  10098 (68.0 %)
Unfinished   :      0

White Perf.  : 50.9 %
Black Perf.  : 49.1 %
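
The "White Perf." lines in both listings are consistent with the usual scoring, where a win counts 1 and a draw 0.5. A small sketch that recomputes the >= 20 moves figures from the counts above (the function name is made up for illustration):

```python
def white_performance(white_wins, black_wins, draws):
    """White's score as a fraction: wins count 1, draws count 0.5."""
    games = white_wins + black_wins + draws
    return (white_wins + 0.5 * draws) / games


# Counts taken from the ">= 20 moves" blocks of the two listings.
yabrl = white_performance(20164, 16656, 12375)
eigenmann = white_performance(47181, 37596, 32558)
gap = eigenmann - yabrl
```

With these counts the two white scores come out within about half a percentage point of each other, which matches the 53.6 % and 54.1 % shown in the listings.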



Rating differences between YABRL and Eigenmann+:

Code:
Engine                        YABRL        Eigenmann+   Delta
Arasan v7.4                   2396         2452         56   
Comet B60                     2436         2491         55   
SoS 4                         2559         2610         51   
Aristarch v4.21               2582         2631         49   
SoS 3                         2564         2603         39   
Anmon v5.51                   2546         2584         38   
Leila v0.53h                  2425         2463         38   
Quark v2.35                   2501         2536         35   
PostModernist v1.007          2442         2477         35   
Aristarch v4.50               2614         2644         30   
Tao v5.6                      2523         2552         29   
Anmon v5.30                   2530         2558         28   
Dragon v4.4.3                 2466         2494         28   
Tcb v0045                     2412         2439         27   
Pharaon v2.62                 2511         2537         26   
Rebel v12.00.01               2619         2641         22   
Yace v0.99.56                 2544         2566         22   
Green Light Chess v3.00       2536         2554         18   
Anmon v5.22                   2485         2503         18   
Little Goliath 2000 v3.5      2539         2556         17   
SlowChess v2.89b              2488         2504         16   
DeepSjeng v1.6                2623         2638         15   
Tao v5.4                      2477         2492         15   
Ruffian v2.0.2                2674         2687         13   
Delfi v4.5                    2600         2613         13   
SmarThink v0.16b++            2560         2573         13   
LambChop v10.99               2497         2507         10   
Gromit v3.8.2                 2499         2508         9     
Gandalf v6.0WB                2692         2700         8     
Crafty v18.15DC               2558         2566         8     
Francesca M.0.0.9             2453         2459         6     
Ruffian v2.0.0                2675         2680         5     
SmarThink v0.17a              2603         2608         5     
El Chinito v3.25              2568         2572         4     
Gothmog v1.0 beta 10          2561         2563         2     
Yace Paderborn                2555         2557         2     
Ktulu v4.2                    2587         2588         1     
Ruffian v2.1.0                2679         2679         0     
Ruffian v1.0.1                2650         2650         0     
Amy v0.8.3                    2475         2472         -3   
Exchess v4.03                 2336         2333         -3   
SoS v11-99                    2479         2468         -11   
Resp v0.19                    2402         2391         -11   
Yace v0.99.87                 2579         2565         -14   
Little Goliath 2000 v3.9      2560         2546         -14   
KnightDreamer v3.2            2475         2461         -14   
List v5.12                    2668         2652         -16   
Pepito v1.59 profile          2551         2530         -21   
Thinker v4.6b                 2612         2586         -26   
Wildcat v4.0                  2565         2538         -27   
Amyan v1.59                   2512         2485         -27   
Fruit v2.0                    2668         2640         -28   
Crafty v17.14DC               2586         2557         -29   
Chess Tiger 2004 normal       2716         2685         -31   
Fruit v1.5                    2551         2519         -32   
Green Light Chess v3.0.3.4    2544         2512         -32   
Chess Tiger 15.0 normal       2723         2687         -36   
Patriot v1.2.3                2595         2539         -56
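
The Delta column is simply the Eigenmann+ rating minus the YABRL rating. The sketch below recomputes it for a few rows and sorts engines into the three groups discussed above. The cut-off of 30 points is an assumption borrowed from the error margin mentioned in point 2; no explicit threshold is given in the post.

```python
def classify(yabrl, eigenmann_plus, threshold=30):
    """Return (delta, label); a positive delta means stronger at long time controls."""
    delta = eigenmann_plus - yabrl
    if delta > threshold:
        label = "long timer"
    elif delta < -threshold:
        label = "Blitz expert"
    else:
        label = "balanced"
    return delta, label


# A few rows copied from the table above: (YABRL, Eigenmann+).
rows = {
    "Arasan v7.4": (2396, 2452),
    "Ruffian v1.0.1": (2650, 2650),
    "Patriot v1.2.3": (2595, 2539),
}
results = {name: classify(*pair) for name, pair in rows.items()}
```

Engines whose delta is close to the cut-off would need more games before being assigned to one group with any confidence.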



Robert
Robert Allgeuer
 
Posts: 124
Joined: 28 Sep 2004, 19:09
Location: Konz / Germany

Re: How do Blitz ratings compare to longer time controls?

Postby Tim Foden » 24 Feb 2005, 15:13

Robert Allgeuer wrote: 6) Yace and Green Light Chess deserve specific mention: in both cases recent changes in the engines apparently work only at short time controls, essentially turning the former long timers into Blitz experts without an overall increase in playing strength.

Code:
Green Light Chess v3.00       2536         2554         18   
Green Light Chess v3.0.3.4    2544         2512         -32   


Robert


Hi Robert,

I find this analysis to be interesting. It's a shame there isn't data for 3.01.2.2 though. I'm not really that surprised about 3.00.3.4, as it was changes made whilst at Graz to a version which I already knew to be weaker than 3.00! (I took the wrong source by mistake) :)

Even so, some of them (the changes) must have been OK.

Cheers, Tim.
Tim Foden
 

Re: How do Blitz ratings compare to longer time controls?

Postby Robert Allgeuer » 24 Feb 2005, 18:30

Hi Tim,
it would be interesting to know what kind of changes were made between 3.00 and 3.0.3.4, for the "academic" value, because it seems that these changes were primarily ones that were OK at short time controls but did not work that well at longer time controls.

Robert
Robert Allgeuer
 

Re: How do Blitz ratings compare to longer time controls?

Postby Tim Foden » 25 Feb 2005, 09:17

Robert Allgeuer wrote:Hi Tim,
it would be interesting to know what kind of changes were made between 3.00 and 3.0.3.4, for the "academic" value, because it seems that these changes were primarily ones that were OK at short time controls but did not work that well at longer time controls.

Robert


Hi Robert,

At the moment I'm afraid I can't tell, as the machine that hosts my CVS repository is not working, and even if it were working, I don't have any space to put it in! :) Maybe I'll just put the drive from it into my machine, as I am beginning to be desperate to be able to check in and compare changes now.

Cheers, Tim.
Tim Foden
 

Re: How do Blitz ratings compare to longer time controls?

Postby Robert Allgeuer » 25 Feb 2005, 12:12

I guess losing access to the CVS archive is an issue .... :(
Robert Allgeuer
 

