A Statistical study of chess results

Re: A Statistical study of chess results

Postby Norm Pollock » 19 Feb 2005, 01:54

Robert Allgeuer wrote: The Eigenmann database is of course just a collection of whatever games, so none of your conditions will hold.

YABRL, however, is completely different and very controlled. White and black average Elo are guaranteed to be identical (because each engine plays each side equally often against identical opponents), there are no duplicates, etc. The maximum Elo difference between opponents is 400, though. A tool that can filter out those games where the Elo difference is > 200 is probably difficult to find, I reckon, but it would be interesting.

Despite the differences in nature of the two databases, the figures for Eigenmann's database and YABRL look very similar, which for me is already interesting to see.

IIRC, with the human games database White's performance did not drop towards 50%, even though it contains Elo differences > 200, as the two computer databases do; both of those, however, do drop towards 50%. So at a minimum, human and computer games behave differently in this respect.

Robert


Filtering games by a fixed Elo difference is too difficult. However, you can set an Elo range 200 points wide and filter out every game in which a player falls outside that range. That's how I got my sample.

How reliable are computer elos? Human elos seem to be reliable.

A 400 elo difference is too much. A 2700 elo player will probably have a 90% score against a 2300 player. Even 200 is too large. I would like to make the max difference about 100 elo. The key thing is to ensure equality of skill so that the only important difference between the players is the color of the pieces.
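The 90% figure can be checked against the usual logistic Elo expectation curve. The sketch below assumes that curve (400-point scale); the thread does not say which rating formula is meant, and the curve ignores the color of the pieces.

```python
def expected_score(elo_diff):
    """Expected score of the higher-rated player (logistic Elo curve, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Expected score for the rating gaps discussed in the post.
for diff in (100, 200, 400):
    print(diff, round(expected_score(diff), 3))
```

Under that assumption a 400-point gap gives roughly 91%, a 200-point gap roughly 76%, and a 100-point gap roughly 64%, so even a 100-point difference is far larger than the effect of color being studied.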

Another problem is that Elo is a fluid stat. Elos change constantly, and a 2530 Elo player one day could be a 2620 Elo player some other day.

Another issue with comparing a sample of computer games to human games is the quantity of players. My 2nd sample of 30000 games had over 900 players. It is hard to tell exactly because players get their names spelled differently so many times that 3 players could actually be the same. I also think the max number of games played by any one player was around 300 or 1%.

In your samples, how many different chess engines are there? What was the maximum number of games played by any one engine?

Postby Robert Allgeuer » 19 Feb 2005, 09:58

I think your "fluid" argument is important, and this distinguishes human from computer ratings. At least in these two computer game databases, where ratings are per version of a program, ratings are constant. 2650-rated Ruffian 1.0.1 does not change its strength over time (learning is off in YABRL).

Yes, I could filter out the programs that are between 2450 and 2650. I guess this filtering needs to be based on the names of the engines, so it is a bit laborious ...

A difference in playing strength should only affect the percentage of draws; at least in YABRL the average strength of white and black is identical under all circumstances.

Up to 1500 games for a specific program in YABRL, up to 8000 or so I think in Eigenmann's database. In the YABRL database there are no spelling errors; in Eigenmann's database there possibly are, but looking at it I think very few at most.

In Eigenmann's database there are about 1000 different chess engines, in the YABRL database there are 85. So the two databases are very different in nature, yet the results and statistics look surprisingly similar.

Robert

Postby Norm Pollock » 19 Feb 2005, 16:00

Robert Allgeuer wrote: I think your "fluid" argument is important, and this distinguishes human from computer ratings. At least in these two computer game databases, where ratings are per version of a program, ratings are constant. 2650-rated Ruffian 1.0.1 does not change its strength over time (learning is off in YABRL).

Yes, I could filter out the programs that are between 2450 and 2650. I guess this filtering needs to be based on the names of the engines, so it is a bit laborious ...

A difference in playing strength should only affect the percentage of draws; at least in YABRL the average strength of white and black is identical under all circumstances.

Up to 1500 games for a specific program in YABRL, up to 8000 or so I think in Eigenmann's database. In the YABRL database there are no spelling errors; in Eigenmann's database there possibly are, but looking at it I think very few at most.

In Eigenmann's database there are about 1000 different chess engines, in the YABRL database there are 85. So the two databases are very different in nature, yet the results and statistics look surprisingly similar.

Robert


I would like to take a look at these databases. Can you give me the links?

Having 1500 or 8000 games for a specific program sounds like too many given the size of the database. 1000 different engines is fine, but 85 is too few. These databases are not as random as the databases I used.

How many computer games were decided because of TIME? Did all engines have endgame tablebases?

There are of course major differences between computer play and human play. I'm sure the gap will diminish with time. But from what I observe, computers do not recognize drawn situations quickly, because one of the engines thinks it is winning, and the game gets lengthier. So a 50-move human game could end up as a 150-move computer game.

And the biggest difference is that humans at the master level will resign with the loss of a rook or less, while the computer game will keep going until there is a greater differential. Some computers do not have an automatic resign feature so their games are very long. And computers do not get tired. So the average length of computer games is bound to be much greater than the average length of human games.

Because of all these inherent differences, it would not be surprising if the pattern of white/black ratio versus game length differs between humans and computers.

Postby Robert Allgeuer » 19 Feb 2005, 16:52

Eigenmann's database can be downloaded from http://www.beepworld.de/members38/eigen ... ammier.htm ; the Blitz games are currently not online, but will possibly be on Dann's ftp site soon.

Robert

Postby Norm Pollock » 19 Feb 2005, 20:14

Thanks to Dann Corbit who wrote the utility for averaging white elo and black elo from the tags in a pgn file. If anyone wants this utility, information is available in the "Programming" thread of this forum.
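For readers without that utility, the averaging step itself is simple. The following is a rough Python sketch of the idea, not Dann Corbit's program: it assumes every game carries numeric `WhiteElo` and `BlackElo` tags (as stated for this database), and the name `average_elos` is made up for the example.

```python
import re

def average_elos(pgn_text):
    """Return (average WhiteElo, average BlackElo) over all numeric tags found."""
    white = [int(v) for v in re.findall(r'\[WhiteElo\s+"(\d+)"\]', pgn_text)]
    black = [int(v) for v in re.findall(r'\[BlackElo\s+"(\d+)"\]', pgn_text)]
    if not white or not black:
        raise ValueError("no numeric Elo tags found")
    return sum(white) / len(white), sum(black) / len(black)
```

If some games lacked one of the two tags, the two averages would be taken over different sets of games, so such games should be filtered out first.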

I decided to redo the database due to some minor irregularities, and while I was at it, I included games from the year "1999". The games are human-human games from Jan 01, 1999 to Dec 26, 2004. There are 70,069 games in the database. The elo "width" is from 2400-2871. If people think it is necessary, I will construct a subset of this database using a smaller elo width. By the way, all games in the database have a result and all have both white and black elo tags. The database was heavily filtered to remove short time control games, Internet games and computer engine games.

Here is my CONCLUSION (the data follows):

I observed that White's Winning Percentage in Non-Drawn Games is independent of the length of the game. The percentages observed are: 60.4% (for games with plies >=41), 59.7% (for games with plies >=61), 59.8% (for games with plies >= 81), 58.5% (for games with plies >= 101), 59.0% (for games with plies >= 121) and 59.0% (for games with plies >= 141).

Plies >= 41
Games: 70,069
Average White Elo: 2526.1
Average Black Elo: 2522.3
Difference in Elo: 3.8
White Wins: 24,861 (35.5%)
Draws: 28,897 (41.2%)
Black Wins: 16,311 (23.3%)
White Score: 56.1%
Black Score: 43.9%
Non-Drawn Games: 41,172
White Win percent in Non-Drawn Games: 60.4%

Plies >= 61
Games: 55,170
Average White Elo: 2525.5
Average Black Elo: 2521.4
Difference in Elo: 4.1
White Wins: 21,018 (38.1%)
Draws: 19,983 (36.2%)
Black Wins: 14,162 (25.7%)
White Score: 56.2%
Black Score: 43.8%
Non-Drawn Games: 35,187
White Win percent in Non-Drawn Games: 59.7%

Plies >= 81
Games: 35,482
Average White Elo: 2525.6
Average Black Elo: 2522.2
Difference in Elo: 3.4
White Wins: 13,160 (37.1%)
Draws: 13,478 (38.0%)
Black Wins: 8,844 (24.9%)
White Score: 56.1%
Black Score: 43.9%
Non-Drawn Games: 22,004
White Win percent in Non-Drawn Games: 59.8%

Plies >= 101
Games: 18,910
Average White Elo: 2525.1
Average Black Elo: 2522.9
Difference in Elo: 2.2
White Wins: 6,776 (35.8%)
Draws: 7,322 (38.7%)
Black Wins: 4,812 (25.4%)
White Score: 55.2%
Black Score: 44.8%
Non-Drawn Games: 11,588
White Win percent in Non-Drawn Games: 58.5%

Plies >= 121
Games: 9,037
Average White Elo: 2526.4
Average Black Elo: 2524.0
Difference in Elo: 2.4
White Wins: 3,117 (34.5%)
Draws: 3,752 (41.5%)
Black Wins: 2,168 (24.0%)
White Score: 55.3%
Black Score: 44.7%
Non-Drawn Games: 5,285
White Win percent in Non-Drawn Games: 59.0%

Plies >= 141
Games: 3,918
Average White Elo: 2526.0
Average Black Elo: 2523.1
Difference in Elo: 2.9
White Wins: 1,279 (32.6%)
Draws: 1,750 (44.7%)
Black Wins: 889 (22.7%)
White Score: 55.0%
Black Score: 45.0%
Non-Drawn Games: 2,168
White Win percent in Non-Drawn Games: 59.0%
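
The derived lines in each block follow directly from the three raw counts. As a minimal sketch (the function name `summarize` is made up for this example), using the counts from the "Plies >= 41" block:

```python
def summarize(white_wins, draws, black_wins):
    """Derive the summary figures of one block from its win/draw/loss counts."""
    games = white_wins + draws + black_wins
    non_drawn = white_wins + black_wins
    return {
        "games": games,
        # A draw counts as half a point for each side.
        "white_score": (white_wins + 0.5 * draws) / games,
        "draw_rate": draws / games,
        "non_drawn": non_drawn,
        # Share of decisive games that White won.
        "white_win_share": white_wins / non_drawn,
    }

stats = summarize(24861, 28897, 16311)
for key, value in stats.items():
    print(key, value if isinstance(value, int) else f"{100 * value:.1f}%")
```

This reproduces the 56.1% White score and the 60.4% White win percentage in non-drawn games quoted for that block.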

[last edited Feb 19, 2005 4:20pm ET]

Postby Norm Pollock » 26 Feb 2005, 02:57

I now have the tools I needed to redo this study and do it with a little more rigor.

My database consists of long time-control human-human games, all with Elo ratings for both players, both players 2300+, dates of games from Jan 01, 1991 to Dec 26, 2004.

The games were carefully filtered, which is why there are only 63,687 of them. The Elo distance in any game is at most 50 points. The average Elo distance for each sample is in the data, but it is roughly 25 points. The average Elos for white and for black are practically equal.

With this database, the results are slightly different. There is a small trend towards equalization as the games get longer. However white holds a commanding advantage over black, whether it is in games of 41+ plies, or 161+ plies. Over that range, white's winning percentage in non-drawn games declines slightly from 59.7% to 56.7%. The prior database did not detect this trend.

Computer games will show different results than human-human games. Computers do not play for draws, which humans do sometimes especially if they are playing the black pieces. Computers do not get tired and are not intimidated by very long drawn out equal positions. They still go for the win. Anyway, that is for a different study someday when I can find a fair database.

I wish I could paste my excel spreadsheet, but the formatting gets messed up here. So I will state the results methodically:

Plies >= 41
Games: 63687 (100.00%)
Average White Elo: 2478.42
Average Black Elo: 2477.96
Average Elo Distance: 25.38
White Wins: 22228 (34.9%)
Draws: 26441 (41.5%)
Black Wins: 15018 (23.6%)
White Score: 55.7%
Black Score: 44.3%
Non-Drawn Games: 37246
White Win percent in Non-Drawn Games: 59.7%

Plies >= 61
Games: 50383 (79.11%)
Average White Elo: 2476.91
Average Black Elo: 2476.49
Average Elo Distance: 25.35
White Wins: 18888 (37.5%)
Draws: 18441 (36.6%)
Black Wins: 13054 (25.9%)
White Score: 55.8%
Black Score: 44.2%
Non-Drawn Games: 31942
White Win percent in Non-Drawn Games: 59.1%

Plies >= 81
Games: 32476 (50.99%)
Average White Elo: 2477.66
Average Black Elo: 2477.37
Average Elo Distance: 25.41
White Wins: 11728 (36.1%)
Draws: 12522 (38.6%)
Black Wins: 8226 (25.3%)
White Score: 55.4%
Black Score: 44.6%
Non-Drawn Games: 19954
White Win percent in Non-Drawn Games: 58.8%

Plies >= 101
Games: 17198 (27.00%)
Average White Elo: 2477.51
Average Black Elo: 2477.32
Average Elo Distance: 25.41
White Wins: 6063 (35.3%)
Draws: 6735 (39.2%)
Black Wins: 4400 (25.6%)
White Score: 54.8%
Black Score: 45.2%
Non-Drawn Games: 10463
White Win percent in Non-Drawn Games: 57.9%

Plies >= 121
Games: 8158 (12.81%)
Average White Elo: 2479.62
Average Black Elo: 2479.48
Average Elo Distance: 25.38
White Wins: 2774 (34.0%)
Draws: 3366 (41.3%)
Black Wins: 2018 (24.7%)
White Score: 54.6%
Black Score: 45.4%
Non-Drawn Games: 4792
White Win percent in Non-Drawn Games: 57.9%

Plies >= 141
Games: 3484 (5.47%)
Average White Elo: 2477.20
Average Black Elo: 2476.76
Average Elo Distance: 25.20
White Wins: 1121 (32.2%)
Draws: 1522 (43.7%)
Black Wins: 841 (24.1%)
White Score: 54.0%
Black Score: 46.0%
Non-Drawn Games: 1962
White Win percent in Non-Drawn Games: 57.1%

Plies >= 161
Games: 1469 (2.31%)
Average White Elo: 2483.07
Average Black Elo: 2482.44
Average Elo Distance: 25.19
White Wins: 440 (30.0%)
Draws: 693 (47.2%)
Black Wins: 336 (22.9%)
White Score: 53.5%
Black Score: 46.5%
Non-Drawn Games: 776
White Win percent in Non-Drawn Games: 56.7%
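
Because each "Plies >= N" sample contains all the longer ones, the blocks above overlap. One way to look at the trend is to subtract neighbouring blocks to get non-overlapping ply ranges and attach a rough error bar to each. The sketch below does that with the win counts copied from the blocks above; the binomial standard error assumes the games are independent, which is an assumption and not something the data guarantees, and the function names are made up for the example.

```python
import math

# (minimum plies, white wins, black wins), copied from the cumulative blocks above.
CUMULATIVE = [
    (41, 22228, 15018),
    (61, 18888, 13054),
    (81, 11728, 8226),
    (101, 6063, 4400),
    (121, 2774, 2018),
    (141, 1121, 841),
    (161, 440, 336),
]

def disjoint_bins(cumulative):
    """Turn 'plies >= N' counts into counts for non-overlapping ply ranges."""
    bins = []
    for i, (low, white, black) in enumerate(cumulative):
        if i + 1 < len(cumulative):
            next_low, next_white, next_black = cumulative[i + 1]
            bins.append((low, next_low - 1, white - next_white, black - next_black))
        else:
            bins.append((low, None, white, black))  # open-ended last range
    return bins

def share_with_se(white, black):
    """White's share of decisive games and its binomial standard error."""
    n = white + black
    p = white / n
    return p, math.sqrt(p * (1.0 - p) / n)

for low, high, white, black in disjoint_bins(CUMULATIVE):
    p, se = share_with_se(white, black)
    label = f"{low}-{high}" if high is not None else f"{low}+"
    print(f"{label:>8}: {100 * p:.1f}% +/- {100 * 1.96 * se:.1f}% (n={white + black})")
```

The "+/-" column is an approximate 95% interval. With only 776 decisive games in the 161+ range, that interval is several percentage points wide, so the size of the decline should be read with that uncertainty in mind.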