Error margin and number of games



Postby Volker Pittlik » 03 Jul 2007, 18:25

Over the last few weeks I played some bullet games and noticed how the error margin decreased with the number of games played.

The calculations were done with "BayesElo" by Rémi Coulom. In general I got this:

[Image: error margin versus number of games played]

Although I tend to agree with "the more, the merrier", it seems to me that there is a good compromise somewhere. Of course, this depends on what is being tested.

Regards

Volker
Volker Pittlik
Posts: 1031
Joined: 24 Sep 2004, 10:14
Location: Murten / Morat, Switzerland

Re: Error margin and number of games

Postby Pradu » 03 Jul 2007, 20:34

Volker Pittlik wrote:In the last weeks I played some bullet games and noticed how the error margin was decreasing with number of games played...

Looks like an exponential decay. Can you give the data points so that we can do an approximate interpolation? Perhaps a good estimate for the number of games that need to be played would be 4/(-exponent_coefficient). After four time constants the remaining excess error is e^-4, about 1.8% of its initial value, so this should put the error within about 2% of the final steady-state error (hopefully 0).
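For an exponential decay, "4/(-exponent_coefficient)" is four time constants. A minimal sketch, assuming the error follows err(n) = E_inf + (E_0 - E_inf) * exp(k * n) with a fitted k < 0; the function names and the example value of k are hypothetical and not taken from the thread:

```python
import math


def games_to_settle(k: float, time_constants: float = 4.0) -> float:
    """Games needed for err(n) = E_inf + (E_0 - E_inf) * exp(k * n), with k < 0,
    to run through `time_constants` time constants of 1/(-k) games each."""
    if k >= 0:
        raise ValueError("k must be negative for a decaying error")
    return time_constants / (-k)


def remaining_fraction(time_constants: float = 4.0) -> float:
    """Fraction of the initial excess error (E_0 - E_inf) still left after that many time constants."""
    return math.exp(-time_constants)


# Hypothetical fitted coefficient, for illustration only.
k = -0.01
print(games_to_settle(k))    # four time constants of 100 games each
print(remaining_fraction())  # e**-4, a little under 2%
```

This rule of thumb only makes sense if the exponential model actually fits the data.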

Also, the number of games needed might vary with the engine's rating and the average opponent rating.
Pradu
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: Error margin and number of games

Postby Volker Pittlik » 03 Jul 2007, 20:57

Pradu wrote:... Can you give the data points so that we can do an approximate interpolation?...


These?

Games played    Error range (Elo)
 22                 234
 44                 174
 66                 142
 88                 122
110                 109
132                 100
154                  93
176                  87
198                  82
220                  78
242                  74
264                  71
286                  68
308                  65
330                  63
352                  61
374                  59
396                  58
418                  56
440                  55
462                  53
484                  52
506                  51
528                  50
550                  49
572                  48
594                  47
616                  47
638                  45
660                  45
682                  44
704                  43
726                  43
748                  42
770                  41
792                  41
814                  41
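One way to fit a curve to these data points is ordinary least squares on log(error) versus log(games), which turns a power law error = A / games^b into a straight line. A minimal sketch using only the Python standard library; the thread does not say how the fit was actually done, so the constants this produces may differ slightly from any quoted ones:

```python
import math

# (games played, error range in Elo) copied from the table above
DATA = [
    (22, 234), (44, 174), (66, 142), (88, 122), (110, 109), (132, 100),
    (154, 93), (176, 87), (198, 82), (220, 78), (242, 74), (264, 71),
    (286, 68), (308, 65), (330, 63), (352, 61), (374, 59), (396, 58),
    (418, 56), (440, 55), (462, 53), (484, 52), (506, 51), (528, 50),
    (550, 49), (572, 48), (594, 47), (616, 47), (638, 45), (660, 45),
    (682, 44), (704, 43), (726, 43), (748, 42), (770, 41), (792, 41),
    (814, 41),
]


def fit_power_law(points):
    """Least-squares fit of log(err) = log(A) - b * log(n); returns (A, b) for err ~ A / n**b."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(err) for _, err in points]
    m = len(points)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


A, b = fit_power_law(DATA)
print(f"error ~ {A:.1f} / games^{b:.4f}")
```

Fitting in log space weights every point's relative error equally; a nonlinear fit on the raw values would give the large early errors more influence.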


BTW: The engines were:

Fruit (Toga) 1.2.1a
Glaurung 2-epsilon/4 perf
Spike 1.2 Turin
Ruffian 2.1.0
Scorpio 1.91
Shredder Classic 1.3
Jonny 2.83
Yace Paderborn
Crafty-21.5
Zappa 1.1
Arasan 9.5
Hermann 2.0

which finished in the above order.

Volker

Re: Error margin and number of games

Postby Pradu » 03 Jul 2007, 21:15

It seems to be best fitted by a power law:

ELOerror = 1119.5/games^0.4956
ELOerror_approx = 1100/sqrt(#games)

So for different rating ranges it'd be
A/sqrt(#games)

So to get the error down to 2% of your one-game error A for your rating range, you'd need A/sqrt(N) = 0.02*A, i.e. sqrt(N) = 50, or 2500 games.
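The same A/sqrt(N) relation can be inverted to get the number of games needed for a target. A small sketch; the helper names and the 20 Elo target are hypothetical, and A = 1100 is the approximate constant from the fit above:

```python
import math


def games_for_fraction(fraction: float) -> int:
    """Smallest N with A/sqrt(N) <= fraction * A, i.e. N >= (1/fraction)**2."""
    return math.ceil((1.0 / fraction) ** 2)


def games_for_margin(a: float, target: float) -> int:
    """Smallest N with a/sqrt(N) <= target (target in the same units as a, e.g. Elo)."""
    return math.ceil((a / target) ** 2)


print(games_for_fraction(0.02))    # 2% of the one-game error
print(games_for_margin(1100, 20))  # hypothetical 20 Elo target with A = 1100
```

Because the error only falls with the square root of N, halving the margin costs four times as many games.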

Re: Error margin and number of games

Postby Volker Pittlik » 03 Jul 2007, 21:36

Pradu wrote:...you'd need 2500 games.


Maybe it's time to change to "Game in one second" matches...

Volker

Re: Error margin and number of games

Postby H.G.Muller » 05 Jul 2007, 12:39

You don't really need BayesElo for that; the expression for the standard error is quite simple and follows directly from statistical theory. For a 30% draw rate, the standard error in the score percentage between approximately equal opponents over N games is

42%/sqrt(N)

(This is not very sensitive to the draw rate; for 0% draws it would be 50%/sqrt(N).)

Near a 50% score, each percentage point of score difference corresponds to about 7 Elo, so the Elo error is 7 times larger:

294/sqrt(N).

This is the standard error in the rating of a single engine, as a function of the number of games played by that engine, not by the whole tournament.
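Both steps can be checked numerically. A sketch under the same assumptions as the text (equal opponents, so wins and losses are equally likely) plus the standard logistic Elo curve; the function names are made up for illustration. Without rounding to 42% and 7 Elo the constant comes out near 291 rather than 294. This is a one-standard-error figure; a 95% confidence interval extends about 1.96 standard errors on each side.

```python
import math


def score_se(draw_rate: float, n_games: int) -> float:
    """Standard error of the mean score (0..1) over n games between equal opponents.

    Each game scores 1, 0.5 or 0 with P(win) = P(loss) = (1 - draw_rate) / 2,
    so the mean is 0.5 and the per-game variance is (1 - draw_rate) / 4.
    """
    per_game_sd = math.sqrt((1.0 - draw_rate) / 4.0)
    return per_game_sd / math.sqrt(n_games)


def elo_per_score(p: float) -> float:
    """Slope dElo/dscore of the logistic curve Elo = 400 * log10(p / (1 - p)) at score p."""
    return 400.0 / (math.log(10.0) * p * (1.0 - p))


def elo_se(draw_rate: float, n_games: int) -> float:
    """Standard error in Elo, linearized around a 50% score."""
    return score_se(draw_rate, n_games) * elo_per_score(0.5)


print(score_se(0.3, 1))           # about 0.42 per game
print(elo_per_score(0.5) / 100)   # about 7 Elo per percentage point
print(elo_se(0.3, 1))             # about 291 Elo for a single game
```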

In another thread I posted the factor by which you have to discount games between engines with a sizable rating difference when counting the total number of games. In general, for the same number of games, error margins in tournaments with widely differing participants will be larger than in those where all participants are similar, because many games then give you almost no information and, after proper weighting, hardly contribute to the effective number of games played.
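The weighting factor from that other thread is not reproduced here. As a hedged illustration only, one common way to weight games is by their Fisher information under the logistic Elo model, assuming no draws: a game with expected score p carries information proportional to p(1 - p), so relative to a game between equals it counts as 4p(1 - p) games. The function names are hypothetical:

```python
import math


def expected_score(elo_diff: float) -> float:
    """Logistic Elo expectation for a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def game_weight(elo_diff: float) -> float:
    """Information in one game relative to a game between equal opponents.

    Assumes no draws: information is proportional to p * (1 - p), which peaks at 0.25 for p = 0.5.
    """
    p = expected_score(elo_diff)
    return p * (1.0 - p) / 0.25


def effective_games(elo_diffs) -> float:
    """Roughly the number of equal-opponent games carrying the same information."""
    return sum(game_weight(d) for d in elo_diffs)


print(game_weight(200))                # a 200 Elo mismatch counts as about 0.73 games
print(game_weight(400))                # a 400 Elo mismatch counts as about 0.33 games
print(effective_games([0, 0, 400]))
```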
H.G.Muller
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Error margin and number of games

Postby Volker Pittlik » 05 Jul 2007, 16:51

H.G.Muller wrote:You dont really need BayesElo for that; the expression for the standard error is quite simple and follows directly from statistical theory...


Good to know that my results were aware of the theory.

H.G.Muller wrote:...[more theory snipped]

... In general, error margins in tournaments with wildly different participants will be larger than those where all participants are similar, for the same number of games. Because many games then really do not give you any information at all...


Well, I didn't want to disprove that. I chose engines which

a) run flawlessly at bullet time controls and
b) score at least 20% in the test group, so that even the last-placed engine takes a half or whole point from no. 1 from time to time.
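Under the logistic Elo model, an expected score p corresponds to a rating difference of 400 * log10(p / (1 - p)). A rough sketch of what the 20% cut-off means, treating the whole test group as a single average opponent (only an approximation) and reusing the no-draw 4p(1 - p) information weight as an assumption; the function names are hypothetical:

```python
import math


def score_to_elo(p: float) -> float:
    """Rating difference implied by an expected score p (0 < p < 1) on the logistic Elo curve."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def relative_information(p: float) -> float:
    """4p(1 - p): information of a game at expected score p relative to a 50% game (no draws assumed)."""
    return 4.0 * p * (1.0 - p)


print(round(score_to_elo(0.2)))   # -241: roughly 241 Elo below the group average
print(relative_information(0.2))  # such games still carry about two thirds of the information
```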

I'm going to use these results as a baseline for comparison in other tests. First results for different Polyglot and Glaurung books can be expected tomorrow.

Regards

Volker

Re: Error margin and number of games

Postby Pradu » 05 Jul 2007, 17:14

H.G.Muller wrote:You don't really need BayesElo for that
Only for people who don't know statistical theory :mrgreen:

H.G.Muller wrote:...For 30% draw percentage the error in the score percentage between approximately equal opponents over N games is

42%/sqrt(N)
Great to see it's the same functional form that the experimental data gives.
