Winboard Forum

by **Volker Pittlik** » 03 Jul 2007, 18:25

Over the last few weeks I played some bullet games and noticed how the error margin decreased with the number of games played.

The calculations were done with "BayesElo" by Rémi Coulom. In general I got this:

Although I tend to agree with "the more, the merrier", it seems to me that there is a good compromise somewhere. Of course this depends on what is being tested.

Regards

Volker

by **Pradu** » 03 Jul 2007, 20:34

Volker Pittlik wrote:Over the last few weeks I played some bullet games and noticed how the error margin decreased with the number of games played.

The calculations were done with "BayesElo" by Rémi Coulom. In general I got this:

Although I tend to agree with "the more, the merrier", it seems to me that there is a good compromise somewhere. Of course this depends on what is being tested.

Regards

Volker

Looks like an exponential decay. Can you give the datapoints so that we can do an approximate interpolation? Perhaps a good estimate for the number of games needed to be played would be 4/(-exponent_coefficient). This should put the error within 2% of the final steady state error (hopefully 0).
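As a quick check of the 4/(-exponent_coefficient) rule of thumb: if the error really decayed as exp(k * games) with k < 0, then after 4/(-k) games it would have fallen to exp(-4), which is a bit under 2% of its starting value. A minimal Python sketch follows; the function name and the sample coefficient are made up for illustration and are not taken from the data.

```python
import math

def settling_games(k):
    """Games needed for exp(k * games) to decay to exp(-4), assuming k < 0."""
    if k >= 0:
        raise ValueError("decay coefficient must be negative")
    return 4.0 / (-k)

# Hypothetical coefficient, chosen only to illustrate the rule.
k = -0.01
n = settling_games(k)
remaining = math.exp(k * n)   # fraction of the initial error still left
print(n, remaining)
```

This only holds if the decay is truly exponential, which is itself a guess at this point in the thread.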

Also, the number of games needed might vary with engine rating and average opponent rating.

by **Volker Pittlik** » 03 Jul 2007, 20:57

Pradu wrote:... Can you give the datapoints so that we can do an approximate interpolation?...

These?:

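| Games played | Error range |
|---|---|
| 22 | 234 |
| 44 | 174 |
| 66 | 142 |
| 88 | 122 |
| 110 | 109 |
| 132 | 100 |
| 154 | 93 |
| 176 | 87 |
| 198 | 82 |
| 220 | 78 |
| 242 | 74 |
| 264 | 71 |
| 286 | 68 |
| 308 | 65 |
| 330 | 63 |
| 352 | 61 |
| 374 | 59 |
| 396 | 58 |
| 418 | 56 |
| 440 | 55 |
| 462 | 53 |
| 484 | 52 |
| 506 | 51 |
| 528 | 50 |
| 550 | 49 |
| 572 | 48 |
| 594 | 47 |
| 616 | 47 |
| 638 | 45 |
| 660 | 45 |
| 682 | 44 |
| 704 | 43 |
| 726 | 43 |
| 748 | 42 |
| 770 | 41 |
| 792 | 41 |
| 814 | 41 |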

BTW: The engines were:

Fruit (Toga) 1.2.1a
Glaurung 2-epsilon/4 perf
Spike 1.2 Turin
Ruffian 2.1.0
Scorpio 1.91
Shredder Classic 1.3
Jonny 2.83
Yace Paderborn
Crafty-21.5
Zappa 1.1
Arasan 9.5
Hermann 2.0

which finished in the above order.

Volker

by **Pradu** » 03 Jul 2007, 21:15

It seems to follow a power law best:

ELOerror = 1119.5/games^0.4956
ELOerror_approx = 1100/sqrt(#games)

So for different rating ranges it'd be
A/sqrt(#games)

Here A is the error after a single game for your rating range. To bring the error down to 2% of A you'd need 2500 games, since 1/sqrt(2500) = 0.02.
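For anyone who wants to redo the fit, here is a rough Python sketch using the data points posted above. It assumes an ordinary least-squares line through the log-log data, which may not be how the coefficients quoted above were obtained, so the numbers it prints can differ slightly from 1119.5 and 0.4956. The function names are illustrative only.

```python
import math

# Data points posted earlier in the thread: (games played, error range).
games = list(range(22, 815, 22))
errors = [234, 174, 142, 122, 109, 100, 93, 87, 82, 78, 74, 71, 68, 65, 63,
          61, 59, 58, 56, 55, 53, 52, 51, 50, 49, 48, 47, 47, 45, 45, 44,
          43, 43, 42, 41, 41, 41]

def fit_power_law(xs, ys):
    """Least-squares line through (ln x, ln y); returns (A, b) for y = A / x**b."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
             / sum((u - mx) ** 2 for u in lx))
    intercept = my - slope * mx
    return math.exp(intercept), -slope

def games_for_fraction(fraction):
    """Games needed for A/sqrt(N) to fall to `fraction` of A (the 1-game error)."""
    return 1.0 / fraction ** 2

A, b = fit_power_law(games, errors)
print(f"A = {A:.1f}, exponent = {b:.4f}")
print(games_for_fraction(0.02))   # about 2500
```

The exponent comes out close to 0.5, which is what makes the A/sqrt(#games) approximation reasonable.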

by **Volker Pittlik** » 03 Jul 2007, 21:36

Pradu wrote:...you'd need 2500 games.

Maybe it's time to change to "Game in one second" matches...

Volker

by **H.G.Muller** » 05 Jul 2007, 12:39

You don't really need BayesElo for that; the expression for the standard error is quite simple and follows directly from statistical theory. For a draw rate of 30%, the error in the score percentage between approximately equal opponents over N games is

42%/sqrt(N)

(This is not very sensitive to the draw percentage; for 0% draws it would be 50%/sqrt(N).)

As each percent score difference gives you 7 Elo (around 50%), the Elo error is 7 times larger, or

294/sqrt(N).

This is the standard error in the rating of a single engine as a function of the number of games played by that engine, not the total number of games in the tournament.
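The numbers above can be reproduced with a short Python sketch. It assumes equal opponents (wins and losses equally likely, so the per-game score variance is (1 - draw rate)/4) and the usual logistic Elo curve, Elo = -400*log10(1/p - 1); neither formula is spelled out in the post, and the function names are made up for illustration.

```python
import math

def score_sd_per_game(draw_rate):
    """Std. dev. of one game's score (1, 0.5 or 0), assuming equal opponents
    so that wins and losses are equally likely."""
    return math.sqrt((1.0 - draw_rate) / 4.0)

def elo_per_score_unit(p=0.5):
    """Slope of the logistic Elo curve Elo = -400*log10(1/p - 1) at score p."""
    return 400.0 / (math.log(10) * p * (1.0 - p))

def elo_standard_error(n_games, draw_rate=0.30):
    """Approximate standard error in Elo after n_games by one engine."""
    return score_sd_per_game(draw_rate) * elo_per_score_unit() / math.sqrt(n_games)

print(score_sd_per_game(0.30))        # roughly 0.42, i.e. 42%
print(elo_per_score_unit() / 100.0)   # roughly 7 Elo per percent
print(elo_standard_error(100))
```

Without rounding the two factors first, the constant comes out nearer 291 than 294; the difference is only the rounding to 42% and 7 Elo.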

In another thread I posted the factor by which you have to reduce the value of games played between engines with a sizable rating difference when counting the total number of games. In general, for the same number of games, error margins in tournaments with wildly different participants will be larger than in those where all participants are similar, because many games then give you hardly any information and, after proper weighting, barely contribute to the effective number of games played.

by **Volker Pittlik** » 05 Jul 2007, 16:51

H.G.Muller wrote:You don't really need BayesElo for that; the expression for the standard error is quite simple and follows directly from statistical theory...

Good to know that my results agree with the theory.

H.G.Muller wrote:...[more theory snipped]

... In general, error margins in tournaments with wildly different participants will be larger than those where all participants are similar, for the same number of games. Because many games then really do not give you any information at all...

Well, I didn't want to disprove that. I've chosen engines which

a) run flawlessly at bullet time controls and
b) score at least 20% in the testing group, so that even the last one gets a half or whole point from no. 1 from time to time.

I'm going to use those results as a baseline for comparison with other tests. First results for different Polyglot and Glaurung books can be expected tomorrow.

Regards

Volker

by **Pradu** » 05 Jul 2007, 17:14

H.G.Muller wrote:You don't really need BayesElo for that

Only for people who don't know statistical theory :mrgreen:

H.G.Muller wrote:...the expression for the standard error is quite simple and follows directly from statistical theory. For 30% draw percentage the error in the score percentage between approximately equal opponents over N games is

42%/sqrt(N)

Great to see it's the same functional form that the experimental data gives.

H.G.Muller wrote:(This is not very sensitive to the draw percentage; for 0% draws it would be 0.5/sqrt(N). )

As each percent score difference gives you 7 Elo (around 50%), the Elo error is 7 times larger, or

294/sqrt(N).

This is the standard error in the rating of a single engine, as a function of the number of games played by that engine. Not the total of the tournament.

In another thread I posted by which factor you have to reduce the value of games played between engines that have a sizable rating difference, when counting the total number of games. In general, error margins in tournaments with wildly different participants will be larger than those where all participants are similar, for the same number of games. Because many games then really do not give you any information at all, and after proper weighting hardly contribute to the effective number of games played.


Error margin and number of games
