Engine Test

Moderator: Andres Valverde

Postby lius28 » 02 May 2013, 03:29

Hello guys, I have an assignment to compare two chess engines. My goal is to compare their strength and speed. Is there a way to do that?
I have tried the elostat calculation in Arena after playing a few rounds of engine A against engine B, but I don't really understand what the result means.
What does an Elo point stand for, and how do I get an engine's speed?
Thanks in advance.
lius28
 
Posts: 3
Joined: 02 May 2013, 02:14

Re: Engine Test

Postby Ron Murawski » 02 May 2013, 06:07

Statistically, an Elo rating after 'a few rounds' means nothing. Many thousands of games must be played to get more trustworthy Elo numbers.

Elo rating system (Wikipedia)
http://en.wikipedia.org/wiki/Elo_rating_system
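
To make "what an Elo point stands for" concrete: under the standard logistic Elo model, a rating difference maps to an expected score, and a measured match score maps back to a rating difference. The following is only a minimal sketch of those two formulas; the function names are made up for illustration and are not taken from Arena or elostat.

```python
import math


def expected_score(elo_diff):
    """Expected score (0..1) for a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score):
    """Elo difference implied by a match score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_score(wins, draws, losses):
    """Score fraction of a match: a win counts 1 point, a draw half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


if __name__ == "__main__":
    s = match_score(wins=6, draws=2, losses=2)
    print("score fraction:", s)
    print("implied Elo difference:", round(elo_diff_from_score(s), 1))
```

A 100% or 0% score has no finite Elo difference in this model, which is why the sketch rejects those inputs.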

Speed usually refers to nodes per second (nps), but nps does not always mean the same thing from one program to another. Generally a program with a higher nps will win more often, but some slow searchers are much stronger than their nps numbers indicate.
Ron Murawski
 
Posts: 352
Joined: 26 Sep 2004, 21:50
Location: Schenectady, NY, USA

Re: Engine Test

Postby lius28 » 02 May 2013, 06:41

Thanks for your reply. Is there another way to get their strength and speed figures?
Maybe a formula to measure their strength? A few games may not be enough for a correct result, but that is okay for my task.
Thanks in advance.
lius28
 
Posts: 3
Joined: 02 May 2013, 02:14

Re: Engine Test

Postby Ron Murawski » 04 May 2013, 06:17

There is too much uncertainty when using small sets for statistical purposes. For example: if I flip a coin 4 times and it lands heads 3 of them, can I then conclude that heads are 3 times more likely to appear than tails? How many times will I have to flip a coin to 'prove' that the heads/tails odds are equal?
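
The coin question can be checked with the binomial distribution. This is a small sketch (the helper name is hypothetical) that computes how often a perfectly fair coin gives at least 3 heads in 4 flips.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Fair coin, 4 flips, 3 or more heads.
    print(prob_at_least(3, 4))  # 0.3125
```

A fair coin produces that result almost a third of the time, so 3 heads out of 4 says next to nothing about whether the coin is biased.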

You can read this page from the Chess Programming Wiki
http://chessprogramming.wikispaces.com/Match+Statistics
Elo, Elo difference, and Likelihood of Superiority [LOS] are the most often-used comparison measurements.
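
For Likelihood of Superiority, a commonly quoted approximation uses only wins and losses (draws drop out) and the normal distribution. The sketch below assumes that approximation; the function name is just for illustration.

```python
import math


def los(wins, losses):
    """Approximate likelihood (0..1) that the first engine is the stronger one.

    Uses the normal approximation LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).
    Draws are ignored because they do not favour either side.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    print(los(3, 1))    # a few games: far from conclusive
    print(los(300, 100))  # same ratio, many games: very close to 1
```

The same win/loss ratio gives a much higher LOS once the number of decisive games grows, which is the point about small samples.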

The only way to avoid playing many games is to look at what other people have already measured:
CCRL ratings
http://www.computerchess.org.uk/ccrl/404/

Here's a little chart for you (I've forgotten where I got it from and I don't know how accurate it is):
         Confidence
 Score   90%  95%  99%
  55%    170  281  550
  60%     46   71  141
  65%     21   30   64
  70%     14   18   35
  75%      9   13   22
  80%      7   11   17
  85%      7    8   14
  90%      4    5   11
  95%      4    5    7
 100%      4    5    7


'Confidence' is a statistical term. A 90% confidence means that your results will be right 90% of the time and wrong 10% of the time. Notice that, if one engine is much stronger than the other, then very few games are needed.

The table indicates that if one engine scores 55% success then, for 90% confidence you will need to play 170 games. If you want 99% confidence you will need to play 550 games.
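
Since the origin of the chart is unknown, here is a rough way to estimate such numbers yourself. It is only a sketch under stated assumptions: no draws, a one-sided test against a 50% score, and the normal approximation to the binomial. It is not claimed to be the method behind the chart, and it lands in the same ballpark without reproducing the figures exactly (for a 55% score at 90% confidence it gives about 160 games).

```python
import math
from statistics import NormalDist


def games_needed(score, confidence):
    """Rough number of games needed before a true score > 50% is distinguishable from 50%.

    Assumptions: every game is a win or a loss (no draws), one-sided test,
    normal approximation to the binomial distribution.
    """
    if not 0.5 < score < 1.0:
        raise ValueError("score must be strictly between 0.5 and 1")
    if not 0.5 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0.5 and 1")
    z = NormalDist().inv_cdf(confidence)        # one-sided z value
    sd = math.sqrt(score * (1.0 - score))       # per-game standard deviation
    return math.ceil((z * sd / (score - 0.5)) ** 2)


if __name__ == "__main__":
    for s in (0.55, 0.60, 0.75):
        print(s, [games_needed(s, c) for c in (0.90, 0.95, 0.99)])
```

The approximation becomes unreliable for very lopsided scores, where only a handful of games are involved.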

I'm not sure what class you are taking or what kind of statistics background you have. Mathematically the shortest possible test for you is to choose the strongest chess engine and compare it to the weakest engine. As soon as your measured Elo difference between the engines is beyond the 99% confidence error bars you're done.
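
The "beyond the error bars" stopping rule can be sketched like this. The code assumes a normal approximation for the match score and converts the interval ends to Elo with the logistic formula; the function names and the clamping constant are illustrative choices, not part of any particular tool.

```python
import math
from statistics import NormalDist


def score_to_elo(score, eps=1e-6):
    """Elo difference for a score fraction; clamped so 0% and 100% stay finite."""
    score = min(max(score, eps), 1.0 - eps)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, confidence=0.99):
    """Return (low, centre, high) Elo difference for a win/draw/loss record."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("at least one game is required")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game yields 1, 0.5 or 0 points.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)                               # standard error of the score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)      # two-sided z value
    return (score_to_elo(score - z * se),
            score_to_elo(score),
            score_to_elo(score + z * se))


def is_decided(wins, draws, losses, confidence=0.99):
    """True once the whole interval lies on one side of a 0 Elo difference."""
    low, _, high = elo_interval(wins, draws, losses, confidence)
    return low > 0.0 or high < 0.0


if __name__ == "__main__":
    print(elo_interval(90, 5, 5))
    print(is_decided(3, 0, 1), is_decided(90, 5, 5))
```

With 3 wins and 1 loss the interval still straddles zero, while a 90-5-5 record is clearly decided.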
Ron Murawski
 
Posts: 352
Joined: 26 Sep 2004, 21:50
Location: Schenectady, NY, USA

Re: Engine Test

Postby lius28 » 05 May 2013, 14:32

Thanks for the information, that's what I need. My teacher wants me to write a report comparing my chess engine's strength and speed with another engine. I am a newbie and, as expected, my engine always loses. That's why I am confident that a few tests are enough to describe my engine. After this report I want to improve it and get my revenge, lol. I have a lot of questions about the evaluation function; that will be my next post after I finish my report. OK, thanks a lot.
lius28
 
Posts: 3
Joined: 02 May 2013, 02:14

Re: Engine Test

Postby H.G.Muller » 08 May 2013, 07:35

If your program is too weak for the chosen opponent, you should test it against a weaker opponent of known rating. The best way to compare program strength is not to play them against each other, but to play them both against a variety of opponents and see which one does better. You could pick 10 engines and let the engines-under-test play 10-100 games against each of those, using a GUI opening book to force variety in the games.
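
A crude way to turn such a gauntlet into one number is a performance rating: the average rating of the opponents plus the Elo difference implied by the overall score. The sketch below assumes that simple approach, and the ratings in the usage example are invented placeholders, not real CCRL values. Dedicated rating tools do this more carefully.

```python
import math


def score_to_elo(score, eps=0.01):
    """Elo difference for a score fraction; clamped so 0% and 100% stay finite."""
    score = min(max(score, eps), 1.0 - eps)
    return -400.0 * math.log10(1.0 / score - 1.0)


def performance_rating(results):
    """Estimate a rating from a gauntlet.

    results: list of (opponent_rating, wins, draws, losses) tuples, one per opponent.
    Returns the game-weighted average opponent rating plus the Elo difference
    implied by the total score.
    """
    games = sum(w + d + l for _, w, d, l in results)
    if games == 0:
        raise ValueError("at least one game is required")
    points = sum(w + 0.5 * d for _, w, d, _ in results)
    avg_opponent = sum(r * (w + d + l) for r, w, d, l in results) / games
    return avg_opponent + score_to_elo(points / games)


if __name__ == "__main__":
    # Hypothetical opponents and results for two engines under test.
    engine_a = [(1800, 12, 4, 4), (2000, 8, 4, 8), (2200, 3, 5, 12)]
    engine_b = [(1800, 9, 5, 6), (2000, 5, 5, 10), (2200, 2, 3, 15)]
    print("engine A:", round(performance_rating(engine_a)))
    print("engine B:", round(performance_rating(engine_b)))
```

Both engines face the same opponents, so comparing the two estimates shows which one did better against the field.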
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Engine Test

Postby frodewin » 10 Sep 2013, 15:13

The new version of Xboard / Winboard supports automated tournaments between engines.
For a quick assessment of a program's strength, you can pair it in a tournament with other engines of known rating.

A list of chess engine ratings is available at http://www.computerchess.org.uk/ccrl/404/rating_list_all.html
Most of the engines listed there provide a Winboard interface. The engines listed in orange and green are freely available.
frodewin
 
Posts: 1
Joined: 10 Sep 2013, 09:28

