What result is significant?


Postby Uri Blass » 18 Feb 2006, 08:28

I know that testing version x+1 against version x may be misleading, and I have read, based on testers' experience, that the Chessmaster personality that scores best against other Chessmaster personalities is not necessarily the best against other programs.

My question is the following.

Suppose that you have two Chessmaster personalities and the result of a 100-game Noomen match between them.

What is the minimal result at which you can be practically sure that the winner is also better against other programs? (It seems to me that if you get 90-10 you can be sure that the winner is better against other programs too.)

Another question:

What is the most lopsided result you have seen between A and B in a 100-game match where A won the match but was not better than B against other programs? (You can of course use examples of programs that are not clones, like Fritz and Toga1.0.)

Uri
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: What result is significant?

Postby Steve Maughan » 18 Feb 2006, 14:51

Have a look at a little utility that I wrote many years ago:

http://www.stevemaughan.com/whoisbetter.htm

It uses the binomial distribution to test significance. Others have expanded upon this, e.g. Remi Coulom.
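For readers who want to see the idea in code, here is a minimal sketch of a binomial significance test for a match result. It is not the code behind the utility linked above. The function name and the choice to drop draws and test only decisive games (a sign test) are assumptions made for illustration.

```python
from math import comb


def one_sided_p_value(wins: int, losses: int) -> float:
    """Probability of scoring at least `wins` out of the decisive games
    if both sides were equally strong (p = 0.5 per decisive game).

    Assumption: draws are ignored, so only wins and losses are counted.
    """
    n = wins + losses
    if n == 0:
        return 1.0  # no decisive games, so no evidence either way
    # Sum the upper tail of Binomial(n, 0.5) from `wins` up to n.
    tail = sum(comb(n, k) for k in range(wins, n + 1))
    return tail / 2 ** n


if __name__ == "__main__":
    # Example: 40 wins, 25 losses, 35 draws in a 100-game match.
    print(one_sided_p_value(40, 25))
```

A small p-value only says that the result is unlikely to be noise in this particular pairing. It says nothing about how the winner would do against other opponents.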

Steve
Steve Maughan
 
Posts: 48
Joined: 06 Oct 2004, 17:40
Location: Florida USA

Re: What result is significant?

Postby Uri Blass » 18 Feb 2006, 15:39

Steve Maughan wrote: Have a look at a little utility that I wrote many years ago:

http://www.stevemaughan.com/whoisbetter.htm

It uses the binomial distribution to test significance. Others have expanded upon this, e.g. Remi Coulom.

Steve


The problem is that a program may be stronger against its previous version without being stronger against other programs.

You need to be sure not only that statistical noise did not affect the result, but also that the program is not better only against itself, so you need a higher result.

My question is basically: what is the best result that A has achieved against B in a 100-game match while still being weaker than B based on results against other programs?

Uri

Re: What result is significant?

Postby Casper W. Berg » 21 Feb 2006, 13:43

Unfortunately you can't test this.

If program A is really better than B in head-to-head play, but not better at playing chess in general (i.e. not better against other engines), no number of games between A and B will reveal this fact.

But the chance of finding a local optimum is probably larger among variants of the same program (like Chessmaster) than between two completely different engines, because the searches/evals will differ more in the latter case.

To answer your questions, you would need to know the chance of hitting a local optimum plus the statistical distribution of how much better these local optima are, which I dare say is impossible to find in general.

To get reliable results you need to test against different opponents...
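As a rough illustration of testing against different opponents, the sketch below compares two versions by their pooled score against the same set of opponents instead of by their head-to-head result. The function name, the opponent labels, and all numbers are hypothetical and are not results from any real match.

```python
def score_fraction(results: dict[str, tuple[int, int, int]]) -> float:
    """Pooled score over a gauntlet.

    `results` maps an opponent name to (wins, draws, losses).
    A win counts 1 point and a draw counts half a point.
    """
    points = sum(w + 0.5 * d for w, d, _ in results.values())
    games = sum(w + d + l for w, d, l in results.values())
    return points / games


# Hypothetical gauntlet results for versions A and B against the same opponents.
gauntlet_a = {"opponent_1": (10, 5, 5), "opponent_2": (8, 4, 8)}
gauntlet_b = {"opponent_1": (9, 6, 5), "opponent_2": (10, 4, 6)}

if __name__ == "__main__":
    print("A:", score_fraction(gauntlet_a))
    print("B:", score_fraction(gauntlet_b))
```

In these made-up numbers B scores better against the field, regardless of what a direct A versus B match might show. The pooled scores are still subject to statistical noise, so they need enough games as well.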

Casper
Casper W. Berg
 
Posts: 27
Joined: 11 Jan 2006, 22:33

