test methodology


test methodology

Postby Giuseppe Cannella » 13 Nov 2006, 22:30

I need to test a new release of my engine. I plan to run the WAC suite with both the old and the new version and check whether the new version
finds more solutions than the old one. Is this approach valid?
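
A minimal sketch of that kind of old-vs-new comparison (not from this post): it assumes the python-chess library, an EPD file wac.epd with standard "bm" (best move) operations, and made-up engine paths and time limits.

Code:
import chess
import chess.engine

SUITE = "wac.epd"          # hypothetical path to the WAC suite in EPD format
TIME_PER_POSITION = 5.0    # seconds per position, adjust to taste

def solved_count(engine_path):
    """Count how many suite positions the engine solves within the time limit."""
    solved = 0
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        with open(SUITE) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                board, ops = chess.Board.from_epd(line)
                if "bm" not in ops:
                    continue
                result = engine.play(board, chess.engine.Limit(time=TIME_PER_POSITION))
                if result.move in ops["bm"]:
                    solved += 1
    finally:
        engine.quit()
    return solved

old = solved_count("./engine_old")   # hypothetical binaries
new = solved_count("./engine_new")
print("old solved:", old, "new solved:", new)

Keep the same time limit and hardware for both runs, otherwise the comparison says more about the conditions than about the engines.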

what about your test methodology?
Giuseppe Cannella
 
Posts: 5
Joined: 20 Mar 2006, 10:08
Location: Milan, Italy

Re: test methodology

Postby Casper W. Berg » 14 Nov 2006, 01:44

I think this is the wrong forum for this topic, but anyway:
You get what you test for. From a test suite you can only infer the ability to solve tactical problems (fast).
This need not be the same as a higher rating.

I like to test playing strength by playing tournaments from the Nunn and Noomen start positions -- they were designed for this purpose.
Preferably against many opponents, because two versions of the same program playing each other are likely to show a greater difference in score than they would against other programs.

Best wishes with your new release,

-Casper
Casper W. Berg
 
Posts: 27
Joined: 11 Jan 2006, 22:33

Re: test methodology

Postby Pedro Castro » 14 Nov 2006, 13:54

I agree with Casper. Test suites generally prove the tactical capacity of the engines. You can use them, but don't rely on WAC alone; the engine may improve on one test and play worse on others. You also have 1001 Brilliant Ways to Checkmate, 1001 Winning Chess Sacrifices & Combinations, Encyclopedia of Chess Middlegames, endgames (MES400), ICQ6, etc.

But in the end you have to play games, and in that case, as Casper says, against several engines. The idea of playing from Nunn-type positions instead of with opening books is interesting, but in my case it doesn't work better. Maybe that is because the tests are usually at blitz, and the computer doesn't seem stable enough to reproduce results from fixed positions; I understand this is a problem of resident programs like antivirus software and others, where the smallest difference gives a different result in the game.
Best wishes,

Pedro Castro
Pedro Castro
 
Posts: 180
Joined: 28 Jan 2005, 01:09
Location: Pays Basque (Spain)

Re: test methodology

Postby Volker Böhm » 01 Dec 2006, 13:17

This forum is a very good place to discuss testing. In my opinion the following is a good way to test a new version.

1. Use test positions only on early versions, to find bugs.

2. Play from a standard set of test openings. Always use the same openings (Noomen, Nunn, or similar) and disable all books.
Reason: books tend to play the same moves, so with books you may get identical games (at least in the opening). And with books you start to optimize your engine for the book. Better to optimize the book for your engine later.

3. Play LOTS of games. Currently we play 1200 games for any new version (see the sketch after this list for the error margin this implies).

4. Better to play very short games than fewer games. But always add an increment per move to prevent "random moves" due to low remaining time. An example: 2 min + 1 sec/move.
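
A rough sketch (not part of Volker's post) of the error margin behind point 3: the Elo difference implied by a match score, and how wide a ~95% confidence interval still is even after 1200 games. The example score is invented.

Code:
import math

def elo_and_interval(wins, draws, losses):
    """Elo difference implied by a match score, with a ~95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # per-game score variance, then standard error of the mean score
    var = (wins * (1.0 - score) ** 2 +
           draws * (0.5 - score) ** 2 +
           losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)

    def to_elo(s):
        # logistic Elo model: score s corresponds to this Elo difference
        return -400.0 * math.log10(1.0 / s - 1.0)

    return to_elo(score), to_elo(score - 1.96 * se), to_elo(score + 1.96 * se)

# 1200 games, e.g. 330 wins, 600 draws, 270 losses: about +17 Elo,
# with a ~95% interval of roughly +3 to +31 Elo
print(elo_and_interval(330, 600, 270))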

Greetings Volker
Volker Böhm, Spike Team
Volker Böhm
 
Posts: 66
Joined: 27 Sep 2004, 18:52
Location: Germany

Re: test methodology

Postby Volker Pittlik » 01 Dec 2006, 13:30

Volker Böhm wrote: This forum is a very good place to discuss testing. ...


Yes.

Volker
Volker Pittlik
 
Posts: 1031
Joined: 24 Sep 2004, 10:14
Location: Murten / Morat, Switzerland

Re: test methodology

Postby Uri Blass » 02 Dec 2006, 18:53

Volker Böhm wrote: This forum is a very good place to discuss testing. In my opinion the following is a good way to test a new version.

1. Use test positions only on early versions, to find bugs.

2. Play from a standard set of test openings. Always use the same openings (Noomen, Nunn, or similar) and disable all books.
Reason: books tend to play the same moves, so with books you may get identical games (at least in the opening). And with books you start to optimize your engine for the book. Better to optimize the book for your engine later.

3. Play LOTS of games. Currently we play 1200 games for any new version.

4. Better to play very short games than fewer games. But always add an increment per move to prevent "random moves" due to low remaining time. An example: 2 min + 1 sec/move.

Greetings Volker


For point 4, I think it may also be a good idea to test at a fixed number of nodes per move, in order to get deterministic results.

If you test at a fixed number of nodes you can run other tasks on the computer at the same time as the testing, and you do not need to worry about problems like one program not getting enough CPU time.

Even if you make a change that alters the number of nodes per second, you can still adjust your program and tell it to stop only after it has searched 1.2n nodes instead of n nodes, so this problem can be eliminated.
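
A small sketch (not Uri's actual setup) of a fixed-node game between two UCI engines, using the python-chess library; the engine paths and node budgets are invented. With fixed node counts the games are reproducible and insensitive to CPU load, provided both engines honour "go nodes".

Code:
import chess
import chess.engine

def play_fixed_nodes_game(white_path, black_path, white_nodes, black_nodes):
    """Play one game in which each side stops after a fixed number of nodes."""
    board = chess.Board()
    white = chess.engine.SimpleEngine.popen_uci(white_path)
    black = chess.engine.SimpleEngine.popen_uci(black_path)
    try:
        while not board.is_game_over():
            if board.turn == chess.WHITE:
                engine, nodes = white, white_nodes
            else:
                engine, nodes = black, black_nodes
            result = engine.play(board, chess.engine.Limit(nodes=nodes))
            board.push(result.move)
    finally:
        white.quit()
        black.quit()
    return board.result()

# e.g. a 1,000,000-node engine against a 1,000-node engine (hypothetical paths)
print(play_fixed_nodes_game("./movei", "./rybka", 1_000_000, 1_000))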

If there are testers who are interested in testing Movei at a fixed number of nodes against UCI engines that support the nodes command, like SOS or Rybka, then I will be interested in testing.

Note that both SOS and Rybka do not support the fixed number of nodes perfectly and can report more nodes than the number they are supposed to search, but as long as they produce deterministic games (something I did not check) I do not care about it.

Note that I added to the latest Movei the possibility to multiply the number of nodes by an integer, so it may be possible to get interesting results even against Rybka (with a big enough factor Movei can beat Rybka, and Movei at 1,000,000 nodes per move is leading against Rybka at 1,000 nodes per move in the Noomen match with the result 6-2, in spite of the fact that Rybka often reports that it searches 1500 nodes or a similar number).

I can add that Rybka was probably lucky in the first games,
because Movei is leading 18-4 after 22 games.


Uri
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

