Winboard Forum

by **Fermin Serrano** » 25 May 2008, 19:34

Although I have read about it, I don't know exactly how to take advantage of the results of testing MLmfl on my engine. It is supposed to be an EPD suite that lets you adjust the different positional values of your engine. How do you do that? You run the test, and then what do you do with the results? How exactly must this kind of test be run?

by **Teemu Pudas** » 25 May 2008, 21:14

It is supposed to be an EPD suite that lets you adjust the different positional values of your engine.

Not quite. It's supposed to be a collection of opening positions that can be used for testing engines against each other. The results can be used for tuning pretty much anything, not just the eval.

How exactly must this kind of test be run?

Play a bunch of games, preferably the whole suite against several engines with both colours. Feed the games to BayesElo. Rinse and repeat with different settings.

There's a more complete explanation here.
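The procedure described above (every position of the suite, against several opponents, with both colours) can be sketched as a simple schedule generator. This is only an illustration: the engine names, the number of positions, and the `gauntlet_schedule` helper are placeholders, not part of MLmfl or BayesElo.

```python
from itertools import product


def gauntlet_schedule(test_engine, opponents, positions):
    """Return one game per (position, opponent, colour) combination."""
    games = []
    for position, opponent in product(positions, opponents):
        # Each opening is played twice, once with each colour.
        games.append({"start": position, "white": test_engine, "black": opponent})
        games.append({"start": position, "white": opponent, "black": test_engine})
    return games


# Placeholder data: 32 positions and 8 opponents are assumptions for the demo.
positions = [f"position_{i}" for i in range(32)]
opponents = [f"ref_{i}" for i in range(8)]
schedule = gauntlet_schedule("my_engine", opponents, positions)
print(len(schedule))  # 32 positions * 2 colours * 8 opponents = 512
```

The resulting games would then be played by whatever tournament manager you use, and the PGN output fed to the rating tool.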

by **Tord Romstad** » 07 Jun 2008, 08:28

Teemu Pudas wrote:
It is supposed to be an EPD suite that lets you adjust the different positional values of your engine.

Not quite. It's supposed to be a collection of opening positions that can be used for testing engines against each other. The results can be used for tuning pretty much anything, not just the eval.

How exactly must this kind of test be run?

Play a bunch of games, preferably the whole suite against several engines with both colours. Feed the games to BayesElo. Rinse and repeat with different settings.

There's a more complete explanation here.

I wasn't aware of this test suite before, but I like it, and have now started using it. Thanks, Marc!

One question about the methodology: Marc's approach is to first build a "Reference Base" by playing a huge round-robin tournament of 64-game matches between 9 engines. When a new engine is tested, it plays 64-game matches against 8 of these 9 engines, and the new games are added to a copy of the reference base before the rating is computed.

I don't understand the purpose of copying the reference base. Wouldn't it be more accurate to just add all new test matches to the reference base, and let it continue to grow as new engines are tested?

Tord

by **Tony Thomas** » 07 Jun 2008, 08:49

Tord, he is probably doing that because adding more matches would skew the ratings of the reference engines. For example, suppose you use the reference base to test Glaurung, and one of the opponents (Rybka, for example) performs really badly against Glaurung. After about 10 beta versions, the relative rating of Rybka would be lower, which would also affect the rating of Glaurung (as the average rating of its opponents is lower). I doubt that the changes would be significant. I have seen Ray from CCRL doing something similar in his testing of Chessmaster 11000 personalities.

by **Marc Lacrosse** » 07 Jun 2008, 09:01

Tord Romstad wrote:I wasn't aware of this test suite before, but I like it, and have now started using it. Thanks, Marc!

You are welcome!
Thanks for your appreciation.

Tord Romstad wrote:One question about the methodology: Marc's approach is to first build a "Reference Base" by playing a huge round-robin tournament of 64-game matches between 9 engines. When a new engine is tested, it plays 64-game matches against 8 of these 9 engines, and the new games are added to a copy of the reference base before the rating is computed.

I don't understand the purpose of copying the reference base. Wouldn't it be more accurate to just add all new test matches to the reference base, and let it continue to grow as new engines are tested?

Tord

I somewhat feared that something could get skewed if you have tested a large number of only slightly different versions of the same engine and then test a completely different engine: in the resulting global database, the more heavily tested engine will be overweighted. And this affects the global evaluation of your reference team.

I observed this after having tested dozens (hundreds?) of slightly differently tuned Fruit subversions.

Marc
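The copy-then-add approach discussed above can be sketched as follows. This is a minimal illustration under assumptions: the file names, the tiny placeholder PGN contents, and the `build_rating_input` helper are all invented for the example. The point is only that the reference file is read and never modified, so games from earlier candidates do not accumulate in it.

```python
import tempfile
from pathlib import Path


def build_rating_input(reference_pgn, gauntlet_pgn, output_pgn):
    """Write the reference games followed by the gauntlet games to a new file.

    The reference file is only read, so it stays unchanged between tests.
    """
    reference = Path(reference_pgn).read_text()
    gauntlet = Path(gauntlet_pgn).read_text()
    Path(output_pgn).write_text(reference.rstrip("\n") + "\n\n" + gauntlet)
    return Path(output_pgn)


with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp, "reference_base.pgn")
    new = Path(tmp, "candidate_gauntlet.pgn")
    # Placeholder games, not real results.
    ref.write_text('[White "RefA"]\n[Black "RefB"]\n[Result "1-0"]\n\n1-0\n')
    new.write_text('[White "Candidate"]\n[Black "RefA"]\n[Result "1/2-1/2"]\n\n1/2-1/2\n')
    original = ref.read_text()
    combined = build_rating_input(ref, new, Path(tmp, "rating_input.pgn")).read_text()
    reference_untouched = ref.read_text() == original
    games_in_combined = combined.count("[Result ")
```

The combined file is what would be handed to the rating tool for one tested version; the next version starts again from the untouched reference base.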

by **Tord Romstad** » 07 Jun 2008, 11:46

Marc and Tony,

Thanks for the explanation! This makes sense. Next question: What's the idea behind having nine engines in the reference base, but only testing against eight of them?

Tord

by **Tony Thomas** » 07 Jun 2008, 12:07

Tord Romstad wrote:Marc and Tony,

Thanks for the explanation! This makes sense. Next question: What's the idea behind having nine engines in the reference base, but only testing against eight of them?

Tord

I would say it is probably to get 512 games for each of the 9 reference engines (8 opponents each). If he were to use only 8 engines in the reference base, then he would only have 448 games per reference engine. I guess he just randomly chose a 9th engine to get his preferred 512 games.

Oops, I noticed that Marc himself answered your question about copying the reference database even before I did. Maybe I should start reading the answers a little bit slower. :mrgreen:
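The game counts mentioned above follow directly from the round-robin format with 64-game matches. A short check (the function names are just for illustration):

```python
GAMES_PER_MATCH = 64


def games_per_engine(num_engines, games_per_match=GAMES_PER_MATCH):
    """Games each engine plays in a round-robin: one match per opponent."""
    return (num_engines - 1) * games_per_match


def total_games(num_engines, games_per_match=GAMES_PER_MATCH):
    """Total games in the round-robin: one match per pair of engines."""
    pairings = num_engines * (num_engines - 1) // 2
    return pairings * games_per_match


print(games_per_engine(9))  # 512
print(games_per_engine(8))  # 448
print(total_games(9))       # 2304
```

A new engine playing 64-game matches against 8 reference engines also ends up with 8 * 64 = 512 games.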

by **Marc Lacrosse** » 07 Jun 2008, 12:14

Tord Romstad wrote:What's the idea behind having nine engines in the reference base, but only testing against eight of them?
Tord

If I remember correctly, this has no other explanation than an historical one.

I first decided to have 8 engines as a reference and had this idea of mixing the RR set of games between them with the gauntlet games of a tested engine for establishing the relative rating of the tested one.

I began to test engines and established my rating list.

Then I wished to compare my list with other ones and chose Shredder as a good reference point with many games in almost all rating lists.

I then had Shredder play its own MLmfl test, and this served as my reference point (Shredder = 2750) for normalising all results.

As I had already tested many things against the eight engines, I continued with the same procedure for all newcomers (against eight engines, not nine).

Marc
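Normalising to a reference point such as Shredder = 2750 amounts to shifting every rating by the same offset, which leaves all rating differences unchanged. A minimal sketch, where the raw ratings and the other engine names are invented placeholders and `normalise` is a hypothetical helper:

```python
def normalise(ratings, anchor, anchor_rating=2750):
    """Shift all ratings by a constant so that `anchor` gets `anchor_rating`."""
    offset = anchor_rating - ratings[anchor]
    return {name: elo + offset for name, elo in ratings.items()}


# Invented raw ratings relative to an arbitrary zero point.
raw = {"Shredder": 35, "EngineA": -20, "EngineB": -15}
scaled = normalise(raw, "Shredder")
print(scaled)  # {'Shredder': 2750, 'EngineA': 2695, 'EngineB': 2700}
```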

by **Onno Garms** » 07 Jun 2008, 19:28

Hi Marc,

Many thanks from me as well for that suite. I wasn't aware of it either.

Do you have a program to create that suite from a database or did the creation require manual interaction?

I would really be interested in a slightly larger suite of a few hundred lines. Can that be created by just decreasing the thresholds from 1200/1000 to some smaller value?

Greetings,
Onno

by **Marc Lacrosse** » 07 Jun 2008, 22:43

Onno Garms wrote:Do you have a program to create that suite from a database or did the creation require manual interaction?

Unfortunately this was done manually.

Marc

MLmfl and other test suites