Hi Tom,
Sorry I didn't see your reply earlier.
Are you running Windows or Linux?
Windows.
Also how do you ensure that the conditions are the same from test to test?
What I mean is: how do you ensure that the experimental versions get equal CPU, memory, etc., so that you're making an apples-to-apples comparison?
Well, for financial reasons, I don't use identical machines as far as hardware goes... just basically anything I can get my hands on at a cheap price with a fairly decent CPU. (I work in a PC shop which helps with finding good deals).
I just assume that two engines playing on the same PC will get a proportionally equal and fair share of CPU time. I wish all the hardware were identical, but unfortunately that just can't be avoided.
I never play games between 2 PCs (but I'm pretty sure that's not what you meant).
Hash sizes are always the same for a given tournament. Games are always played using the same time control and the same set of starting positions (Nunn etc.). I want to avoid 1+1 time controls for testing... 5 seconds per move think time is, imho, an absolute minimum for consistency. Ideally, if I had 500 PCs, I would get each one to play a few games at 1 minute per move. That would eliminate the problem of unequal CPU sharing, and you would have a statistically meaningful result within an hour or two. Now that would be heaven.
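For what it's worth, here's a rough back-of-envelope sketch of what "statistically meaningful" costs in games, using the usual normal approximation for a match score. The 40% draw ratio and 95% confidence level are just illustrative assumptions, not anything from your setup:

```python
import math

def score_margin(games, draw_ratio=0.4, z=1.96):
    """Approximate margin of error on a match score (0..1 scale).

    Each game scores 1, 0.5 or 0. For a roughly even match with
    draw ratio d, the per-game score variance works out to
    0.25 * (1 - d). z = 1.96 gives ~95% confidence (normal approx).
    """
    variance = 0.25 * (1.0 - draw_ratio)
    return z * math.sqrt(variance / games)

def games_needed(margin, draw_ratio=0.4, z=1.96):
    """Games required before the score margin shrinks below `margin`."""
    variance = 0.25 * (1.0 - draw_ratio)
    return math.ceil(variance * (z / margin) ** 2)

# e.g. after 100 games the score is only known to within about +/-7.6%,
# and pinning it down to +/-5% takes a couple of hundred games.
print(round(score_margin(100), 3))
print(games_needed(0.05))
```

So a 500-PC farm playing a few games each really would get you into "meaningful" territory in one session, which is exactly why the idea is so tempting.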
I wonder what you and others are doing as far as automated test environments go. Anything that saves time AND produces meaningful data AND doesn't require a battalion of PCs is very interesting to me.
Cheers and keep up the good work with Francesca!
Ross