Hi Tom,
Sorry I didn't see your reply earlier.
Are you running Windows or Linux?
Windows.
Also how do you ensure that the conditions are the same from test to test?
What I mean is: how do you ensure that the experimental versions get equal CPU, memory, etc., so that you're making an apples-to-apples comparison?
Well, for financial reasons, I don't use identical machines as far as hardware goes... just basically anything I can get my hands on at a cheap price with a fairly decent CPU. (I work in a PC shop which helps with finding good deals).
I just assume that two engines playing on the same PC will get a proportionally equal and fair share of CPU time. I wish all the hardware were identical, but unfortunately that just can't be avoided.
I never play games between 2 PCs (but I'm pretty sure that's not what you meant).
Hash sizes are always the same for a given tournament. Games are always played using the same time control and the same set of starting positions (Nunn etc.). I want to avoid 1+1 time controls for testing... 5 seconds per move think time is, imho, an absolute minimum for consistency. Ideally, if I had 500 PCs, I would get each one to play a few games at 1 minute per move. That would eliminate the problem of unequal CPU sharing, and you would have a statistically meaningful result within an hour or two. Now that would be heaven.
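For what it's worth, here's a rough back-of-envelope sketch of what "statistically meaningful" costs in games, using the usual normal approximation for a match score. The 40% draw ratio and 95% confidence level are just illustrative assumptions, not anything from your setup:

```python
import math

def score_margin(games, draw_ratio=0.4, z=1.96):
    """Approximate margin of error on a match score (0..1 scale).

    Each game scores 1, 0.5 or 0. For a roughly even match with
    draw ratio d, the per-game score variance works out to
    0.25 * (1 - d). z = 1.96 gives ~95% confidence (normal approx).
    """
    variance = 0.25 * (1.0 - draw_ratio)
    return z * math.sqrt(variance / games)

def games_needed(margin, draw_ratio=0.4, z=1.96):
    """Games required before the score margin shrinks below `margin`."""
    variance = 0.25 * (1.0 - draw_ratio)
    return math.ceil(variance * (z / margin) ** 2)

# e.g. after 100 games the score is only known to within about +/-7.6%,
# and pinning it down to +/-5% takes a couple of hundred games.
print(round(score_margin(100), 3))
print(games_needed(0.05))
```

So a 500-PC farm playing a few games each really would get you into "meaningful" territory in one session, which is exactly why the idea is so tempting.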
I wonder what you and others are doing as far as automated test environments go. Anything that saves time AND produces meaningful data AND doesn't require a battalion of PCs is very interesting to me.
Cheers and keep up the good work with Francesca!
Ross