I was about to make some changes to Crafty (time function), and before
doing that I wanted to test its strength (so I'd be able to compare later).
I didn't want to use the autoplayer (people tend to get strange results) or the
cb-adaptor (where I get _very_ strange results), so I let it play against the
(IMHO) second-best freeware engine, "Comet", under WBoard.
I made sure they got the same hash table (HT) size (checked with a memory tool).
I made sure they got the same processor time (checked with a system tool);
of course there were no other processes running, I restarted before every
match, and I have a very stable system.
I deleted the learning files after every match.
I let them play from the Nunn positions, once with white and once with
black, because I wanted to test the engine, not the opening book.
I made sure both had 4-man TB access. (BTW: I'd still like to know which
5-man TBs you think are the most useful.)
K6-II/400, 15 min/game. Result: newest Crafty 16.9: 15,5; newest CometB06: 4,5
(!).
Hey, I thought, this can't be: Comet isn't _that_ weak! So I ran another test
with _exactly_ the same configuration, only with 5 0 games. Result: 9,5 : 10,5.
Comet won (!).
Hey, I thought, this can't be: there is too much of a difference between those
matches. So I ran another test with the same configuration, only with 14 0 (!)
games. Result: 14,0 : 6,0.
That is 1,5 more points for Comet just because of 14 0 instead of 15 0!
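To put a rough number on how much a 20-game result can wobble by itself, here is
a minimal sketch in Python. It assumes the games are independent and treats
each game as a plain win/loss (draws only make the real spread smaller, so this
overstates it a little). The name score_margin is my own, not part of any
testing tool.

```python
import math

def score_margin(points, games, z=1.96):
    """Approximate 95% margin of error, in points, for a match score.

    Assumes independent games and a binomial (no-draw) model, which is
    an upper bound on the spread when draws occur.
    """
    p = points / games                      # score as a fraction of the games
    se = math.sqrt(p * (1.0 - p) / games)   # standard error of that fraction
    return z * se * games                   # convert back to points

# Crafty's scores from the three 20-game matches above
for label, pts in [("15 0", 15.5), ("14 0", 14.0), ("5 0", 9.5)]:
    print(f"{label}: {pts} +/- {score_margin(pts, 20):.1f} of 20")
```

Under these assumptions each 20-game score carries a margin of roughly 4 points
either way, so a gap of 1,5 points between the 14 0 and 15 0 matches is well
inside the noise.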
What does this teach us?
Forget about any serious testing (hello SSDF!) if you don't play at
least 200 matches between each pair of engines. Otherwise you just get
garbage results.
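As a back-of-the-envelope check on the sample size, the sketch below shows how
the uncertainty shrinks with the number of games. It is only an estimate under
my own assumptions: independent games, a 95% interval, and the worst case of
two equal engines with no draws. The function name worst_case_margin is
hypothetical.

```python
import math

def worst_case_margin(games, z=1.96):
    """95% margin of error on the score fraction.

    Worst case: both engines score 50% and there are no draws, so the
    per-game standard deviation is 0.5.
    """
    return z * 0.5 / math.sqrt(games)

for n in (20, 200, 1000):
    print(f"{n:5d} games: +/- {100 * worst_case_margin(n):.1f}% of the points")
```

With these assumptions 20 games leaves about +/- 22% of the points open, and
200 games still leaves about +/- 7%, since the margin only shrinks with the
square root of the number of games.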
Best regards,
Tec--