by Robert Pope » 11 Jul 2013, 21:22
Right, but in order to test the tree routines, my engine has to implement exactly the same evaluation function as the generating program. The page you link to has factors like center control, isolated pawns, and mobility (legal vs. pseudo-legal moves) that you have to program in exactly in order to test against the posted solutions. If your results don't match, is your search routine wrong, or did you make a mistake programming the evaluation? Now you have two things to debug.
Also, for anything beyond trivial minimax, the only thing you can check against is the returned score, because the order in which the moves are searched affects the tree shape. If you use a simple evaluation, you will often return the exact same score even when the search algorithm is flawed, because so many positions evaluate identically (e.g. material is even). If you use a more complicated evaluation, you're back to the first problem. Plus I'd be spending time writing and testing an evaluation function that I don't intend to use for real.
And then, once you get past the basic minimax/negamax/alphabeta algorithms, you lose the ability to do even these types of comparisons, since so many things interact with each other. So the scope is very narrow - you're only testing about 20 lines of code that you can practically copy and paste from Bruce Moreland's web pages.
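To make the "about 20 lines" concrete: here is a minimal fail-soft alpha-beta in negamax form, searched over a hand-built tree of nested lists rather than a real chess position (the tree, and the list-based interface, are my illustration, not anything from Bruce Moreland's pages).

```python
def alphabeta(node, alpha, beta):
    """Fail-soft negamax alpha-beta over a tree of nested lists.

    Leaves are ints, scored from the perspective of the side to move
    at that leaf; interior nodes are lists of children in search order.
    """
    if isinstance(node, int):      # leaf: static evaluation
        return node
    best = float("-inf")
    for child in node:
        # Negamax: flip the sign and the window for the opponent.
        score = -alphabeta(child, -beta, -alpha)
        if score > best:
            best = score           # fail-soft: track the true best seen
        if best > alpha:
            alpha = best
        if alpha >= beta:          # beta cutoff: opponent avoids this line
            break
    return best

# Classic textbook tree: the minimax value at the root is 3.
assert alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]],
                 float("-inf"), float("inf")) == 3
```

Note this is the fail-soft variant (it can return a value outside the alpha-beta window); the fail-hard version would clamp the return value to the window instead.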
Good perft test suites are critical to getting your program on its feet. Sample positions along the lines of "you need a working hash table to find this mate" are helpful. And I would be all for better pseudocode on the Wiki for implementing fail-hard vs. fail-soft.
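The perft recursion itself is simple, which is exactly why it makes such a good test. A sketch, with the caveat that a real engine would supply a board exposing something like legal_moves()/make()/unmake(); the toy "subtraction game" board below is my stand-in so the example is self-contained, not anything chess-specific.

```python
class ToyBoard:
    """Toy position: a counter; a legal move subtracts 1 or 2, staying >= 0."""
    def __init__(self, n):
        self.n = n

    def legal_moves(self):
        return [m for m in (1, 2) if self.n - m >= 0]

    def make(self, move):
        self.n -= move

    def unmake(self, move):
        self.n += move


def perft(board, depth):
    """Count the leaf nodes of the legal-move tree at exactly `depth` plies."""
    if depth == 0:
        return 1
    nodes = 0
    for move in board.legal_moves():
        board.make(move)
        nodes += perft(board, depth - 1)
        board.unmake(move)   # restore the position before the next move
    return nodes
```

Comparing these counts against known-good numbers for standard positions catches move-generator bugs (castling, en passant, promotions) far more reliably than comparing search scores ever could.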
But a testbed for minimax or alphabeta feels like going to the store to buy a sledgehammer when you want to put a pushpin on the bulletin board. Really you can just push it in with your thumb.
Edited to add: Sorry if I came off sounding harsh here. Even if I don't think it's workable, it was a good question to raise.
Last edited by Robert Pope on 11 Jul 2013, 21:37, edited 1 time in total.