Hello Pradu,
These are my answers, but please take them with a grain of salt. Bob is the real expert, of course.
Pradu wrote: How would you handle move ordering differently for the parallel search?
In my case, exactly like for my old non-parallel search.
For example, would you share killers for the next ply at the split point?
No, I don't. I'm not sure how much it would help.
What would be the best way to handle hash tables (transposition table, pawn hash, etc.)?
For the transposition table, I do nothing at all. All threads share the same transposition table, without locking. I always check hash moves for legality, though. For pawn and material hash tables, I have a separate table for each thread.
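A minimal sketch in C of the "shared table, no locking, always validate the hash move" idea follows. All names here (TTEntry, tt_store, tt_probe_move, the toy Position type and its is_legal test) are hypothetical stand-ins for illustration and are not taken from Glaurung. Unsynchronized access like this is formally a data race under the C11 memory model; the legality check is what protects the search from a corrupted or colliding entry.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t Move;              /* assumed 16-bit move encoding */
#define MOVE_NONE ((Move)0)
#define TT_SIZE 1024                /* assumed size; must be a power of two */

typedef struct { uint64_t key; Move move; } TTEntry;

/* One table shared by all search threads, deliberately without a lock. */
static TTEntry tt[TT_SIZE];

/* Toy stand-in for a real position: just a list of its legal moves. */
typedef struct { Move legal[8]; int count; } Position;

static bool is_legal(const Position *pos, Move m) {
    for (int i = 0; i < pos->count; i++)
        if (pos->legal[i] == m)
            return true;
    return false;
}

/* Always-replace store; another thread may overwrite the slot at any time. */
static void tt_store(uint64_t key, Move m) {
    TTEntry *e = &tt[key & (TT_SIZE - 1)];
    e->key = key;
    e->move = m;
}

/* Return the stored move only if the key matches AND the move is legal in
   pos; a torn or colliding entry is rejected here instead of being played. */
static Move tt_probe_move(uint64_t key, const Position *pos) {
    const TTEntry *e = &tt[key & (TT_SIZE - 1)];
    if (e->key != key)
        return MOVE_NONE;
    Move m = e->move;
    if (m == MOVE_NONE || !is_legal(pos, m))
        return MOVE_NONE;
    return m;
}
```

The pawn and material tables would simply be separate arrays of this kind, one per thread, so they need neither locking nor validation.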
How would you make history heuristic work best? Would you want to spend time copying all move ordering data (like history, hash) when splitting?
I once tried giving each thread its own history table, but it didn't work well. I now let all threads share the same table (again, without locking), hence no copying is necessary.
For Buzz, I only split at alpha+1==beta
I think this is a bad idea. I did the same thing in the first (non-public) parallel version of Glaurung 2, and it turned out that the efficiency of the parallel search improved enormously when I allowed splits at nodes where alpha + 1 < beta.
so I don't have to check for bounds updating; however, I still have to check for cutoffs. For this I loop through the entire search stack at every new node and poll the split points for a cutoff flag. Is there a more efficient/elegant way to do this?
I do something similar to what you do. Each split point contains a pointer to its "parent split point", i.e. the closest split point along the path back to the root of the search tree (this pointer is NULL if there is no split point along the path to the root). When testing whether a thread should stop searching, I look at its parent split point, its grandparent split point, and so on all the way to the root. If a beta cutoff has occurred at any of these split points, the thread stops its current search and returns to its idle loop.
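The parent-pointer walk described above can be sketched in C roughly as follows. The struct layout and the names SplitPoint, beta_cutoff and thread_should_stop are assumptions made for this example, not Glaurung's actual code; the flag is a C11 atomic so that one thread can set it while others poll it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical split point: only the fields needed for the stop test. */
typedef struct SplitPoint {
    struct SplitPoint *parent;  /* closest split point toward the root, or NULL */
    atomic_bool beta_cutoff;    /* set by whichever thread fails high here */
} SplitPoint;

/* Walk from the thread's own split point up to the root. A cutoff at any
   ancestor makes the work below it useless, so the thread should stop. */
static bool thread_should_stop(SplitPoint *sp) {
    for (; sp != NULL; sp = sp->parent)
        if (atomic_load(&sp->beta_cutoff))
            return true;
    return false;
}
```

The cost of the test is proportional to the number of split points above the thread rather than to the depth of the search stack, which is usually a much smaller number.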
When creating a split point, I use malloc; when destroying it, I use free. Does it matter whether I use dynamic or static memory allocation for performance?
Yes, probably. I think it is much better to preallocate a pool of split point objects during program initialization.
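One possible shape for such a preallocated pool, as a rough C sketch: a fixed array whose unused slots are chained into a free list, so that acquiring and releasing a split point costs a couple of pointer writes instead of a malloc/free pair. The capacity MAX_SPLIT_POINTS, the placeholder depth field, and all function names are assumptions for illustration. The sketch is not thread-safe by itself; it assumes the caller already holds whatever lock guards splitting.

```c
#include <stddef.h>

#define MAX_SPLIT_POINTS 64         /* assumed capacity, chosen arbitrarily */

typedef struct SplitPoint {
    struct SplitPoint *next_free;   /* free-list link, meaningful only while unused */
    int depth;                      /* placeholder for the real search data */
} SplitPoint;

static SplitPoint pool[MAX_SPLIT_POINTS];
static SplitPoint *free_list = NULL;

/* Call once during program initialization: chain every slot into the free list. */
static void split_pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < MAX_SPLIT_POINTS; i++) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Pop a slot, or return NULL when the pool is exhausted, in which case
   the caller can simply decline to split at this node. */
static SplitPoint *split_point_acquire(void) {
    SplitPoint *sp = free_list;
    if (sp != NULL)
        free_list = sp->next_free;
    return sp;
}

/* Push a slot back onto the free list once all helpers have finished. */
static void split_point_release(SplitPoint *sp) {
    sp->next_free = free_list;
    free_list = sp;
}
```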
Tord