Hi Daniel, David,
Under normal circumstances, where different search processes pick up moves in parallel and search them, getting the same node count between release and debug builds, or even between different runs of the same version, is going to be quite tough.
That said, the algorithm that you mention (Daniel) should not suffer from this, and you _should_ get the same node count (assuming I am not missing something critical here!). This can be a vital debugging tool: at a glance you see whether something is amiss.
Like Antony mentioned, I find debugging a multi-process version far easier than debugging a multithreaded version.
Among other benefits, specifically in our case, the interactions are limited to the shared memory variables.
So you know exactly where to concentrate when looking for bugs, in case there are any.
You don't need to break your head over all the other globals that might be interacting (and much as I would like to, it is impossible to efficiently eliminate all globals).
That being said (maybe it is too late for you guys to change the implementation, or it is a design decision with other considerations I am not aware of, personal preference, etc!), let us try to analyze possible solutions to our problems.
@Daniel
for each of the moves
{
    if (first move || no idle processor)
        search_normally();
    else
        search_with_thread();
}
wait_for_helper_threads_to_finish();
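As a rough illustration of the loop above (not Daniel's actual code), here is a minimal C sketch using POSIX threads. Every name in it (Helper, MAX_HELPERS, search_move, find_idle, search_root) is made up for the example, and search_move is only a stand-in that counts the nodes of a fixed tree, with no alpha-beta bounds. Under that assumption each subtree's size depends only on the move, so the total node count is the same no matter which thread searched what. Whether that carries over to a real search depends on how bounds are handed to the helpers.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_HELPERS 3   /* hypothetical number of helper threads */

typedef struct {
    pthread_t  tid;
    int        move, depth;
    uint64_t   nodes;    /* written by the helper, read only after join */
    int        started;  /* parent-only flag: created and not yet joined */
    atomic_int done;     /* set by the helper when its search is finished */
} Helper;

/* Stand-in for a real search: node count of a fixed tree that
   depends only on (move, depth), never on timing. */
static uint64_t search_move(int move, int depth)
{
    uint64_t nodes = 1;
    if (depth == 0)
        return nodes;
    int branching = 2 + (move % 3);
    for (int i = 0; i < branching; i++)
        nodes += search_move(i, depth - 1);
    return nodes;
}

static void *helper_main(void *arg)
{
    Helper *h = arg;
    h->nodes = search_move(h->move, h->depth);
    atomic_store(&h->done, 1);
    return NULL;
}

/* Return an idle slot, or NULL if every helper is busy. A helper that
   has finished is joined first and its nodes are added to *total. */
static Helper *find_idle(Helper *hs, uint64_t *total)
{
    for (int i = 0; i < MAX_HELPERS; i++) {
        if (!hs[i].started)
            return &hs[i];
        if (atomic_load(&hs[i].done)) {
            pthread_join(hs[i].tid, NULL);
            *total += hs[i].nodes;
            hs[i].started = 0;
            return &hs[i];
        }
    }
    return NULL;
}

uint64_t search_root(int num_moves, int depth)
{
    Helper hs[MAX_HELPERS] = {0};
    uint64_t total = 1;                       /* the root node itself */

    for (int m = 0; m < num_moves; m++) {
        /* first move is always searched by the parent */
        Helper *h = (m == 0) ? NULL : find_idle(hs, &total);
        if (h != NULL) {
            h->move = m;
            h->depth = depth - 1;
            atomic_store(&h->done, 0);
            if (pthread_create(&h->tid, NULL, helper_main, h) == 0) {
                h->started = 1;
                continue;                     /* search_with_thread() */
            }
            /* thread creation failed: fall back to a normal search */
        }
        total += search_move(m, depth - 1);   /* search_normally() */
    }

    /* wait for helper threads to finish */
    for (int i = 0; i < MAX_HELPERS; i++) {
        if (hs[i].started) {
            pthread_join(hs[i].tid, NULL);
            total += hs[i].nodes;
            hs[i].started = 0;
        }
    }
    return total;
}
```

Calling search_root(8, 5) repeatedly should give the same total each time, and the same total as summing search_move over the 8 moves in a single thread. If the counts differ in a setup like this, the difference points at shared state rather than at scheduling. Creating one thread per move keeps the sketch short; a real engine would more likely keep helpers alive and hand them work.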
1) Just wondering, does search_with_thread() return only after the search has already started in the new thread?
Is this an active push to idle threads, or do idle threads pull possible split points from some queue/list?
2) How do you handle a fail high from the child thread?
3) Hope you use lockless hashing or some locked hashing strategy!
4) Your move generation stack is consistent across processors, right?
(You need to have a per-thread move generation stack or something like that.)
5) Hope you are validating all cached/hashed moves before making them.
6) Since you have ruled out the possibility of a race condition (there are any number of possible race conditions even in your algo, but we will come to that later), what Antony suggested is actually ideal for you.
Dump everything into a file, make a GUI-based viewer for the logfile and drill down, it is as simple as that.
The GUI might take some time to write, but trust me, you will always be happy with the 1 week you invested in it for the hundreds of bugfixing weeks it saved you! (And please use some tree-like UI, so that you can selectively go into or step out of subtrees; it is very intuitive and easy to use and understand when you analyse the logfile along with the source code.)
7) It helps in debugging if you support multiple platforms (assuming you don't use platform-specific APIs, assembly, etc). Even when things run smoothly on one platform (read platform here as the HW/OS/compiler triplet), you will see problems on another, or the errors/warnings on another will make more sense, etc. So you might want to look in that direction also ...
8) -D_REENTRANT was for *nix. I think there are similar flags for Windows (multithreaded DLL or something in VC), but I don't remember them.
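On point 3, one common lockless scheme is the XOR trick: store the key XORed with the data next to the data itself, and on probe accept the entry only if XORing the two words gives back the key you are looking for. Below is a minimal C sketch of that idea. The names (TTEntry, tt_store, tt_probe, TT_SIZE) and the single 64-bit data word are assumptions for illustration, not anyone's actual engine layout.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TT_SIZE 1024   /* hypothetical table size, must be a power of two */

typedef struct {
    _Atomic uint64_t check;  /* key XOR data */
    _Atomic uint64_t data;   /* packed move/score/depth/bound (layout is up to you) */
} TTEntry;

static TTEntry table[TT_SIZE];

/* Store without a lock. The two words are written separately, so another
   thread may interleave its own store and leave a mixed entry behind. */
void tt_store(uint64_t key, uint64_t data)
{
    TTEntry *e = &table[key & (TT_SIZE - 1)];
    atomic_store_explicit(&e->check, key ^ data, memory_order_relaxed);
    atomic_store_explicit(&e->data, data, memory_order_relaxed);
}

/* Probe without a lock. Returns 1 and fills *data_out on a hit. A mixed
   entry (check from one store, data from another) does not XOR back to
   the probed key unless the two stores agree, so it is treated as a miss.
   Note: an all-zero (empty) slot matches key 0 with data 0; a real table
   would need to account for that. */
int tt_probe(uint64_t key, uint64_t *data_out)
{
    TTEntry *e = &table[key & (TT_SIZE - 1)];
    uint64_t data  = atomic_load_explicit(&e->data, memory_order_relaxed);
    uint64_t check = atomic_load_explicit(&e->check, memory_order_relaxed);
    if ((check ^ data) != key)
        return 0;
    *data_out = data;
    return 1;
}
```

This only protects against entries mixed from two different stores. A hash move coming out of the table still needs to be validated against the current position before it is made (point 5), since different positions can share a key.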
Hope this helps !
I got some really high-quality feedback and help when I wrote my MP version. I was ever grateful for it, and it saved tonnes of my time and patience (which was running pretty thin). I know how frustrating it can get, but don't ever give up!
The feeling is just too amazing when you see your MP version do a 3.8x effective speedup on a quad.
@David
Are you using a parallel engine? If yes, then the counts need not be the same, on general principles ...
If no, then you have a bug (or bugs).
Like I mentioned to Daniel, a search dump and analysis would be a simple way to start. I see you have already got some suggestions at CC; those are the general heuristics I would also suggest to track this down.
More info on what you observe would be great too!
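For the search dump, a format that indents each line by ply makes the log easy to fold into a tree later and easy to diff between a release run and a debug run. Here is a minimal C sketch; the field set (thread id, move text, alpha, beta, score) and the names format_log_line and log_node are assumptions for the example, so log whatever your search actually has at hand.

```c
#include <stddef.h>
#include <stdio.h>

/* Format one search-log line into buf, indented two spaces per ply.
   Returns the value of snprintf (length the full line would need). */
int format_log_line(char *buf, size_t size, int thread_id, int ply,
                    const char *move, int alpha, int beta, int score)
{
    /* "%*s" with an empty string prints exactly (ply * 2) spaces */
    return snprintf(buf, size, "%*s[t%d] %s a=%d b=%d score=%d",
                    ply * 2, "", thread_id, move, alpha, beta, score);
}

/* Write one line to an already opened log file. */
void log_node(FILE *out, int thread_id, int ply,
              const char *move, int alpha, int beta, int score)
{
    char line[256];
    format_log_line(line, sizeof line, thread_id, ply, move, alpha, beta, score);
    fprintf(out, "%s\n", line);
}
```

Dump one run from each build and diff the two files: the first line that differs is where the two searches diverged. For a parallel engine, one file per thread avoids interleaved lines.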
Hope this helps,
Regards
Mridul