Tuning parallel algorithms; Some questions
Posted: 28 Sep 2007, 10:32
Hi,
currently I am working on tuning the parallel algorithm of Spike using a quad-core (Opteron) computer - no, I haven't got one of the new 4-core Opteron chips.
I am currently not able to achieve a speedup > 3. As far as I remember, Bob Hyatt wrote somewhere that he gained a speedup of about 3.8.
I do think that a strong move reduction algorithm will cut down the speedup, as only a few moves have large node counts.
Measurement of speedup:
For speedup measurement I use the 100 test positions and calculate all of them to a fixed depth. I do that with the single-CPU version and compare the runtime to the 4-CPU version. I currently use depth 14, which gives me about 10 seconds per position.
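For reference, a minimal sketch (not Spike's actual test harness) of how the three ratios below can be computed from two runs of the same fixed-depth suite; the Run struct and report() are made-up names:

```cpp
#include <cstdio>
#include <vector>

// Per-position measurements; fill these from the engine's own output.
struct Run { double seconds; long long nodes; };

// Computes the three ratios from two runs of the same fixed-depth suite
// (1 thread vs. 4 threads).
void report(const std::vector<Run>& one, const std::vector<Run>& four) {
    double t1 = 0, t4 = 0; long long n1 = 0, n4 = 0;
    for (const Run& r : one)  { t1 += r.seconds; n1 += r.nodes; }
    for (const Run& r : four) { t4 += r.seconds; n4 += r.nodes; }
    printf("speedup     : %.2f\n", t1 / t4);                 // wall-time ratio
    printf("node factor : %.2f\n", double(n4) / double(n1)); // search overhead
    printf("nps factor  : %.2f\n", (n4 / t4) / (n1 / t1));   // hardware utilisation
}
```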
I wonder which speedup you get and what are your "triggers" for speedup.
In Spike I have the following results:
Average node factor: ca. 1.19
Average nps factor: ca. 3.5
Average speedup: ca. 2.9
The results vary between 2.7 and 3.1, so the 2.9 is an average over 30 searches of all 100 test positions.
The nps factor gets better with longer searches, pushing the speedup to a little more than 3.
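For context, the three numbers are roughly tied together: speedup ≈ nps factor / node factor. With the figures above that gives 3.5 / 1.19 ≈ 2.94, which matches the measured 2.9 quite well.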
I do lose time waiting for a suitable split point. It is possible to raise the nps factor by using bad split points, but this has a tradeoff in the node factor; a rough sketch of such a split-point scoring follows the list below.
positive "tradeoffs" (i.e. the nps increase is worth more than the loss in node factor):
* Splitting at cut nodes if the node has probed at least one move.
* Splitting at nodes with winning captures left.
* Splitting near the horizon.
negative "tradeoffs":
* Splitting at nodes where no move has been played yet (even at ALL nodes).
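Not Spike's actual code, but a rough sketch of how such a split-point scoring could look; all names and thresholds below are illustrative guesses:

```cpp
// Illustrative sketch: a rough quality score for a potential split point, so
// idle threads can prefer the best of several offers. Higher is better; a
// negative score means "don't offer this node at all".
int splitPointScore(int movesAlreadySearched, bool expectedCutNode,
                    bool winningCapturesLeft, int depthLeft) {
    // Negative tradeoff: splitting before any move has been searched here
    // is never worth it, not even at expected ALL nodes.
    if (movesAlreadySearched == 0)
        return -1;

    int score = depthLeft * 10;          // deeper subtrees amortise the split cost

    // The following are "bad" split points that still raise the nps factor
    // more than they raise the node factor, so they only lower the score.
    if (expectedCutNode)     score -= 5; // only offered once a move was probed
    if (winningCapturesLeft) score -= 5; // a winning capture may cut off soon
    if (depthLeft <= 3)      score -= 5; // near the horizon (threshold is a guess)

    return score;
}
```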
Debugging is hard. Lately I found a bug in terminating helper threads on beta-cutoffs at the split points, so maybe there are more bugs left.
About Spike's MP algorithm:
Spike uses one thread for every CPU and one thread for input handling. Threads without work scan for work offers from the threads that have work. If there are multiple work offers, they try to pick the best one.
A thread that has finished its work is able to help its helpers. Alpha >= beta terminates every thread in that branch: the helpers, the helpers' helpers, and so on.
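To illustrate the idea, here is a much simplified sketch of the idle-thread / work-offer mechanism described above; the data structures and names are made up and not Spike's:

```cpp
#include <atomic>
#include <vector>

// Simplified sketch: an idle search thread scans the other threads' work
// offers and attaches itself to the most attractive open split point.
struct SplitPoint {
    std::atomic<bool> open{true};   // still accepting helpers?
    std::atomic<bool> stop{false};  // set when alpha >= beta; cascades to helpers
    int               score = 0;    // quality of this split point (see sketch above)
};

struct SearchThread {
    std::atomic<SplitPoint*> offer{nullptr};  // currently offered split point, if any
};

// Scan all busy threads for work offers and pick the best open one.
SplitPoint* findWork(const std::vector<SearchThread*>& threads) {
    SplitPoint* best = nullptr;
    for (SearchThread* t : threads) {
        SplitPoint* sp = t->offer.load();
        if (sp && sp->open.load() && (!best || sp->score > best->score))
            best = sp;
    }
    return best;  // caller joins this split point, or keeps idling if nullptr
}

// When a split point fails high (alpha >= beta) its stop flag is set; each
// helper checks it regularly and propagates the stop to its own split points,
// so the whole branch (helpers, helpers of helpers, ...) terminates.
```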
I currently don't terminate calculations at PV nodes on an alpha increase and restart them with smaller windows; my last implementation of that didn't gain a speedup.
What are your experiences with MP speedups?
Greetings Volker