Winboard Forum

by **Uri Blass** » 16 Jun 2007, 23:12

I have problems to post in the chess computer club because of changing the server but I read claims by tord that it can be easy to implement parallel search in toga simply by copy and paste parts of glaurung and I simply do not understand how people think that it is possible to do it so fast.

Toga has global variables and I understood from Dann corbit that in order to use parallel search you need to get rid of parallel variables.

I do not know a way to do something like that in few hours and in case that it is possible to do it then I would like to get step by step instructions how to do it.

Here are the posts from CCC:

from the chess computer club
http://216.25.93.108/forum/viewtopic.ph ... 6&start=10

"mjlef wrote:
Doesn't the Fruit/Tgoa license state anything derived from that code must be released in source code format? Has this been done? A parallel Toga would be very interesting to play with.

I am not sure if it has been done, but it would be very easy to do. You could just cut and paste Glaurung's parallel search code into Fruit/Toga and make a few minor changes to make it work with the different data structures. I guesstimate that it should take about 3-10 hours of work, depending on how familiar you are with the inner workings of Fruit.

Tord"

later:

Zach Wegner:

I implemented SMP in Toga a while back, but for some reason I was never able to get a real speedup. For some reason the stop/split ratio was always around 25% and seemed to grow when search time increased. This is different than my own engine which rarely gets more than a few % using the same split logic. If any developers are interested in trying to fix it, I'll be happy to send them the code.

by **Zach Wegner** » 17 Jun 2007, 00:48

The code I wrote was from scratch, I didn't copy anything. It didn't take very long (a few days, I can't remember) but keep in mind I have implemented parallel search more than once before.

by **bob** » 17 Jun 2007, 06:54

Uri Blass wrote:I have problems to post in the chess computer club because of changing the server but I read claims by tord that it can be easy to implement parallel search in toga simply by copy and paste parts of glaurung and I simply do not understand how people think that it is possible to do it so fast.

Toga has global variables and I understood from Dann corbit that in order to use parallel search you need to get rid of parallel variables.

I do not know a way to do something like that in few hours and in case that it is possible to do it then I would like to get step by step instructions how to do it.

Here are the posts from CCC:

from the chess computer club
http://216.25.93.108/forum/viewtopic.ph ... 6&start=10

"mjlef wrote:
Doesn't the Fruit/Tgoa license state anything derived from that code must be released in source code format? Has this been done? A parallel Toga would be very interesting to play with.

I am not sure if it has been done, but it would be very easy to do. You could just cut and paste Glaurung's parallel search code into Fruit/Toga and make a few minor changes to make it work with the different data structures. I guesstimate that it should take about 3-10 hours of work, depending on how familiar you are with the inner workings of Fruit.

Tord"

later:

Zach Wegner:

I implemented SMP in Toga a while back, but for some reason I was never able to get a real speedup. For some reason the stop/split ratio was always around 25% and seemed to grow when search time increased. This is different than my own engine which rarely gets more than a few % using the same split logic. If any developers are interested in trying to fix it, I'll be happy to send them the code.

Globals are OK. So long as they are "read-only" inside the parallel part of the search. But there's no real way to do a decent parallel search with everything global, as you need local data for each thread, and turning every scalar global variable that needs to be modified into an array indexed by the thread id is a real pain...

No way it is "easy" to add a parallel search to any engine. But cutting/pasting parts of glaurung would certainly be far easier than writing one from scratch.

by **Tord Romstad** » 17 Jun 2007, 11:12

Uri Blass wrote:I have problems to post in the chess computer club because of changing the server but I read claims by tord that it can be easy to implement parallel search in toga simply by copy and paste parts of glaurung and I simply do not understand how people think that it is possible to do it so fast.

Toga has global variables and I understood from Dann corbit that in order to use parallel search you need to get rid of parallel variables.

You don't necessarily have to remove all global variables. The problem is only variables which change during the search. The most obvious example is the variable(s) holding the board data structure, a single global board clearly won't work if there are several CPUs searching simultaneously.

There are also problems with things like pawn hash tables: If two threads accidentally try to simultaneously update the same pawn hash table entry, this will almost certainly cause the contents of this pawn hash table entry to be corrupted. This can cause non-reproducible bugs which can be tricky to find. There are two ways to solve such problems: To have a separate pawn hash table for each thread, or to use locks to prevent two threads from writing to the pawn hash table simultaneously. The former solution consumes more space, but the latter will probably be slower.

"Pseudo-constants", i.e. global variables which never change during a search, can remain global.

If I recall correctly, Fruit doesn't have a lot of global variables which are not pseudo-constants, so adding Glaurung's parallel search should be easy. In some other programs (Phalanx, for instance), it would take more work.

I do not know a way to do something like that in few hours and in case that it is possible to do it then I would like to get step by step instructions how to do it.

OK, I'll try:

Find all the global variables in the code, and see if there are some that are modified during the search, in such a way that there could be problems if several threads tries to modify the contents at the same time. For each such variable, either make the variable an array indexed by the number of threads, or protect it with a lock.
Add a new integer parameter named threadID to the search and evaluation function, and to all other functions which for some reason needs to know which thread it is called from. The evaluation function needs to know this because it looks up the current position in the pawn and material hash tables, and you will probably want to have separate pawn and material hash tables for each thread. The search function needs to know which thread it is called from because it calls the evaluation function, and therefore must be able to tell the evaluation function its threadID.
Copy the Thread and SplitPoint types from Glaurung (thread_t and split_point_t in Glaurung 2). In the Thread type, remove the variable printCurrentLine, because Fruit doesn't support the UCI_ShowCurrLine feature. In SplitPoint, modify the contents to match the parameters and local variables in Fruit's search function.
Copy the rest of Glaurung's parallel search code. Modify the split() function to work with your modified SplitPoint structs, and sp_search/sp_search_pv to look like Fruit's regular search function. Because Fruit, unlike Glaurung 2, uses the same search function for PV and non-PV nodes, you can probably combine split and split_pv to one function, and similarly with sp_search and sp_search_pv.

bob wrote:No way it is "easy" to add a parallel search to any engine.

I'd say it is - at least as long as you don't care much about how well the search scales on more than 4 CPUs (I don't care about this before machines with 8 CPUs or more become common and affordable).

Writing a parallel search used to be very hard, but thanks to the efforts of pioneers like you, the problem has become so well understood that it is now rather straightforward to implement. It is just like calculus, which was understood only by the mathematical elite back in the days of Newton and Leibniz, but which is successfully taught to millions of young students with average mathematical skills today.

bob wrote:But cutting/pasting parts of glaurung would certainly be far easier than writing one from scratch.

Easier, but also less fun.

I should also warn that Glaurung's parallel search can only be used in programs released under the GNU General Public License. For Toga, this is obviously not a problem.

Tord

by **Uri Blass** » 17 Jun 2007, 14:13

Thanks for your explanation

I know nothing about parallel programming and there are other things that I plan to try first before implementing parallel search but it seems based on your words that parallel search is easier than what Bob suggests.

It seems based on your words that people even do not need to get rid of global variables in order to implement parallel search and they simply may change them to arrays.

I wonder if there is a problem of speed in case that I use array of big tables(in movei one of my tables is 64*16 int)

Note that I have no experience in protecting a variable with a lock but I guess that if I implement parallel search then I need to do it for history table
and for the main hash because it does not make sense to use different history tables for different threads and different global hash tables for different threads when a position may be in the hash but the program will not find it because it is only in the hash of a different thread.

Uri

by **Tord Romstad** » 17 Jun 2007, 16:19

Uri Blass wrote:I know nothing about parallel programming and there are other things that I plan to try first before implementing parallel search but it seems based on your words that parallel search is easier than what Bob suggests.

You probably should learn a little about the basics of multithreaded programming before you try writing a parallel search, and it is also a good idea to start with some simpler exercises. Writing a parallel perft function could be a good way to start.

It seems based on your words that people even do not need to get rid of global variables in order to implement parallel search and they simply may change them to arrays.

Yes, that is one way to do it. I have done that with my pawn and material hash tables. Another possibility is to use processes rather than threads: In that case each process will automatically get its own copy of all global variables.

I wonder if there is a problem of speed in case that I use array of big tables(in movei one of my tables is 64*16 int)

Doesn't sound like it should be a big problem. The total size of the table is only 4 KB, and having several copies of it shouldn't slow you down much.

Note that I have no experience in protecting a variable with a lock but I guess that if I implement parallel search then I need to do it for history table and for the main hash because it does not make sense to use different history tables for different threads and different global hash tables for different threads when a position may be in the hash but the program will not find it because it is only in the hash of a different thread.

I use a single hash table and a single history table, but neither of them are protected by locks. For the hash table, the number of entries is so small that two threads writing to the same entry simultaneously almost never happens. As long as the hash move is always checked for legality, it is in my experience completely safe (in practise) to use lockless hashing. If you really worry about this, there is also a trick by Bob you can try, which can help you detect and discard corrupted hash entries. You'll find the details on Bob's web page.

For the history table, the consequences of two threads writing to the same entries are not very serious. The worst thing that could happen is that the history counters become very slightly inaccurate.

If you are worried about the history table, there is also a hybrid approach you could try: Let each thread have its own table, but add the counters from all the history tables when you read history counters. In other words, each thread is only allowed to modify its own history table, but can read from the history tables of all threads. I haven't tried this, but I see no reason it shouldn't work.

Tord

by **Dann Corbit** » 19 Jun 2007, 19:57

Look at the parallel search of Glaurung, Scorpio, and Viper (sort of a simplified Glaurung 1.x).

Also, read a few of the papers on parallel search.

Finally, I recommend this as well worth examiniation:
http://sourceforge.net/projects/libcmt/
http://sourceforge.net/projects/libltx/

In the long run, lockless memory transactions are clearly the way to go in the future. The benefit is modest for 4 processors or less, but on a cluster or hugely parallel system, the standard approaches are not competitive.

As far as worrying about global variables:
If you can decorate the name with const and your code compiles and runs, then you can cross that variable off of your list as far as things to worry about. It is only the variables that you write to that must be guarded.

by **Dann Corbit** » 19 Jun 2007, 20:04

Tord Romstad wrote:
Uri Blass wrote:I know nothing about parallel programming and there are other things that I plan to try first before implementing parallel search but it seems based on your words that parallel search is easier than what Bob suggests.

You probably should learn a little about the basics of multithreaded programming before you try writing a parallel search, and it is also a good idea to start with some simpler exercises. Writing a parallel perft function could be a good way to start.

It seems based on your words that people even do not need to get rid of global variables in order to implement parallel search and they simply may change them to arrays.

Yes, that is one way to do it. I have done that with my pawn and material hash tables. Another possibility is to use processes rather than threads: In that case each process will automatically get its own copy of all global variables.

I wonder if there is a problem of speed in case that I use array of big tables(in movei one of my tables is 64*16 int)

Doesn't sound like it should be a big problem. The total size of the table is only 4 KB, and having several copies of it shouldn't slow you down much.

Note that I have no experience in protecting a variable with a lock but I guess that if I implement parallel search then I need to do it for history table and for the main hash because it does not make sense to use different history tables for different threads and different global hash tables for different threads when a position may be in the hash but the program will not find it because it is only in the hash of a different thread.

I use a single hash table and a single history table, but neither of them are protected by locks. For the hash table, the number of entries is so small that two threads writing to the same entry simultaneously almost never happens. As long as the hash move is always checked for legality, it is in my experience completely safe (in practise) to use lockless hashing. If you really worry about this, there is also a trick by Bob you can try, which can help you detect and discard corrupted hash entries. You'll find the details on Bob's web page.

For the history table, the consequences of two threads writing to the same entries are not very serious. The worst thing that could happen is that the history counters become very slightly inaccurate.

If you are worried about the history table, there is also a hybrid approach you could try: Let each thread have its own table, but add the counters from all the history tables when you read history counters. In other words, each thread is only allowed to modify its own history table, but can read from the history tables of all threads. I haven't tried this, but I see no reason it shouldn't work.

Tord

Using processes rather than threads is a very good idea for a first attempt. I know of programs that did this successfully.

Here is a create process method. It's not really like fork(), in that the auto variables are not preserved and other important differences. But it is a simple way to start separate instances of your program.

Code: Select all: static int donothing; // an empty translation unit is a non-standard extension. #ifdef _MSC_VER #include <windows.h> #include <string.h> #include <stdlib.h> #include <stdio.h> int fork() { char szPath[FILENAME_MAX]; char szCommandLine[1024]; PROCESS_INFORMATION pi; STARTUPINFO startup_info; startup_info.cb = sizeof(STARTUPINFO); startup_info.lpReserved = NULL; startup_info.lpDesktop = NULL; startup_info.lpTitle = NULL; startup_info.dwX = 0; startup_info.dwY = 0; startup_info.dwXSize = 0; startup_info.dwYSize = 0; startup_info.cbReserved2 = 0; startup_info.lpReserved2 = NULL; // make sure the show state is used and has our settings startup_info.dwFlags = STARTF_USESHOWWINDOW; startup_info.wShowWindow = SW_HIDE; szCommandLine[0] = 0; GetModuleFileName(NULL, szPath, sizeof(szPath) - 1); sprintf(szCommandLine, ""%s"",szPath); if (0 == (CreateProcess(NULL, szCommandLine, NULL, NULL, TRUE, 0, NULL, NULL, &startup_info, &pi))) return -1; return pi.dwProcessId; // return PID of the started task. } #endif

Winboard Forum

about parallel search in toga

about parallel search in toga

Re: about parallel search in toga

Re: about parallel search in toga

Re: about parallel search in toga

Re: about parallel search in toga

Re: about parallel search in toga

Re: about parallel search in toga

Re: about parallel search in toga

Who is online