Winboard Forum

by **mridul** » 14 Dec 2004, 19:39

Congrats ! One down... x bugs to go

Sure keep the questions coming !

Mridul

by **Dann Corbit** » 14 Dec 2004, 19:49

Daniel Shawul wrote:After 4 days and 5 hours exactly the bug is
completely solved
It was just simply deleting a 5 letter word "static".
Yes this was the head ache.
I defined a temporary SQUARE tempSQ variable in my board class
which i use to exchange squares during make/unmake.
When two threads are making a move at similarly the same time
tempSQ is corrupted. But this happens very rarely and that's why
it was so hard for me to spot it.
No legal moves were generated , everything was fine.

I don't know how boundscheker or anyother debugging tool could have
helped me here. Because in this bug all the moves were legal and >95% of the time make move is right. And bugs are not reproducible.

Any bug hunting story which beats this one? I bet not

greetings from very very happy
Daniel

Speaking of static...

You should grep through your code for every instance of the static keyword. Every instance of it means that you cannot write to that object with more than one thread simultaneously.

The same thing goes for public symbols (which you can find by looking at the map or the class list in the IDE).

You can have as many reader threads as you want, but only one writer thread for any public object. That is one of the reasons it is very hard to start with an example program like TSCP or many of the other simple programs and end up eventually with multithreading. They are full of public symbols.

by **Anthony Cozzie** » 14 Dec 2004, 21:09

My favorite bug story actually occurred at work. I was tasked with porting some Linux code to Solaris. Now, the above code was multiprocess and it on exit killed all its child processes using kill(). Unfortunately, it also had a race condition that would occur on occaision, when it would execute a kill -1 instead of kill(childpid). Now on Linux, kill(-1) kills all of your child processes, so this was just fine and went undetected. On Solaris, kill(-1) kills ALL processes with your userid. So whenever this bug triggered I was completely kicked off the machine, ssh session and all

Your experience is why Vincent likes multiprocessor parallelism. Now you have to hunt through all your global/static variables and make sure ALL of them are thread safe, that is that they are all either A) read-only B) protected by mutexes C) otherwise protected by your design.

I just assumed you had done this already

On the bright side, that is one less mistake you have to make on the parallel learning curve. Are you going to play at CCT? I bet if you asked around you could find someone with a dual that would let you run on it (assuming you are finished by then).

anthony

by **Dann Corbit** » 15 Dec 2004, 00:35

I once did a port of some software tools from Windows to OS/2. Since the DEC task to task interface for the Windows API and the OS/2 API was identical, it was really simple. Took less than one hour.

Of course, the server crashed heinously left and right, as soon as multiple threads started up.

Fortunately, the DEC product was distributed with source code. I started looking at the source code, and each and every function had static variables. The commical part was that not a single static definition was needed -- auto would have been exactly the same in terms of operation when single threaded (and would have produced CORRECT operation when multithreaded).

The one hour job turned into two weeks. I basically threw away all of the DEC code (which also had clever things like company supplied hearder files with wrong levels of indirection) and retained A SINGLE ASSEMBLY LANGUAGE FUNCTION CALL. I did (of course) model after the original API, which was fine and already documented.

After that, it worked like a charm.

by **Daniel Shawul** » 15 Dec 2004, 05:38

I just assumed you had done this already On the bright side, that is one less mistake you have to make on the parallel learning curve. Are you going to play at CCT? I bet if you asked around you could find someone with a dual that would let you run on it (assuming you are finished by then).

Yes i checked all of my static variables. All of them are read only variables like piece square values. It just never occured to me at that time this temporary variable used for swapping needs to be defined in each board object.
I am not sure if i can find one who could do that,but if i do i will not use
the parallel version. I want to make sure everything is working as i expected it. Any way i will look for an operator.

Daniel

by **Tony van Roon-Werten** » 15 Dec 2004, 13:36

I have a stupid wuestion

Why would one use static for piece square tables rather than const ?

Tony

by **Daniel Shawul** » 15 Dec 2004, 13:42

i define them in my board class. When you have two board objects.like in case of parallel search,this data should not be duplicated. In my case it is
"static const". That will be only "const" only if I defined them outside the board class.
daniel

by **Tony van Roon-Werten** » 15 Dec 2004, 13:51

Hmm, I see.

That doesn't seem optimal on the Opterons, where you do want a copy of those tables, preferable in the memory "close" to the 2nd processor rather than that 2nd processor having to go to the remote memory of the first processor.

Is it ok on all other multiprocessor machines ?

Tony

by **Alessandro Scotti** » 15 Dec 2004, 14:30

Tony van Roon-Werten wrote:
Why would one use static for piece square tables rather than const ?

A possible reason could be initialization, which I do at startup in Kiwi (starting from a few "const" templates).

by **Tony van Roon-Werten** » 15 Dec 2004, 15:24

In that case, what is the advantage of using static rather than normal variables ? Does the compiler do some optimizations for statics ?

The only thing I understand about statics is that, when declared within a function, you have have shielded globals ie globals only accesable fom that function. Is there more to it ?

Tony

by **Dann Corbit** » 15 Dec 2004, 20:17

There is only a single instance of a static table in the program.
If auto were used, there would be a new instance for each function call.
Static variables are initialized at compile time. Auto variables are initialized at run time. If you have a table with 256 things in it, and a function is called a lot, it can have impact.
The storage can be important for recursive function calls.

You can get a copy of the C or C++ ANSI standard for $18 in PDF format (or that used to be the price a while back).

Related data from the C-FAQ:

1.30: What am I allowed to assume about the initial values of
variables and arrays which are not explicitly initialized?
If global variables start out as "zero", is that good enough
for null pointers and floating-point zeroes?

A: Uninitialized variables with "static" duration (that is, those
declared outside of functions, and those declared with the
storage class static), are guaranteed to start out as zero, just
as if the programmer had typed "= 0". Therefore, such variables
are implicitly initialized to the null pointer (of the correct
type; see also section 5) if they are pointers, and to 0.0 if
they are floating-point.

Variables with "automatic" duration (i.e. local variables
without the static storage class) start out containing garbage,
unless they are explicitly initialized. (Nothing useful can be
predicted about the garbage.)

These rules do apply to arrays and structures (termed
"aggregates"); arrays and structures are considered "variables"
as far as initialization is concerned.

Dynamically-allocated memory obtained with malloc() and
realloc() is likely to contain garbage, and must be initialized
by the calling program, as appropriate. Memory obtained with
calloc() is all-bits-0, but this is not necessarily useful for
pointer or floating-point values (see question 7.31, and section
5).

References: K&R1 Sec. 4.9 pp. 82-4; K&R2 Sec. 4.9 pp. 85-86; ISO
Sec. 6.5.7, Sec. 7.10.3.1, Sec. 7.10.5.3; H&S Sec. 4.2.8 pp.
72-3, Sec. 4.6 pp. 92-3, Sec. 4.6.2 pp. 94-5, Sec. 4.6.3 p. 96,
Sec. 16.1 p. 386.

7.5a: I have a function that is supposed to return a string, but when
it returns to its caller, the returned string is garbage.

A: Make sure that the pointed-to memory is properly allocated.
For example, make sure you have *not* done something like

char *itoa(int n)
{
char retbuf[20]; /* WRONG */
sprintf(retbuf, "%d", n);
return retbuf; /* WRONG */
}

One fix (which is imperfect, especially if the function in
question is called recursively, or if several of its return
values are needed simultaneously) would be to declare the return
buffer as

static char retbuf[20];

See also questions 7.5b, 12.21, and 20.1.

References: ISO Sec. 6.1.2.4.

Some relevant points from the ANSI C Standard:

5.1.2 Execution environments
1 Tw o execution environments are defined: freestanding and hosted. In both cases, program startup occurs when a designated C function is called by the execution environment. All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified. Program termination returns control to the execution environment.
Forward references: storage durations of objects (6.2.4), initialization (6.7.8).

6.2.4 Storage durations of objects
1 An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated. Allocated storage is described in 7.20.3.
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address,25) and retains its last-stored value throughout its lifetime.26) If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.
3 An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.
4 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration.

10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
? if it has pointer type, it is initialized to a null pointer;
? if it has arithmetic type, it is initialized to (positive or unsigned) zero;
? if it is an aggregate, every member is initialized (recursively) according to these rules;
? if it is a union, the first named member is initialized (recursively) according to these rules.

by **Tony van Roon-Werten** » 16 Dec 2004, 07:29

Thanks !

Tony

by **Daniel Shawul** » 16 Dec 2004, 13:46

I have to keep some threads spinning until
a search is assigned to them. This may be too short(if search has already started) or too long (if it is the opponent's turn to move)
This is what i use for idle looping.

while(sleeping) {
Sleep(0);
}

In the second case,the opponent may be disadvantaged if it sets low
priority for threads.Increasing the sleep value to 1 or just killing the thread and restarting it againg when it is our turn solves this problem .

the first case has problems because I create threads at the same priority and the above code might be inefficient. Is there a better way to suspend execution of a thread?
When i want to start a search , i just set the value of sleeping to 1
so that the thread is out of the idle waiting and then search. Is this reasonable? What do you guys do to idle wait and start a thread?
daniel

by **Anthony Cozzie** » 16 Dec 2004, 23:53

I think its quite unprofessional to have the CPUs doing all sorts of work when the user hasn't requested anything, so I use some events and critical sections to give the worker CPUs a non-spinning wait state. Basically a CPU in Zappa has three states:

1) waiting for a start_search_event (wait_for_single_object)

2) spinning in an idle loop

3) searching

I would paste the code, but figuring it out by yourself will be a good exercise

The MSDN documentation is also very good. If you need more help though, just ask.

anthony

by **Daniel Shawul** » 17 Dec 2004, 14:12

Hi Anthony

I solved the problem by Sleep(10) when it is the opponent's
turn.Shouldn't do no harm i guess.

I found out today that my algorthm of splitting is wrong.
What i did allocates processors randomly, eventhoug it does it
to prefered "all nodes", it doesn't prefer shallower nodes over deeper nodes.I droped that now
So i am back to PVSplit ,where i need to engage all processors at a given node. I sent Leo a version to test today.
But i am frustrated to read in CCC archives that GCP did an implementaion of PVS and got only a speed up of 1.3. Which is very disappointing as compared to the one mentioned on the DTS article
1.8 for PVS
1.9 for EPVS
If GCP got only 1.3 you can guess much i am going to get :wink:

So it seems i have to implement something better. I 'm going to try EPVS,
but couldn't find the original paper. I also can't find YBW paper. They are cited at lots of places,but not a single download is available.
If you have them please send it to me.
daniel

by **Anthony Cozzie** » 17 Dec 2004, 15:53

OK, I feel you are getting a little confused here. IMHO, there are two parts to writing a multiprocessor chess program: writing a framework to split, and deciding where to split. Of course, the decision is affected by the engine's ability to split in various places.

PVS, EPVS, and DTS are splitting frameworks. PVS simply says: all processors work together at a single node. Now, let us think about the problems of PVS. Suppose you are searching a node with 20 processors, and you come to the last move. In PVS, one CPU gets it, and the other nineteen have to wait until the other CPU finishes. Now, suppose said node is at 10 ply. Waiting for the CPU to finish might take a few seconds - major oops. EPVS simply says: we have an idle CPU. Therefore, abort this node, go back two plies, and start again where we (hopefully) have more work, counting on the transposition table to save most of our work. Note the simplicity of the entire approach: all cpus congregate at a SINGLE node. This makes debugging way easier

However, IMHO, no algorithm that keeps all CPUs at one node will ever have a good speedup.

The real point here is that it is possible to combine YBW with any of the above frameworks. In fact, Zappa is basically DTS+YBW. The other thing to remember is that null move makes things harder then they had it back in the 80s, because it makes it harder to determine what is an alpha node and what is a beta node.

The problem with your algorithm lies in the wait_for_children() part. You have two choices: A) abort your slaves and take over one of their moves, or B) do what Bob does and send the waiting CPU back to the idle list. B is better but a little tricky as well, because the waiting CPU has to be the one to return from the node in the end.

anthony[/code]

Winboard Forum

Debugging techniques

Re: Debugging techniques

Re: The bug of the century is solved!

Re: Debugging techniques

Threaded surprises

Re: Debugging techniques

Re: Debugging techniques

Re: Debugging techniques

Re: Debugging techniques

Re: Debugging techniques

Re: Debugging techniques

Static variables

Re: Debugging techniques

Re: Debugging techniques

Re: Debugging techniques

Re: Debugging techniques

Re: Debugging techniques

Who is online