
Bug hunting or How many bits is enough?

PostPosted: 12 Jun 2008, 22:51
by Michael Sherwin
The last public release of my program RomiChess (p3k) has been barraged by so many inexplicable losses that I have gone on a bug hunt to try to find out why.

Only the high 32 bits of the hash signature were being stored in the hash record. This seemed like enough, as storing the full 64 bits made no difference in performance.

So as an experiment I added another 32 bits that are used only as the key, and I am storing the full original 64-bit signature/key as just the signature.

Only 50 games have been played; however, this is the best result over 50 games that I have ever had.

Now I am wondering how many good ideas tested badly because they stumbled into more games that had hash errors.

And I am wondering just how many bits are really enough. Or maybe my hash value generation, with only 64 bits, is not good.

Also, now that I am using 96 bits, can I just use the hash move without verifying that the move exists and that it is legal? This could mean a big increase in node rate, especially if it is extended to qsearch().
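Roughly, what I have in mind looks like this; a sketch only, with made-up names rather than Romi's actual source, reading the extra 32 bits as the table key and the original 64 bits as the stored signature:

```c
/* Sketch only -- names are made up, not Romi's actual source.
   One reading of the scheme: a separate 32-bit hash is used purely as the
   table key (index), and the untouched 64-bit signature is stored in the
   entry for verification. */
#include <stddef.h>

typedef struct {
    unsigned long long signature;   /* full original 64-bit signature */
    int                score;
    unsigned short     move;        /* hash move                      */
    unsigned char      depth;
    unsigned char      flags;
} HashEntry;

/* key32 = the extra 32-bit key, sig64 = the original 64-bit signature,
   mask  = table size minus one (table size a power of two).           */
HashEntry *probe_96(HashEntry *table, unsigned int mask,
                    unsigned int key32, unsigned long long sig64)
{
    HashEntry *e = &table[key32 & mask];
    if (e->signature == sig64)
        return e;   /* all 64 signature bits match on top of the index bits */
    return NULL;    /* miss */
}
```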

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 00:21
by Dann Corbit
Michael Sherwin wrote:The last public release of my program RomiChess (p3k) has been barraged by so many inexplicable losses that I have gone on a bug hunt to try to find out why.

Only the high 32 bits of the hash signature were being stored in the hash record. This seemed like enough, as storing the full 64 bits made no difference in performance.

So as an experiment I added another 32 bits that are used only as the key, and I am storing the full original 64-bit signature/key as just the signature.

Only 50 games have been played; however, this is the best result over 50 games that I have ever had.

Now I am wondering how many good ideas tested badly because they stumbled into more games that had hash errors.

And I am wondering just how many bits are really enough. Or maybe my hash value generation, with only 64 bits, is not good.

Also, now that I am using 96 bits, can I just use the hash move without verifying that the move exists and that it is legal? This could mean a big increase in node rate, especially if it is extended to qsearch().


If you use the low k bits of your hash for the hash table address and you store the upper 32 bits in the hash table, then you are actually using 32+k bits of your hash value for verification. For example, if you have 4 million hash table entries then you have 22+32 = 54 bits (because the low 22 bits are totally redundant -> only a value with those 22 bits matching would have been stored in that slot). The odds of a hash collision are therefore one in 2^54 = 18,014,398,509,481,984.
If you search 10 million NPS then you will have one collision every 1,801,439,850,948 seconds = one collision every 20,850,461 days = one every roughly 57,000 years.

I guess that something else was the problem.
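In code, the usual arrangement looks roughly like this; a generic sketch with made-up names, not any particular engine's source:

```c
/* Generic sketch of the index/lock scheme described above. */
#define HASH_BITS 22                    /* 2^22 = about 4 million entries   */
#define HASH_SIZE (1u << HASH_BITS)
#define HASH_MASK (HASH_SIZE - 1)

typedef struct {
    unsigned int lock;                  /* upper 32 bits of the 64-bit key  */
    /* ... score, move, depth, bound type ... */
} Entry;

static Entry table[HASH_SIZE];

static void tt_store(unsigned long long key)
{
    Entry *e = &table[(unsigned int)key & HASH_MASK];  /* low 22 bits index */
    e->lock  = (unsigned int)(key >> 32);              /* high 32 bits lock */
}

static int tt_probe(unsigned long long key)
{
    Entry *e = &table[(unsigned int)key & HASH_MASK];
    /* The low 22 index bits match by construction, so the stored 32-bit
       lock verifies the entry against 22 + 32 = 54 bits of the key.      */
    return e->lock == (unsigned int)(key >> 32);
}
```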

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 01:33
by Michael Sherwin
That is what I think/thought! :? I am still looking. Here bug bug bug bug bug! :mrgreen:

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 05:43
by Tony Thomas
Michael Sherwin wrote:That is what I think/thought! :? I am still looking. Here bug bug bug bug bug! :mrgreen:


Find that little critter and have a goat sit on it... OK, don't let the goat sit on it, the bug might actually enjoy it... Just squash it with a swatter.

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 15:39
by P.K. Mukhopadhyay
How does the supposedly buggy version fare against the older stable version?

P.K.Mukhopadhyay

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 16:52
by Teemu Pudas
Dann Corbit wrote:If you use the low k bits of your hash for the hash table address and you store the upper 32 bits in the hash table, then you are actually using 32+k bits of your hash value for verification. For example, if you have 4 million hash table entries then you have 22+32 = 54 bits (because the low 22 bits are totally redundant -> only a value with those 22 bits matching would have been stored in that slot). The odds of a hash collision are therefore one in 2^54 = 18,014,398,509,481,984.
If you search 10 million NPS then you will have one collision every 1,801,439,850,948 seconds = one collision every 20,850,461 days = one every roughly 57,000 years.


Your math is faulty. If the table is full (when isn't it?) every probe is guaranteed to be a k-bit collision. Mind you, one in 4 billion is of course still pretty insignificant.

OTOH if you stored the same 32 bits you used for the index, you'd have a pretty big bug.
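To put a rough upper bound on it under that assumption: 10 million probes per second times a 1 in 2^32 chance per probe is about 0.0023 false matches per second, i.e. roughly one every seven minutes of search, not one every 57,000 years. Still rare enough to be nearly harmless, just nowhere near millennia.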

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 17:05
by Ron Murawski
Dann is 100% correct. But that presupposes that your Zobrist random numbers are good. I had problems similar to yours when I used 2 successive 32-bit pseudo-random numbers to make one 64-bit number. The collision rate was extremely high. My original PRNG used the linear congruential method, where it is a known fact that the low-order bits are not as random as the high-order bits. (Art of Computer Programming, Volume 2, 3rd Edition, Donald Knuth, Section 3.6, item vi, p185) If you read the pseudo-random literature you will find that joining 2 successive numbers together to make a single larger one is also a mathematically weak technique.

You are better off using a list of empirically-tested known-good numbers (like in Crafty) or using a 64-bit pseudo-random number generator that is *not* based on the linear congruential method.

At the moment I'm using ISAAC-64: http://burtleburtle.net/bob/rand/isaacafa.html
You'll need to scroll down the page a bit to find the 64-bit version.
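If you don't want to wire in a big generator just to fill one table, even something as small as Marsaglia's 64-bit xorshift will do. A minimal sketch (not the code I actually use, and the table names are just placeholders):

```c
/* Minimal sketch: fill the Zobrist tables from a simple 64-bit generator
   (Marsaglia's xorshift). Any well-tested 64-bit PRNG works; the point is
   to avoid rand() and to avoid gluing small numbers together. */
#include <stdint.h>

static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;   /* any nonzero seed */

static uint64_t rand64(void)
{
    uint64_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return rng_state = x;
}

/* 2 sides x 6 piece types x 64 squares, plus castling and en-passant keys */
static uint64_t piece_key[2][6][64];
static uint64_t castle_key[16];
static uint64_t ep_key[8];

static void init_zobrist(void)
{
    int c, p, s;
    for (c = 0; c < 2; c++)
        for (p = 0; p < 6; p++)
            for (s = 0; s < 64; s++)
                piece_key[c][p][s] = rand64();
    for (s = 0; s < 16; s++) castle_key[s] = rand64();
    for (s = 0; s < 8;  s++) ep_key[s]     = rand64();
}
```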

Ron

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 17:16
by P.K. Mukhopadhyay
Teemu Pudas wrote:
Dann Corbit wrote:If you use the low k bits of your hash for the hash table address and you store the upper 32 bits in the hash table, then you are actually using 32+k bits of your hash value for verification. For example, if you have 4 million hash table entries then you have 22+32 = 54 bits (because the low 22 bits are totally redundant -> only a value with those 22 bits matching would have been stored in that slot). The odds of a hash collision are therefore one in 2^54 = 18,014,398,509,481,984.
If you search 10 million NPS then you will have one collision every 1,801,439,850,948 seconds = one collision every 20,850,461 days = one every roughly 57,000 years.


Your math is faulty. If the table is full (when isn't it?) every probe is guaranteed to be a k-bit collision. Mind you, one in 4 billion is of course still pretty insignificant.


What matters is that the probability of a 32+k bit collision is very small: one in 2^(32+k).

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 18:49
by Teemu Pudas
P.K. Mukhopadhyay wrote:probability of a 32+k bit collision is very small - one in 2^(32+k).


No, it isn't. It's one in 2^32: the k bits collide every time you probe. One in 2^(32+k) is the probability that two specific positions collide; why would anyone care which one of the 4 million hash entries is the one that gets a collision?

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 19:48
by Dann Corbit
Teemu Pudas wrote:
P.K. Mukhopadhyay wrote:probability of a 32+k bit collision is very small - one in 2^(32+k).


No, it isn't. It's one in 2^32: the k bits collide every time you probe. One in 2^(32+k) is the probability that two specific positions collide; why would anyone care which one of the 4 million hash entries is the one that gets a collision?


The correct analysis is somewhere in between, I think.

In order for both the upper and lower bits to match, the probability *is* one in 2^(32+k), but we do force all entries with the same k lower bits into the *same* slot. So there may be some factor akin to the birthday paradox at work here.

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 20:01
by P.K. Mukhopadhyay
Teemu Pudas wrote:No, it isn't. It's one in 2^32: the k bits collide every time you probe. One in 2^(32+k) is the probability that two specific positions collide; why would anyone care which one of the 4 million hash entries is the one that gets a collision?


You are right.

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 21:27
by Michael Sherwin
P.K. Mukhopadhyay wrote:How does the supposedly buggy version fare against the older stable version?

P.K.Mukhopadhyay


Beats it up! But it loses stupidly sometimes. The new version does better against stronger engines than the old version does; however, it does worse against weaker engines! This just screams BUG!

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 21:32
by Michael Sherwin
Ron Murawski wrote:Dann is 100% correct. But that presupposes that your Zobrist random numbers are good. I had problems similar to yours when I used 2 successive 32-bit pseudo-random numbers to make one 64-bit number. The collision rate was extremely high. My original PRNG used the linear congruential method, where it is a known fact that the low-order bits are not as random as the high-order bits. (Art of Computer Programming, Volume 2, 3rd Edition, Donald Knuth, Section 3.6, item vi, p185) If you read the pseudo-random literature you will find that joining 2 successive numbers together to make a single larger one is also a mathematically weak technique.

You are better off using a list of empirically-tested known-good numbers (like in Crafty) or using a 64-bit pseudo-random number generator that is *not* based on the linear congruential method.

At the moment I'm using ISAAC-64: http://burtleburtle.net/bob/rand/isaacafa.html
You'll need to scroll down the page a bit to find the 64-bit version.

Ron


Hi Ron,

In RomiChess I generate the hash values randomly, bit by bit, one bit at a time with the built in rand() function. Any thoughts?

Re: Bug hunting or How many bits is enough?

PostPosted: 13 Jun 2008, 22:43
by Teemu Pudas
Michael Sherwin wrote:In RomiChess I generate the hash values randomly, bit by bit, one bit at a time with the built in rand() function. Any thoughts?


Use a better generator and don't generate them bit by bit!

If you're doing rand() & 1, your numbers are most likely busted, as many rand() implementations use the linear congruential method...
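In other words, if the generation loop looks anything like the following (just my guess at the pattern, not your actual code), replace it:

```c
/* Guess at the pattern being described -- NOT Romi's actual code.
   If rand() is a power-of-two LCG that exposes its low bit, rand() & 1
   can be close to (or exactly) an alternating 0,1,0,1,... sequence, so
   64-bit keys built this way are nowhere near random. */
#include <stdlib.h>

unsigned long long bad_rand64(void)
{
    unsigned long long x = 0;
    int i;
    for (i = 0; i < 64; i++)
        x = (x << 1) | (unsigned long long)(rand() & 1);
    return x;
}
```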

Re: Bug hunting or How many bits is enough?

PostPosted: 14 Jun 2008, 03:20
by Michael Sherwin
Teemu Pudas wrote:
Michael Sherwin wrote:In RomiChess I generate the hash values randomly, bit by bit, one bit at a time with the built in rand() function. Any thoughts?


Use a better generator and don't generate them bit by bit!

If you're doing rand() & 1, your numbers are most likely busted, as many rand() implementations use the linear congruential method...


Thanks! :D I will make the change.

Re: Bug hunting or How many bits is enough?

PostPosted: 14 Jun 2008, 06:13
by Ron Murawski
Michael Sherwin wrote:
Ron Murawski wrote:Dann is 100% correct. But that presupposes that your Zobrist random numbers are good. I had problems similar to yours when I used 2 successive 32-bit pseudo-random numbers to make one 64-bit number. The collision rate was extremely high. My original PRNG used the linear congruential method, where it is a known fact that the low-order bits are not as random as the high-order bits. (Art of Computer Programming, Volume 2, 3rd Edition, Donald Knuth, Section 3.6, item vi, p185) If you read the pseudo-random literature you will find that joining 2 successive numbers together to make a single larger one is also a mathematically weak technique.

You are better off using a list of empirically-tested known-good numbers (like in Crafty) or using a 64-bit pseudo-random number generator that is *not* based on the linear congruential method.

At the moment I'm using ISAAC-64: http://burtleburtle.net/bob/rand/isaacafa.html
You'll need to scroll down the page a bit to find the 64-bit version.

Ron


Hi Ron,

In RomiChess I generate the hash values randomly, bit by bit, one bit at a time with the built in rand() function. Any thoughts?


Hi Mike,

First of all, never use your compiler's rand() function. It is written to win benchmark tests. Quickly generated numbers are not the same thing as good numbers. Besides, good random numbers for one application may be terrible ones for other applications. There's no such thing as ideal numbers, only good ones for the task at hand. For Zobrist hashing you want good local spread. To avoid collisions we want to map very similar chess positions into very dissimilar Zobrist numbers. If the numbers are truly random, this will happen automatically.

If it was bad for me to join up two 32-bit numbers, then for you, joining up 64 one-bit numbers is much, much worse. If you are keeping the least significant bit, this is the very worst method in the whole world! :(

Follow Teemu's advice and use a good 64-bit PRNG. Or use numbers that have been tested and determined to be good from an engine like Crafty.

It seems that Teemu and I are 'random' guys. There's another school of thought that wants to measure Hamming distances between the numbers. The larger the distances, the better the 'Hamming' guys like the numbers. Both methods seem to work. The actual method is to find the two numbers in your Zobrist number table with the smallest Hamming distance. This is the value you want to maximize.

Here's an idea: test your random numbers for Hamming distance. You need 64x6x2 of them (squares x pieces x sides) plus some for castling rights and ep. Let's estimate it at 800. Keep generating random sequences of length 800 until the Hamming distances are maximized. These are the numbers to use! Maybe this is the best of both worlds. I've never tried doing this.

What are the Hamming results from your original bad generator? The minimum distance may very well be zero!
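The check itself is only a few lines. A sketch, assuming the ~800 keys have already been gathered into one flat array:

```c
/* Sketch: minimum and average pairwise Hamming distance over a flat array
   of Zobrist keys (the ~800 numbers estimated above). */
#include <stdint.h>
#include <stdio.h>

static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }      /* clear the lowest set bit */
    return n;
}

void hamming_report(const uint64_t *keys, int n)
{
    int i, j, d, min = 64;
    long long pairs = 0, total = 0;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++) {
            d = popcount64(keys[i] ^ keys[j]);
            if (d < min) min = d;
            total += d;
            pairs++;
        }
    printf("%lld pairs, minimum distance %d, average %.2f\n",
           pairs, min, (double)total / pairs);
}
```

For truly random 64-bit numbers the average works out to about 32 bits.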

Ron

Re: Bug hunting or How many bits is enough?

PostPosted: 14 Jun 2008, 11:12
by P.K. Mukhopadhyay
Ron Murawski wrote: There's another school of thought that wants to measure Hamming distances between the numbers. The larger the distances, the better the 'Hamming' guys like the numbers. Both methods seem to work. The actual method is to find the two numbers in your Zobrist number table with the smallest Hamming distance. This is the value you want to maximize.
Ron


What Hamming distance (in bits) is considered satisfactory? If we consider three numbers A, /A and A^/A (i.e., all 1's), the minimum distance can be 50% of the bits and yet they are related and useless for hashing.

Re: Bug hunting or How many bits is enough?

PostPosted: 16 Jun 2008, 04:43
by Ron Murawski
P.K. Mukhopadhyay wrote:
Ron Murawski wrote: There's another school of thought that wants to measure Hamming distances between the numbers. The larger the distances, the better the 'Hamming' guys like the numbers. Both methods seem to work. The actual method is to find the two numbers in your Zobrist number table with the smallest Hamming distance. This is the value you want to maximize.
Ron


What Hamming distance (in bits) is considered satisfactory? If we consider three numbers A, /A and A^/A (i.e., all 1's), the minimum distance can be 50% of the bits and yet they are related and useless for hashing.


For Hamming distance, just work with 2 numbers at a time. Compute the Hamming distance between A and B. Then A and C, A and D, etc. Then compare B to C, B to D, etc. then C to D etc. etc. Save the smallest number as your minimum Hamming distance. I suppose it's a good idea to remember the average Hamming distance as well. If both of those are maximized, that's what we want.

I'm not a 'Hamming' guy. I strongly feel that a good sequence of random numbers will do the job. I think that a Hamming distance check might be a good idea, though.

I would think that a minimum Hamming distance of 1 bit is too low. Maybe 3 or 4 bits would be sufficient? I'm not sure of the theoretical maximum distance based on the quantity of 64-bit numbers that are needed. Perhaps a Hamming guy can make a suggestion as to what minimum value is considered acceptable.

There has been discussion of this here at the Winboard Forum (or was it at CCC?) a couple of years ago. Bob Hyatt knows about this. I remember Sune Fischer in a heated discussion about random vs Hamming methods.

Although I trust a good pseudo-random sequence over a human-picked sequence, I think either approach can work. But my opinion is based on intuition rather than on math.

Ron

Re: Bug hunting or How many bits is enough?

PostPosted: 16 Jun 2008, 14:03
by Michael Sherwin
Michael Sherwin wrote:
Teemu Pudas wrote:
Michael Sherwin wrote:In RomiChess I generate the hash values randomly, bit by bit, one bit at a time with the built in rand() function. Any thoughts?


Use a better generator and don't generate them bit by bit!

If you're doing rand() & 1, your numbers are most likely busted, as many rand() implementations use the linear congruential method...


Thanks! :D I will make the change.


I made the change and am running some tests. I have already noticed that Romi plays different moves. As a matter of fact, Romi plays different moves every time the randomization seed is changed. Does this sound right? Not to me! :?

Edit: the games were started from fixed positions, i.e. no book with random selection was used.

Re: Bug hunting or How many bits is enough?

PostPosted: 16 Jun 2008, 14:36
by Teemu Pudas
Doesn't sound entirely wrong. Previously entry A mapped to the same location as entries B, C and D (etc), but now it has to fight against being replaced by positions B', C' and D'. Depending on the overload factor, this leads to completely different hash contents.

How big was the table?