Winboard Forum

by **Pradu** » 24 Jul 2005, 20:16

An obvious checking key in a Hash Table entry is the entire key (assuming 64 bit hash keys). But are there better ways to make the checking key in the entry smaller but still reliable for the index in the hash table?

One idea of mine is to store the Most Significant 8 bits of the hash key as the checking keys (This is assuming a table with a prime number of entries and the modulo (%) operator for hashing keys to an index in the table). By the way, I have not tried this or tested it since I have not implemented a hash table in my rewritten engine yet...

I appreciate comments on this method or any different methods for reducing the size of the hash checking key in a Hash Table entry.

by **Chan Rasjid** » 24 Jul 2005, 21:57

I doubt the hashkey/ checkkey can be reduced.

It is simple probability consideration to avoid collision that requires the checkkey size to have as many bits as posssible. 8 bits for checkkey will
have give 1 : 2^ 8 and not workable. Even 16 bits may not be enough.

Rasjid

by **Pradu** » 24 Jul 2005, 22:03

But different hashkeys will map to different indexes in the hash table. So having an check key being sufficiently unique to a index in a hash table should be enough right?

by **Pallav Nawani** » 25 Jul 2005, 04:28

Hi,

There is a thread in this forum where Sune Fischer has posted some statistics regarding the number of bits in the hash key and the hash collisions. Look for that thread.

Pallav

by **Pradu** » 25 Jul 2005, 04:41

I believe this is the topic:
http://wbforum.volker-pittlik.name/view ... 8&start=20

His test does not show anything about hash key collision at particular indecies of the hash table. Also it is not clear whether he uses a % operator to hash the key to its index. If I'm wrong please correct me...

I just want to see what other people do to lessen the size of the hash key, thats all.

by **Sune Fischer** » 25 Jul 2005, 07:19

I used a &-operator otherwise the test with how many bits remain identical would have been hard to do.

If you don't mind potentially getting a collision or two an hour (this should be entirely safe according to among other Bob Hyatt), then you can throw away about 16 bits without much worry.
For example you can store half the key in the table and use the other half of the key for indicing. I've been doing that for years now, I have not so far seen any problems with it.

-S

Edit:
If you are apprehensive, then write it with a macro so you can quickly enable 64 bit hashing. Then run some long tests to see if the node count differs.

by **Pradu** » 25 Jul 2005, 19:37

Allright, I'll implement a test for hashtables of a prime size with keys hashed by % when I have implemented it in my engine. Hopefully I'm not too lazy to post results here.

by **smcracraft** » 19 Jan 2006, 22:32

I use a 64-bit hashkey using the zobrist xor method.

But I do not keep a "check key" (say of 64-bits) in the
hash structure to compare further with.

So my question is - should one do that and if so what
should the check be calculated from? I am using ALL
bits for the 64-bit hashkey (not a partial mask that
would permit me to use other parts as the checkkey.)

Thanks,

Stuart

P.S. My regular hash table statstics are about 30%
(100*hits/probes). My pawn hash table, using same
method, same key, is about 90-99% (100*phits/pprobes).

by **H.G.Muller** » 19 Jan 2006, 23:40

It is easy to calculate the probabiity for a mixup. If you have 16 million entries in your hash table, the probability that two different positions will hash to the same entry is never below 1 in 16 million, no matter how perfectly randomized your hashing scheme is. (It should not matter much if you use modulo or shifting/ masking if your initial hash key has good low-order random bits.) This seems like a small number, but you shoud compare it to the number of pairs of positions in the table, not against the number of positions itself. So if you store 6000 positions in this table, there are 6000 x 6000 / 2 = 18 million pairs, so on the average there will already be one 'collision'. Even though the table is stil 99,96% empty!

If you add an 8-bit hash lock for checking, you will be able to intercept 255 out of every 256 such collisions, i.e. reduce the collision probability 256 fold. So you can afford 256 times as many pairs, but that is only 16 times as many positions, or a measly 96.000. Not nearly good enough...

This is why you need to store a much larger hash lock in the table. If you want to be able to fill up the table to a reasonable degree, i.e. nearly 16M entries, his means 16M x 16M / 2 = 128 trillion pairs (= 2^47). Since you don't want to have more than one mixup in a thousand moves (that might still mean one mistake every 10 games!), the error probability must be smaller than 1 in 2^57. The size of the hash table (16M = 2^24) takes care of part of this, but the hash lock has to take care of the rest (2^(57-24) = 2^33) The hash lock for a table this size would thus have to be at least 33 bits to have the 1 in 1000 collision rate. With 32 bits you will have 1 collision every 500 moves (if the table is always completely filled up). Not every collision results in a wrong move choice, though.

In the case above it is important that the bits used in the hash lock are not correlated with those from which you derive the hash table index, so you can not start with a 32-bit Zobrist key and derive both the address and the hash lock from it. I use a 64 bit Zobrist key (or, equivalently, two different 32 bit Zobrist keys, that I derive from the same set of random numbers by swapping the white and black piece-square codes). One set of 32 bits I use as the hash lock, while the table index is derived from the other 32 bits. This is really just a little bit too small for a hash tabe this size, I will occasionally see an error. Storing a 40-bit hash lock would be better (but slower...).

by **Dann Corbit** » 20 Jan 2006, 00:32

Pradu wrote:An obvious checking key in a Hash Table entry is the entire key (assuming 64 bit hash keys). But are there better ways to make the checking key in the entry smaller but still reliable for the index in the hash table?

One idea of mine is to store the Most Significant 8 bits of the hash key as the checking keys (This is assuming a table with a prime number of entries and the modulo (%) operator for hashing keys to an index in the table). By the way, I have not tried this or tested it since I have not implemented a hash table in my rewritten engine yet...

I appreciate comments on this method or any different methods for reducing the size of the hash checking key in a Hash Table entry.

The standard method here, for a K bit hash key, is to use M bits for the hash table address and K-M bits for the stored verification key.

Example:
If you have a 24 bit hash table address, then you store the other 40 bits in your hash table. In such a case, you can detect every collision 100%

Most people don't bother with such precision. They figure that one collision in 4 billion or so is something that they can live with. It won't affect the overall search very much and it saves the test for the verification hash.

by **H.G.Muller** » 20 Jan 2006, 09:27

In my sample calculation above I forgot to mention the effect of re-hashing. If a a position can be stored in (say) 8 places in the hash table before you give up, the contribution of the address to the discriminatory power drops by a factor 8 (e.g. from 16M to 2M), or three bits (21 in stead of 24).

The most tricky part is to estimate the probability that a mistake will hurt you. If you make a bad move (one that was not the pest in that node) worse, there is no effect. If you make the best move worse, the score drops. But probably not much, because if it drops too much the second best move would take over. If you make a move better, the score of that node might go up, because it might have been the best move, or if not become the best move because of this. But that means that the score for the opponent on the level below it goes down, and there the second-best move also limits the damage this can do .

So only in cases where the second-best move was very much worse than the best, and the collision corrupts the best score, you suffer. But even if in every node there was only one move that stuck out far above the rest, the probability that the collision would occur in this good move is about 1 in 32 (the average branching ratio). But such nodes are probably rare.

I only once consciously witnessed a bad collision, when I was still deriving the address and the hash lock from the same 32 bit key in a 1M table: in the KPK end-game it could see queening at 16 ply, but at 17 ply it disappeared, before it came back at 18 ply or higher. I thought this strange enough to trace exactly what caused it, and it turned out that the position in which the queening move should be done was colliding with a draw position. But in that (bad) scheme the collision probability was 1 in 2^32, which leads to 1 collision with ~100.000 entries, (and 4 collissions with 200.000, etc.) and even for a three-men end-game you have about 500.000 in the table (so 25 collisions...)

by **Gian-Carlo Pascutto** » 20 Jan 2006, 10:43

Collissions in hash: not worth worrying about for a reasonable key length.

HOWEVER: if you use the zobrist key for anything else, watch out. Consider what happens if you get a collision in an opening book, for example.

by **H.G.Muller** » 20 Jan 2006, 13:39

But the question of course is: "what is a reasonable length?". The calculation shows that for hash-table sizes that we can nowadays afford (16M+ entries) a 32-bit hash lock does make you worry, even if you don't derive anything else from it. But to go beyond that on machines with a 32-bit word length (still the most current at the moment) does pay some performance penalty.

Even on machines with 64-bit word length using a 64 bit lock is not entirely free, because it it might force you to a larger than necessary size of the entry to avoid alignment problems (e.g. 16 bytes in stead of 12). And larger entries means that a search (wich is unavoidable if the table is nearly full) spans more cache lines, and thus requires more memory cycles.

Winboard Forum

Good checking keys in a Hash Table entry

Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Re: Good checking keys in a Hash Table entry

Who is online