by H.G.Muller » 19 Jan 2006, 23:40
It is easy to calculate the probabiity for a mixup. If you have 16 million entries in your hash table, the probability that two different positions will hash to the same entry is never below 1 in 16 million, no matter how perfectly randomized your hashing scheme is. (It should not matter much if you use modulo or shifting/ masking if your initial hash key has good low-order random bits.) This seems like a small number, but you shoud compare it to the number of pairs of positions in the table, not against the number of positions itself. So if you store 6000 positions in this table, there are 6000 x 6000 / 2 = 18 million pairs, so on the average there will already be one 'collision'. Even though the table is stil 99,96% empty!
If you add an 8-bit hash lock for checking, you will be able to intercept 255 out of every 256 such collisions, i.e. reduce the collision probability 256 fold. So you can afford 256 times as many pairs, but that is only 16 times as many positions, or a measly 96.000. Not nearly good enough...
This is why you need to store a much larger hash lock in the table. If you want to be able to fill up the table to a reasonable degree, i.e. nearly 16M entries, his means 16M x 16M / 2 = 128 trillion pairs (= 2^47). Since you don't want to have more than one mixup in a thousand moves (that might still mean one mistake every 10 games!), the error probability must be smaller than 1 in 2^57. The size of the hash table (16M = 2^24) takes care of part of this, but the hash lock has to take care of the rest (2^(57-24) = 2^33) The hash lock for a table this size would thus have to be at least 33 bits to have the 1 in 1000 collision rate. With 32 bits you will have 1 collision every 500 moves (if the table is always completely filled up). Not every collision results in a wrong move choice, though.
In the case above it is important that the bits used in the hash lock are not correlated with those from which you derive the hash table index, so you can not start with a 32-bit Zobrist key and derive both the address and the hash lock from it. I use a 64 bit Zobrist key (or, equivalently, two different 32 bit Zobrist keys, that I derive from the same set of random numbers by swapping the white and black piece-square codes). One set of 32 bits I use as the hash lock, while the table index is derived from the other 32 bits. This is really just a little bit too small for a hash tabe this size, I will occasionally see an error. Storing a 40-bit hash lock would be better (but slower...).