I don't expect overapping bits to be a problem, if you xor the bits: the xor of two random uncorelated bitpatterns is still purel random (i.e. homogneously distributed). This remains true even if only on of the pattersn is random. Of course many random implementations are no good at all, but do have strong correlations between succesively returned values.
In my minimalistic chess program Micro-Max I tried to save some space in the piece-square tables for the Zobrist key (to increase L1-cache hit rate) by using a 12x64 array T of random
bytes (well, actually 8x128, but never mind that), reading out a group of consecutive bytes as an integer:
- Code: Select all
key ^= *(int*)&T[piece][square];
So in fact the random codes for the various piece-square combinations are not independent, but a subpattern of 8 bits that occurs in one, occurs in another in a different place.
I tested if this resulted in an increased rate of misidentificatons, by randomly generating and tabulating positions together with their calculated Zobrist key, and counting how often two different positions gave the same Zobrist key. For 4-byte keys this resulted in ~1 collision in 100.000 positions (growing quadraticaly, i.e. 1 collision per 4 billion pairs), just as expected (and observed) for fully independent random numbers. So apparently, this kind of overlap between the random numbers has no effect at all on the miss-identification rate.
I also tried a hash key constructed as the sum over the product of the square number and a big prime (~60 million) specific for the piece occupying the square. (So that only a table of 32 primes is required.) This also did not have a miss-identiication rate worse than the theoretical optimum for a key of that size, but it had the disadvantage of mapping positions that swapped pieces of identical type to different keys. (This does not lead to errors, but is less efficient. Because it does not happen very often that such states occur in the same tree, the efficiency diffence is hardly noticeable in practice).
Anyway, to avoid inability to exactly reproduce the thought process that led to a certain move (by starting any move with an empty hash table, and recording all random seeds). If you don't, you sould simply accept that nothing can be learned from the way the program played. At the Dutch open my own engine started to play absolutely crazy moves in one round (it refused to move pieces that came under attack). Despite the fact that I set up everything to make it act reproducible, I could never reproduce the behavior later, so I can be pretty sure it must have been due to hardware error. (Not unlikely, sice my cooler fan was stuck, making the computer overheat all the time...)