Winboard Forum

Posted: **17 Feb 2007, 13:38**

As Gerd Isenberg wrote in an earlier post
http://wbforum.vpittlik.org/viewtopic.php?t=5015&postdays=0&postorder=asc&topic_view=&start=39
magic tables can be shrunk with an additional indirection by the factor of four.

I guess he means something like

Code: Select all: Pattern=RookMask[square]&(AllWPieces | AllBPieces); Index= (Pattern*key[square])>>Shift[square]; Address = AddressRook[square][Index]; Attacks = AttacksRook[Address]

Has anybody done a performance comparison between this approach and the direct one?

Note that shrinking factor 8 is possible if one accepts double indexing for the last lookup:

Code: Select all: Pattern=RookMask[square]&(AllWPieces | AllBPieces); Index= (Pattern*key[square])>>Shift[square]; Address = AddressRook[square][Index]; Attacks = AttacksRook[square][Address]

because Address can be set up to range from 0 to at most 143 which fits in 8 bit.

Moreover, note that using the latter approach, several squares can share the same lookup array AddressRook[square] when the magic multipliers are chosen appropriately. That allows additional shinking of the AddressRook table.

For example, if x is a magic multiplier for h8, x<<1 should be for g8, x<<8 for h7 and x<<9 for g7. This only works because RookMask[g8]<<1 etc are subsets of RookMask[h8].

Alas, for the inner squares things are more difficult because there is no inclusion betwen the RookMask entries. But instead of using magic multipliers for subsets of say RookMask[d4], we might search for magic multipliers for subsets of say

Code: Select all: RookMask[d4] | (RookMask[c4]<<1) | (RookMask[d3]<<8) |(RookMask[c3]<<9)

There are more than 144 possible Addresses now (in example above 5*3*5*3=225), but they might still fit in one byte.

The question is how large the lookup array for such a multiplier is. If it has 12 bit (like the mask) we have an even deal (one 12 bit lookup instead of four 10 bit), but if we find magic multipliers with 11 bit or even 10 bit, we get a smaller lookup table.

Greetings,
Onno

PS: Can somebody explain why people are only talking about L1 cache? Isn't it a problem that current lookup tables for rook magic are so huge that they even do not fit in L2 cache? Or is L2 cache used in a completely different way?

Posted: **17 Feb 2007, 21:05**

I tried Pradu's PEFRECT_HASH magic with additional indirection and to my surprise it was a lot slower.

http://www.vpittlik.org/wbforum/viewtopic.php?t=5452

Likely one huge table access is not so worse than two dependent loads from two semi-huge tables.

L1 is much faster than L2. Not quite sure actually, i think L2 is > 10 cycles on recent cpus.

Winboard Forum

Shrinking magic tables by additional indirection

Shrinking magic tables by additional indirection

Re: Shrinking magic tables by additional indirection