http://wbforum.vpittlik.org/viewtopic.php?t=5015&postdays=0&postorder=asc&topic_view=&start=39
magic tables can be shrunk with an additional indirection by the factor of four.
I guess he means something like
- Code: Select all
Pattern=RookMask[square]&(AllWPieces | AllBPieces);
Index= (Pattern*key[square])>>Shift[square];
Address = AddressRook[square][Index];
Attacks = AttacksRook[Address]
Has anybody done a performance comparison between this approach and the direct one?
Note that shrinking factor 8 is possible if one accepts double indexing for the last lookup:
- Code: Select all
Pattern=RookMask[square]&(AllWPieces | AllBPieces);
Index= (Pattern*key[square])>>Shift[square];
Address = AddressRook[square][Index];
Attacks = AttacksRook[square][Address]
because Address can be set up to range from 0 to at most 143 which fits in 8 bit.
Moreover, note that using the latter approach, several squares can share the same lookup array AddressRook[square] when the magic multipliers are chosen appropriately. That allows additional shinking of the AddressRook table.
For example, if x is a magic multiplier for h8, x<<1 should be for g8, x<<8 for h7 and x<<9 for g7. This only works because RookMask[g8]<<1 etc are subsets of RookMask[h8].
Alas, for the inner squares things are more difficult because there is no inclusion betwen the RookMask entries. But instead of using magic multipliers for subsets of say RookMask[d4], we might search for magic multipliers for subsets of say
- Code: Select all
RookMask[d4] | (RookMask[c4]<<1) | (RookMask[d3]<<8) |(RookMask[c3]<<9)
There are more than 144 possible Addresses now (in example above 5*3*5*3=225), but they might still fit in one byte.
The question is how large the lookup array for such a multiplier is. If it has 12 bit (like the mask) we have an even deal (one 12 bit lookup instead of four 10 bit), but if we find magic multipliers with 11 bit or even 10 bit, we get a smaller lookup table.
Greetings,
Onno
PS: Can somebody explain why people are only talking about L1 cache? Isn't it a problem that current lookup tables for rook magic are so huge that they even do not fit in L2 cache? Or is L2 cache used in a completely different way?