This stuff is really fascinating. I just can't help wasting time on it. (I already have a very fast 0x88/mailbox-based move generator, and it is about the only thing I have, so I should really start working on search and evaluation... But for a mathemagician this stuff is almost irrisistable! )
I have a pretty good idea why minimal-bit magics aren't possible even with redundant bits. XORing the redundancy bits for these sparse bitsets is really the same as addition, so setting redundancy bits is equivalent to adding a constant to the product. This can already be acheived quite effectively by havibng only a single redundancy bit that is always set. Without any computational overhead you could always leave the bit for one of the corner squares set (in both the masks and the AllPieces board). If you take a1 for that (the lsb!), it can be moved into the region of the key bits through the high bits of the magic, independent of anything else.
But it doesn't help much. The Bishop occupancy set has upto 9 bits, (in the center), and the Rook occupancy set upto 12 bits (for the corners). To make keys smaller than this, the magic multiplication would have to provide not only shuffling and shifting of bits to the high end of the bitboard. It would also have to do some sort of compression, by conveniently mapping occupancies that result in the same attack set onto the same key.
This now, can't be done. Multiplication with a constant (magic or otherwise) is a linear process, so it basically produces a linear combination of the occupancy bits occ[k]:
product = SUM w[k]*occ[k] + offset
Even in the ideal case where we could set the weights w[k] totally independently (because the occ[k] bits were spread out over the word farther apart than the key length), the weight of one bit is always the same, and not dependent on the presence or absence of other bits. (i.e. there are no terms like occ[k]*occ[m] contributing to the product). Rounding (clipping off the low bits) can introduce a very mild non-linear interaction between the bits (e.g. if they are both 1 the sum just exceeds the boundary between two 'quantization levels', but if only one or non are set, it is just below it). But you can use this trick only ones, by cleverly scaling the weights and chosing an offset (from the redundancy bits). And the type of compression we need is highly non-linear:
Suppose we are looking at 3 occupncy bits on one ray. They can have 8 states, that result in 4 different attack sets. If I write the first square on the ray (nearest to the attacker) left, these would be:
000: 4 moves
001: 3 moves
010: 2 moves
011: 2 moves
100: 1 move
101: 1 move
110: 1 move
111: 1 move
In theory the number of moves can be represented by 2 bits, one bit less than the occupancy. We see that the rightmost occupancy bit only makes a difference once (3 or 4 moves). This can be achieved by giving it a weight that is smaller than the quantization step, and tuning the offset such that this small step just exceeds the first threshold, but doesn't make any significant contribution when any of the other bits are set (i.e. any combination of their weights always stay clear of other thresholds.)
But then you have used up all your freedom: even if 000 is just below a threshold, 100 and 010 must fall in different 'bins' as 001, so the weight of one of them would have to be at least one quantum plus what you need to push the offset over the threshold, and the other still one quantum more. But that means that 110 will add at least three quanta plus twice the 'offset push', i.e. 110 can never be in the same bin as 100. So you cannot do with a two-bit key for this.
You will run into this problem every time. To have the farther pieces make a difference if the closer piece is absent, (as they must), they will make an unwanted difference if the closer piece is there. Because the difference they make to the product is independent on the presence or absence of other pieces. You will never be able to compress the key length by even one full bit.
Although this holds per ray, and this ray could be compressed a fractional bit (e.g. 5 states in stead of 8), you might hope that the presence of multiple rays (two for the Rook corners) might at least give you a full bit. But you can use the offset+rounding trick only once, you don't want the rounding in one ray to affect the rounding of the other, because the rays are truly independent.
Only if you would do more rounding (clip off not only the bits on the right, but also clear some bits in the center of the key) you will create independent thresholds. But this would be kind of defeating the purpose, you would be creating a key with holes in it.
So I think minimal magics just don't exist. For Rooks I find magics for 12-bit keys very easily. For 11-bit keys I find them easily, except for the corner squares (which I don't find at all, even at thousand times more searching) For 10-bit keys, I find them easily except for the edges...