

Generating moves, starting from the piece that makes it, is simple enough for me, even in 0x88/mailbox representations. What I'm constantly struggling with is a fast way to generate captures. And not just generate them in bulk, but generate them in specific sub-groups. Basically, I want a move generator that simulates an attack table.
So what I would like is not an algorithm that looks at the occupation of a ray and tells me where I can go on that ray, but one that looks at the occupation of a ray and tells me what can hit me from it. In particular Slider moves, since Pawn attacks are easy enough to detect from the board (can only come from 2 squares), and Leapers attacks from the piece list. (There are only 3 Leapers, and the0x88 test immediately tells you if they can capture.) It is the Slider moves, with all the ray scanning, that are a pain. Also in differentially updated attack tables: for every move you do, you have to wonder which Sliders were passing over the to-square, and which are now passing over the from-square.
To know which Sliders attack me along a given ray, I need to distinguish not 3, but 4 possible states of each square: empty, obstacle, a slider of mine that moves along this ray, and a similar slider of the opponent. All 8 squares along a ray are important, since an attack can come from an edge square. So the edge squares need at least 3 states: {my slider, his slider, other}. There might not be enough savings there to treat it special.
So my idea is to keep a 16-bit int for each rank (8), file (8) or diagonal (2 x 15). There would be a table RayNr[64][4] that points you to the ray passing through a given square in a given direction. The 'board representation' itself would only need 46 shorts, so they could be pointed to by a 1-byte RayNr.
For each move you would have to update the 4 rays going through the from-square, and the 4 rays going through the to-square. (A bit similar to rotated bitboards.) To do this efficiently you could keep an array short int Masks[64][4] that has the two bits set that correspond to the particular square within the ray representation.
Once you have the board representation, you can quickly get the Slider attacks to a particular square SQ by taking each of the four rays r=Ray[RayNr[SQ][i]], i=0..3, and use them as an index in a lookup table together with the position within the ray (which could come from a table RayPos[64][4], that tabulated file and rank numbers, but in practice is probably derived more quickly from masking/shifting the square number itself). This table would then tell you what was going on along this ray: from which squares does an attack come, is it a doubled attack, etc.
The latter table would have a substantial size: 64K x 8 = 512K. (Times the element size!) It could be shared by all ray types, though (othogonals and diagonals). Short diagonals would simply pad the non-existent squares with empties or obstructions, so that they would use the correct subset of the table.
In principle you would also need two different tables for opposite colors, but you can flip the color content of a ray representation easily enough by clever encoding: make the upper byte contain one bit per square that is 0 for empty or obstruction, and 1 for an active Slider. The lower byte would in the first case make the distiction between empty and obstruction, in the second case between white or black. The color is then flipped by x ^= x>>8;
The square that you land on is not important for what can hit you (or what you can hit, for that matter, the table could also contain your own moves from that square, in addition to the captures to it!) . You could clear the corresponding 2 bits, reducing the table size by a factor 4. For all but the upper bits that would not give a contiguous range, though, but I think you can solve that by using (x ^ (x>>1)) & 0x3FFF as an index. (Haven't thought about that very deeply, so I might be wrong there...)
With 16K x 8 = 128K elements the table size seems manageable, and you could just about tabulate anything you wanted (bitboards with attacker positions, but also lists of square numbers where the attackers are on. (Unlike your own moves, which could total 7, there can only be upto two captures, in 30 different layouts, so a 5-bit value to be used as an index to look up the actual squares would suffice.) The amount of work you would need to get at it would be modest. And no 64-bit stuff, everything can be done in 16-bit arithmetic.
