To keep the demands on chip real estate modest, I might even settle for an instruction that shuffles the bits in a square-independent way. A normal rotate instruction (ror) would first position the occupied square in the least-significant bit (bit 0) without loss of bits, and the new instruction would then collect the 8 x 7 bits that could be on the 8 rays (if they had not wrapped around the board edge), in the proper order, into the 8 bytes of the word.
This would require just 2 new instructions, each applying a fixed shuffle pattern to its single (64-bit) register operand: ray-to-board (r2b) and its inverse, board-to-ray (b2r).
You could then generate e.g. QueenAttacks by rotating the Occupied bitboard right by the Square number, applying a board-to-ray instruction, masking with a square-dependent mask (from a 64-entry table) to kill any bits that went over the edge (using an OR to create edge stops in the 8th (most-significant) bit of each byte), subtracting 0x01010101...., XORing the result with the unsubtracted value, and then using ray-to-board and the opposite rotate to create the Attacks bitboard. That is 2 rotates, the 2 new instructions, an OR, a SUB and an XOR: 7 instructions in total.
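The SUB/XOR step can be tried out on its own, without the new instructions. The following is a minimal Python sketch of just that step on a word that is already in ray form (one ray per byte, nearest square in bit 0). The names `ray_fill`, `ONES` and `STOPS` are made up for this illustration, and the sketch assumes the only stops are the ones in bit 7 of each byte.

```python
ONES = 0x0101010101010101    # a 1 in the lowest bit of every byte
STOPS = 0x8080808080808080   # a stop in the highest bit of every byte

def ray_fill(rays):
    """Per byte: set all bits from bit 0 up to and including the lowest set bit."""
    rays |= STOPS                 # every byte is now non-zero, so no borrow crosses a byte boundary
    return (rays - ONES) ^ rays   # the subtraction flips bit 0 .. lowest set bit; the XOR isolates them

# Byte 0 has a blocker on the 3rd square of its ray, all other rays are empty.
print(hex(ray_fill(0x04)))
```

In byte 0 the result covers the two empty squares plus the blocker itself (0x07), so captures of the blocker are included. In the empty bytes the fill runs all the way into the stop bit (0xFF); bit 7 of each byte has no counterpart on the board, so it simply drops out in ray-to-board.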
The exact mapping of the new instructions would be:
brd ray
1 0 Rank to the left
2 1
3 2
4 3
5 4
6 5
7 6
63 8 Rank to the right
62 9
61 10
60 11
59 12
58 13
57 14
8 16 File forward
16 17
24 18
32 19
40 20
48 21
56 22
56 24 File backward
48 25
40 26
32 27
24 28
16 29
8 30
9 32 Diagonal left forward
18 33
27 34
36 35
45 36
54 37
63 38
55 40 Diagonal right backward
46 41
37 42
28 43
19 44
10 45
1 46
7 48 Diagonal right forward
14 49
21 50
28 51
35 52
42 53
49 54
57 56 Diagonal left backward
50 57
43 58
36 59
29 60
22 61
15 62
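The listing is regular enough to be generated rather than typed in. As a sketch, assuming each ray is just a fixed step added to the board index (modulo 64) and each ray occupies one byte with the nearest square in bit 0, the 56 pairs follow from eight step values. `STEPS`, `PAIRS` and `shared` are names invented for this example.

```python
from collections import defaultdict

# Step added to the board index along each ray, in the order of the listing:
# rank left, rank right, file forward, file backward, and the four diagonals.
STEPS = (1, -1, 8, -8, 9, -9, 7, -7)

# (board bit, ray bit) pairs: ray number `ray` fills byte `ray` of the ray word.
PAIRS = [((step * i) % 64, 8 * ray + i - 1)
         for ray, step in enumerate(STEPS)
         for i in range(1, 8)]

# Collect, for every board bit, the ray bits it is copied to by board-to-ray.
destinations = defaultdict(list)
for brd, ray_bit in PAIRS:
    destinations[brd].append(ray_bit)

# Board bits that feed two rays (the 'not 1-to-1' cases).
shared = {brd: bits for brd, bits in destinations.items() if len(bits) > 1}
for brd in sorted(shared):
    print(brd, shared[brd])
```

Besides the file bits (which are reached forward and backward) and the rank/diagonal overlaps at 1, 7, 57 and 63, this also shows that board bits 28 and 36 are each shared by two diagonals.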
Note that the mapping is not 1-to-1: some bits have two destinations (in board-to-ray) or two sources (in ray-to-board). With two sources, the result bit would simply be the OR of the two source bits. This would not create any ambiguity, since the Attacks, once generated in ray form, would not contain any over-the-edge bits.
That a certain square (e.g. 7) appears on 2 rays in board-to-ray is no problem: one of the two paths reaching it will always cross the board edge, and will be masked off (to a one). Which of the two that is depends on the location of the square, so the square-dependent mask can take it into account.
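To check that the two-source cases really cause no trouble, the whole sequence can be emulated in software and compared against a plain ray walk. This is only a sketch under several assumptions: squares are numbered 8*rank + file, `b2r` and `r2b` are slow bit-by-bit emulations of the proposed instructions, and all function names are made up here. One further assumption: the stop bit that ends a ray at the board edge is itself cleared by an extra AND after the XOR, which is one operation more than the seven counted above.

```python
import random

M64 = (1 << 64) - 1
ONES = 0x0101010101010101
STEPS = (1, -1, 8, -8, 9, -9, 7, -7)            # ray order of the listing
PAIRS = [((s * i) % 64, 8 * r + i - 1)           # (board bit, ray bit)
         for r, s in enumerate(STEPS) for i in range(1, 8)]

def b2r(board):                                  # emulated board-to-ray
    ray = 0
    for b, y in PAIRS:
        ray |= ((board >> b) & 1) << y
    return ray

def r2b(ray):                                    # emulated ray-to-board: two sources are ORed
    board = 0
    for b, y in PAIRS:
        board |= ((ray >> y) & 1) << b
    return board

def ror(x, n):
    return ((x >> n) | (x << (64 - n))) & M64

def rol(x, n):
    return ror(x, (64 - n) % 64)

def masks(sq):
    """Per square: OR mask (stops on all over-the-edge bits) and AND mask (on-board bits)."""
    r, f = divmod(sq, 8)
    lengths = (7 - f, f, 7 - r, r,
               min(7 - f, 7 - r), min(f, r), min(f, 7 - r), min(7 - f, r))
    stop = keep = 0
    for byte, k in enumerate(lengths):           # k = squares on this ray before the edge
        stop |= ((0xFF << k) & 0xFF) << (8 * byte)
        keep |= ((1 << k) - 1) << (8 * byte)
    return stop, keep

MASKS = [masks(sq) for sq in range(64)]          # the 64-entry table(s)

def queen_attacks(occ, sq):
    stop, keep = MASKS[sq]
    rays = b2r(ror(occ, sq)) | stop              # rotate, board-to-ray, OR in the stops
    rays = ((rays - ONES) ^ rays) & keep         # SUB, XOR, plus the assumed extra AND
    return rol(r2b(rays), sq)                    # ray-to-board, rotate back

def slow_queen_attacks(occ, sq):
    """Reference: walk each ray square by square."""
    r0, f0 = divmod(sq, 8)
    att = 0
    for dr, df in ((0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1), (1, -1), (-1, 1)):
        r, f = r0 + dr, f0 + df
        while 0 <= r < 8 and 0 <= f < 8:
            bit = 1 << (8 * r + f)
            att |= bit
            if occ & bit:
                break
            r, f = r + dr, f + df
    return att

rng = random.Random(0)
for sq in range(64):
    for _ in range(25):
        occ = rng.getrandbits(64) | (1 << sq)
        assert queen_attacks(occ, sq) == slow_queen_attacks(occ, sq)
print(hex(queen_attacks(1, 0)))                  # queen alone on square 0
```

Because the AND mask removes every over-the-edge bit before ray-to-board, at most one of the two sources of a shared board bit can be set, so ORing them is safe. Whether the AND can be folded into the existing mask step to stay at seven instructions is left open here.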