Compact Bitboard Attacks

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

Re: Compact Bitboard Attacks

Postby Gerd Isenberg » 22 Aug 2006, 19:49

Zach Wegner wrote:Hello Gerd and everyone else,

Gerd Isenberg wrote:
Code: Select all
u64 rayBB = occupiedBB & rayMask[sq];
u32 fold  = (u32)(rayBB) | (u32)(rayBB>>32);
u32 occ64 = (fold * 0x02020202) >> (32-6);

...
Problem are the vertical files, where all masked occupied bits reside on the same file.


I was playing with these ideas, and I found, using a modified version of Walter Faxon's/Tony's magic finder, how to do a 32-bit folded lookup with one multiply:
Code: Select all
u64 rayBB = occupiedBB & fileMask[sq];
u32 fold  = (u32)(rayBB) | (u32)(rayBB>>29);
u32 occ64 = (fold * 0x01041041) >> (32-6);

All magics with the same lower 25 bits give the same result, and you can probably change with the 29. It's an interesting magic because each of the bits are 6 away from each other == the number of bits in the index.

Regards,
Zach



Hi Zach,

well, i have some problems with your 5-bit factor to map masked six inner occupied file bits to some occupied64 state 32-bit wise! I think that problem is solved so far, as demonstrated in the posted 64/32-bit source.

Some further explanation on the file-flipping, to make things clear:

Assume a1,b1,...,h8 == 0,1,...,,63 square-bitindex-mapping. For h1,a8 == 0,63 mapping, simply mirror the files a-h,b-g etc.. The six inner bits of the a-file, a2..a7 are masked, all other bits zero. 64-bit arithmetic first, to get the index in the six upper bits (before >> 58): We need to map a2->c8, a3->d8, .a7->h8, thus we need to shift left six bits by different amounts:

Code: Select all
a2->c8  8->58  <<50
a3->d8 16->59  <<43
a4->e8 24->60  <<36
a5->f8 32->61  <<29
a6->g8 40->62  <<22
a7->h8 48->63  <<15

Since our fileBB contains set bit only in an eight-bit distance, and all six shifts have a delta of seven, each shift containts disjoint sets of bits, ....
Code: Select all
  ( fileBB << 50 )
| ( fileBB << 43 )
| ( fileBB << 36 )
| ( fileBB << 29 )
| ( fileBB << 22 )
| ( fileBB << 15 )
... and we can replace | by + or xor ...
Code: Select all
  ( fileBB << 50 )
+ ( fileBB << 43 )
+ ( fileBB << 36 )
+ ( fileBB << 29 )
+ ( fileBB << 22 )
+ ( fileBB << 15 )

... and finally replace the whole shift,add expression by 64-bit multiplication with the h2-c7 antidiagonal:
Code: Select all
      fileBB * 0x0004081020408000
thus
Code: Select all
occ64 = (fileBB * 0x0004081020408000) >> (64-6);

That only works for the masked a-file so far, with the masked b-file we need to shift one less each. And so on with other files
Code: Select all
    A-fileBB * 0x0004081020408000
    B-fileBB * 0x0002040810204000
    C-fileBB * 0x0001020408102000
    D-fileBB * 0x0000810204081000
    E-fileBB * 0x0000408102040800
    F-fileBB * 0x0000204081020400
    G-fileBB * 0x0000102040810200
    H-fileBB * 0x0000081020408100

or
Code: Select all
maskedFileBB * (0x0004081020408000 >> file);

Instead of masking the file and to shift the factor right, i prefere to safe the filemask memory access but shifting right the occupied bitboard, to mask the a-File by constant:
Code: Select all
((occupied >> file) & 0x0101010101010101) * 0x0004081020408000

The proplem i had, to map the first rank attacks to the h-file was the bit-distance inside the factors. Here a 7-bit distance, antidiagonal factor resulted in overflows, since shifted rank attacks would overlap by one bit. Thus the need for the nine bit distance factor, the a1-h8 diagonal, 0x8040201008040201, and to consider the reversed result with the reversed occupied index factor 0x0080402010080400, the c2-h7 diagonal.

For 32-bit there are some options to fold both halfs to get disjoint multiplication as well. I prefere symmetrical 4-bit distance which also works for mapping all eight bits, see my fileAttacks 32-bit code. All my file-rank mapping factors are either diagonals or antidiagonals, while your 32-bit factor seems to have a strange square pattern. at least with my mapping a1,a4,c3,e2,g1?
Did i miss something?

Cheers,
Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Compact Bitboard Attacks

Postby Zach Wegner » 22 Aug 2006, 22:11

Gerd Isenberg wrote:Hi Zach,

well, i have some problems with your 5-bit factor to map masked six inner occupied file bits to some occupied64 state 32-bit wise! I think that problem is solved so far, as demonstrated in the posted 64/32-bit source.

Some further explanation on the file-flipping, to make things clear:

Assume a1,b1,...,h8 == 0,1,...,,63 square-bitindex-mapping. For h1,a8 == 0,63 mapping, simply mirror the files a-h,b-g etc.. The six inner bits of the a-file, a2..a7 are masked, all other bits zero. 64-bit arithmetic first, to get the index in the six upper bits (before >> 58): We need to map a2->c8, a3->d8, .a7->h8, thus we need to shift left six bits by different amounts:

Code: Select all
a2->c8  8->58  <<50
a3->d8 16->59  <<43
a4->e8 24->60  <<36
a5->f8 32->61  <<29
a6->g8 40->62  <<22
a7->h8 48->63  <<15

Since our fileBB contains set bit only in an eight-bit distance, and all six shifts have a delta of seven, each shift containts disjoint sets of bits, ....
Code: Select all
  ( fileBB << 50 )
| ( fileBB << 43 )
| ( fileBB << 36 )
| ( fileBB << 29 )
| ( fileBB << 22 )
| ( fileBB << 15 )
... and we can replace | by + or xor ...
Code: Select all
  ( fileBB << 50 )
+ ( fileBB << 43 )
+ ( fileBB << 36 )
+ ( fileBB << 29 )
+ ( fileBB << 22 )
+ ( fileBB << 15 )

... and finally replace the whole shift,add expression by 64-bit multiplication with the h2-c7 antidiagonal:
Code: Select all
      fileBB * 0x0004081020408000
thus
Code: Select all
occ64 = (fileBB * 0x0004081020408000) >> (64-6);

That only works for the masked a-file so far, with the masked b-file we need to shift one less each. And so on with other files
Code: Select all
    A-fileBB * 0x0004081020408000
    B-fileBB * 0x0002040810204000
    C-fileBB * 0x0001020408102000
    D-fileBB * 0x0000810204081000
    E-fileBB * 0x0000408102040800
    F-fileBB * 0x0000204081020400
    G-fileBB * 0x0000102040810200
    H-fileBB * 0x0000081020408100

or
Code: Select all
maskedFileBB * (0x0004081020408000 >> file);

Instead of masking the file and to shift the factor right, i prefere to safe the filemask memory access but shifting right the occupied bitboard, to mask the a-File by constant:
Code: Select all
((occupied >> file) & 0x0101010101010101) * 0x0004081020408000

The proplem i had, to map the first rank attacks to the h-file was the bit-distance inside the factors. Here a 7-bit distance, antidiagonal factor resulted in overflows, since shifted rank attacks would overlap by one bit. Thus the need for the nine bit distance factor, the a1-h8 diagonal, 0x8040201008040201, and to consider the reversed result with the reversed occupied index factor 0x0080402010080400, the c2-h7 diagonal.

For 32-bit there are some options to fold both halfs to get disjoint multiplication as well. I prefere symmetrical 4-bit distance which also works for mapping all eight bits, see my fileAttacks 32-bit code. All my file-rank mapping factors are either diagonals or antidiagonals, while your 32-bit factor seems to have a strange square pattern. at least with my mapping a1,a4,c3,e2,g1?
Did i miss something?

Cheers,
Gerd


It is quite strange, yes, but it is an out of order mapping. There are only 5 bits because each bit in the factor maps more than one bit. The trick here is the odd shift 29, so that the multiply does not overflow individual bits. I have since found that 25 and 27 will work with the same magic. I think a diagram would illustrate it better:
Code: Select all
occ                occ | occ >> 29    * 0x01041041 with the index bracketed
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
a 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0    d 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0    e 0 d a 0 0 0 0
c 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0    f 0 e b d a 0 0
d 0 0 0 0 0 0 0    d 0 0 0 0 0 0 0    d a[f c e b d a]
e 0 0 0 0 0 0 0    e 0 0 a 0 0 0 0    e b 0 a f c e b
f 0 0 0 0 0 0 0    f 0 0 b 0 0 0 0    f c 0 b 0 0 f c
0 0 0 0 0 0 0 0    0 0 0 c 0 0 0 0    0 0 0 c 0 0 0 0

Hehe, I guess that is rather messy, but you can see the groups of 3 going up and left.

The interesting thing is that this works for any masked file. In fact if it was shifted to the a-file, you could get away with the 3-bit factor 0x00041040 (but using a shift of 23).

This could probably be applied to 64 bit as well, but we need a way to shift one group of 3 bits to an odd distance from the other 3, so you might as well just use the 32 bit code.

Regards,
Zach
User avatar
Zach Wegner
 
Posts: 182
Joined: 26 Sep 2004, 22:02
Location: Austin, Texas, USA

Re: Compact Bitboard Attacks

Postby Gerd Isenberg » 22 Aug 2006, 23:05

Zach Wegner wrote:
It is quite strange, yes, but it is an out of order mapping. There are only 5 bits because each bit in the factor maps more than one bit. The trick here is the odd shift 29, so that the multiply does not overflow individual bits. I have since found that 25 and 27 will work with the same magic. I think a diagram would illustrate it better:
Code: Select all
occ                occ | occ >> 29    * 0x01041041 with the index bracketed
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
a 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0    d 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0    e 0 d a 0 0 0 0
c 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0    f 0 e b d a 0 0
d 0 0 0 0 0 0 0    d 0 0 0 0 0 0 0    d a[f c e b d a]
e 0 0 0 0 0 0 0    e 0 0 a 0 0 0 0    e b 0 a f c e b
f 0 0 0 0 0 0 0    f 0 0 b 0 0 0 0    f c 0 b 0 0 f c
0 0 0 0 0 0 0 0    0 0 0 c 0 0 0 0    0 0 0 c 0 0 0 0

Hehe, I guess that is rather messy, but you can see the groups of 3 going up and left.

The interesting thing is that this works for any masked file. In fact if it was shifted to the a-file, you could get away with the 3-bit factor 0x00041040 (but using a shift of 23).

This could probably be applied to 64 bit as well, but we need a way to shift one group of 3 bits to an odd distance from the other 3, so you might as well just use the 32 bit code.

Regards,
Zach


Ahh, i see - fine trick. Safes some shifts for the sake of an incompatible occupied64 state for files - but that doesn't matter anyway if you index pure file dependent tables.

Regards,
Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Compact Bitboard Attacks

Postby H.G.Muller » 06 Sep 2006, 15:12

OK, finally I got the background to understand what the heck you all have been talking about in this discussion... :wink: (Beware! This is not the same as understanding what you have all been saying! :shock: )

I am still muddling with the file attacks, (before thinking about diagonals), to see if I can improve the trade-off between compactness and CPU load:

If the from-square index f gives me an access to a 64-bit mask[f] and multiplier[f] that I have to apply to the bitboard to obtain a unique 'occupancy index' X for looking up the file attacks... Rather than doing the lookup in a two-dimensional table FileAttack[X][f] (or FileAttack[X][RANK(f)]<<FILE(f) to save some space), I would like to include the rank of the attacker into the index X, so that I can do FileAttack[X] and save me a hidden multiplication/shift for the 2-dimensional indexing.

Suppose I am working on the a-file, which has mask 0x0101010101010101. Although the occupancy of the edge squares is not important for blocking, they might contain the attacker, so I leave them in for the moment. I can always kill them later with the aid of the multiplier.

If my multiplier is b0b1b2b3b4b5b6b7 (bn being a group of 8 bits) the high-order byte of the product and the masked bitboard s7s6s5s4s3s2s1s0 (sn being a group of 8 bits containing 0 or 1 depending on occupancy) will be b0*s0+b1*s1+ ... +b7*s7. The sn will effectively turn the bn on or off, and as long as the bit-patterns for bn don't overlap (i.e. bn & bm == 0 for any n != m), the addition will never involve any carry propagation.

The idea now is that the multiplier knows which rank the attacker is on (since it was obtained as multiplier[f]). It could thus map all occupancy bits that are not the attacker or an edge square to the same 5 bits, sparing a hole of 3 bits to contain a unique attacker-rank ID. For an attacker at (say) s3, we would choose to construct the multiplier with b0=b7=0, b1=0x01, b2=0x02, b4=0x04, b5= 0x08, and b6=0x80. We could fill the 0x70-bits by setting b3 equal to the rank ID (e.g. b3 = 0x30). It is certain to be multiplied by a 1 (since the attacker is there, s3=1), and thus to be copied to the result. This would then produce a 256-bit index that is unique for any (occupancy,attacker-rank) combination.

A problem occurs when the attacker is on one of the edge squares. In that case all 6 central bits of the occupancy are significant. We really need a 9-bit index X. We could use the 9 most-significant bits of the product for this, in stead of 8 bits. If we map s6 to 0x80 in the MSB, the neighboring byte would have s5 there, (it is equal to b6*s5+ ... +b1*s0) and the high-end bits of the product would look like:

s6 R R R s5 s4 s3 s2 s5 0 0 0 s4 s3 .....

Bit 0 and bit 4 of this 9-bit key would thus normally be the same, (namely s5), no matter how we pack the occupancy into bits 0-4 & 8 of this key. For the edge squares, which we will assign RRR = 111 (1st rank) or RRR = 000 (8th rank) we would be one occupancy bit short, and we could solve this by mapping s1 to the same bit (bit 4) as s5 (since s5 occurred in duplicate): b1 = b5 = 0x08. As a side effect s0 overlays s4 in the byte below, but this has no consequence: even if it generates a carry the intervening zero bits prevent that from propagating to the high-most 9 bits. All situations where s1 is occupied then generate formerly unused codes, where bit 0 and bit 4 of the index are opposite. That a carry might propagate from bit 7 to bit 8 doesn't destroy the uniqueness of the mapping, it just shifts it.

So for attacker rank 7 (corresponding to s7) the situation is thus satisfactorily solved with b7=0 and b1=0x08. Due to b7=0 the rank-ID bits in all bytes of the product are always forced to zero. For attacker rank 0 (corresponding to s0) we have to generate the RRR=111 bits, which can be done by setting b0=0x70 and b1=0x08. The ID bits in all bytes of the product but the highest would still be zero, since b0 does not contribute to those bytes.

For other files than the a-file the mask would have to be shifted left by the rank number, and the multiplier shifted an equal number of bits to the right (all at table-building time), so that by the time we have the product the situation is the same for all the files. Note that this is always possible, since all multipliers have their LSB (b7) zero! So we would never lose any bits on shifting. (In other words, even for the h-file which is leftmost in the bitboard, bits would only have to contribute to bits left of them, which can be done by multiplication.)

We could use the unique 9-bit index X as a direct lookup for the file-attack bitboards, or for a 1-byte sequence number of the 70 occurring ones. Shifting the retrieved bitboard to the proper file (FileAttacks[X] << FILE(f) ) is probably always cheaper than tabulating the various files separately (FileAttacks[X][FILE(f)] ), the shift being cheaper than the index calculation. (Note there is no carry between the low and high 32-bit word of the bitboard during shifting.)
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Compact Bitboard Attacks

Postby H.G.Muller » 06 Sep 2006, 17:31

Perhaps there is a much simpler way to look at this. A multiplication is just a series of shifts and adds. The thing we are multiplying (the occupancy masked out of the bitboard) is sparse in 1 bits. For files there might be 8 occupancy bits, and they are spaced 8 bits apart (7 intervening zeros).

The trick is to stack a number of copies of this bit string, and slide them along each other such that for a group of 9 consecutive bits (the index length) we will have at most a single 1 in each coulumn. To this we add the requirement that one of the occupancy bits (the one that we know the attacker to be on) should be used to create a three bit number (the rank ID) which ends up in the same location of the index key always. So:
Code: Select all
000A0000000B0000000E0000000Z0000000M0000....000H0000  <--->
                              000A0000000B0000000E0000000Z0000000M0000....000H0000
                     000A0000000B0000000E0000000Z0000000M0000....000H0000
           0AAA00000BBB00000EEE00000ZZZ00000MMM0000....0HHH0000 (this one supplies ID)
000A0000000B0000000E0000000Z0000000M0000....000H0000
                                ^^^^^^^^^
                                876543210 extracted index bits
                                BA.MZZZ.E

Problem is that we need to derive a 9-bit key, and that the occupancy bits are only 8 apart. So when we align one of the lines to have its occupancy bit above the first or last bit of the key, in general it will contribute two bits.

OK, no sweat! If we can't avoid it, exploit it! One of the lines will have to supply the rank ID, we multiply that by the desired ID and align it such that this ID derived from duplicating the occupancy bit of the atacker appears in the desired place somewhere in the middle of the key. (Say bits 5-7, as before.)

Since there are 8 occupancy bits, only one of it designating the attacker, and the outmost being irrelevant, there will always be a group of 3 consecutive squares (along the file) that do have to occur in the key on one side or the other of it. Pick a neighboring pair from that, and align it such that it contributes the first and last bit to the key (bits 0 and 8).

This leaves bits 1-4 to be filled. Since these bits are in the middle of the key, they can be aligned with any central occupancy bit independently of anything else. So for any given atacker rank we can tailor the mapping such that 6 occupancy bits and a rank ID appear in the key. To avoid left-shifts by a negative amount (which we can't do with a multiply) we avoid using the most significant occupancy bit. If we use the 9 most-significant bits of the product as key, every other bit can be shifted into it by left-shifts of zero or more!) This was an edge bit anyway, so it would only be relevant if the attacker was there and the rank ID would have to be derived from it. So we assign this attacker rank ID=000. This left-most edge bit can thus always be masked off immediately.

Note that for most ranks the attacker is on we don't really have to accomodate 6 occupancy bits in the key. We can avoid an occupancy bit mapping to bit 8 of the key by aligning the left-most occupancy bit (after the masking) with bit 0 of the key (so that it has no left neighbor). Only when the attacker is on one of the edge squares bit 8 of the key has to be used (and might be 1). By using rank ID=000 and 001 for these two cases, and mapping the ID in key bit 5-7, the highest key value that can in theory occur is 0x13F: there are no holes in the mapping, and only 320 different key values can occur.
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Compact Bitboard Attacks

Postby Gerd Isenberg » 06 Sep 2006, 21:36

Thanks for your interest, H.G. ;-)
During this thread i really got a new understanding in (binary) multiplication - somehow similar to my early school days where i learned basic arithmetics.

If i compare your file attack getter with mine, there is exactly the same idea to get the occupied state. Note that the rotate factor 0x8040201008040201 (a1h8-diagonal with my mapping) somehow reverse the arithmetical order if you map the a-file to the 8.rank. Therfor the h1-a8 anti-diagonal 0x0102040810204080 with a 7-bitindex distance is also appropriate to rotate the a-file to the 8.rank, with a8->h8, a7->g8, ..., a1->a8 mapping.

Code: Select all
b = AllPieces>>FileNr;      /* files are numbered 0-7 */
b &= 0x0101010101010101LL;  /* isolate the file */
b *= 0x8040201008040201LL;  /* pack bits        */
b >>= 56;                   /* bring to LSB     */
b = FileAttacks[RankNr][b]; /* lookup reachable squares */
b <<= FileNr;               /* Move it to the relevant file */

Code: Select all
// file attacks with two reversed rotations
// 1. for the occupied state to map a2..a7 to h8..c8
// 2. for the attacks from a1..h1 to h8..h1
u64 fileAttacks(u64 AllPieces, u32 sq) {
   FileNr = sq  &  7;
   occA   =   0x0101010101010101LL & (AllPieces >>  FileNr); // a-file
   occ64  = ( 0x0080402010080400LL *  occA ) >> 58;
   ratt   = ( 0x8040201008040201LL *  firstRankAttacks[occ64][(sq^56)>>3] ); // map 1.rank to h-file
   return   ( 0x8080808080808080LL &  ratt ) >> (FileNr ^ 7); // mask h-file and shift back   


I had the ambition to share the same 512-byte table by all four directions with bytewise attacksets on the first rank. The table is indexed by the 64 possible inner six-bit occupied states, and the square index 0..7 (FileNr), where the attacking sliding piece resides on that first rank. For file-attacks, the problem was to map the rank-attackset to the h-file. The bytewise attack sets requires a 9-bitindex distance, thus the rotate on the maindiagonal with the reversed result was required to avoid overflows with otherwise overlapping bytewise rank-pattern. The factor 0x0080402010080400LL (c2-h7 sub-diagonal) contains six bit which map a2->h8, a3->g8,..., a7->c8 - to get the "reversed" occupied state of the a-file. One has also to consider the reserved "square index", here rankNr^7 or (sq^56)>>3.

Since the routine may also be applied with "virtual" sliders, not (yet) member of the occupied set, we usually ignore the redundancy if the slider is member of the inner six bits - same with rotated. One may argue that two imuls are expensive, but they are direct path (4 K8 cycles) and other independent instructions (most likely the 90-degree rotated getter) may execute in parallel. I somehow prefere the second multiplication approach to a further indirection to larger tables. The trump is the shared 512 byte table, plus two 64-element arrays with diagonal or anti-diagonal masks, 1.5KByte in total.

I somehow like the approaches of Lasse and Pradu to get all rook and bishop attacks with one lookup as well, specially Pradu's 41KByte bishop attacks, but the 800K for rooks may still tend to pollute caches and even tlb for the 4KByte pages, depending on what you do elsewhere...

Btw. i wrote a kind of "didactical" mdi-calculator (windows msvc6+mfc) with all 64-bit types (double and 64-bit int, plus all-simd-types, vectors of 2*32, 4*16 and 8*8-bit ints, shorts and chars) and a binary representation of operands on a chess-board according to my square-bitindex mapping. A set bit is represented by a colored circle with the bitindex number written inside.

It is/was a former debug-tool to view clipboard copied/pasted decimal or hexadecimal bitboard-strings, or even fen-strings for occupied sets. All kind of unary and binary operators are supported, bitwise logical, shifts, rotate, arithmetical, even 64*64bit to get the upper 64-bit of the 128-bit result. Using that tool with all kind of rank-, file- and diagonal-pattern as operands was very instructive for a pseudo- or hobby-mathematician like me ;-)

Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Compact Bitboard Attacks

Postby H.G.Muller » 07 Sep 2006, 11:30

Sharing the tables seems indeed a more worthwile quest than minimizing the individual tables, what I attempted. There was not nearly as much to gain there as I had hoped for.

I don't think imul is an especially expensive instruction (except on P4...). It pipelines like everything else, its moderate drawbacks are that there is only a single execution unit that can handle it, and that it has a latency of a few cycles. (And I think it competes with shift instructions, that are handled by the same execution unit...) But that is not very different from a memory load, even an L1 hit has a latency of 3 clocks. And usually there is lots of cache traffic you have to compete with as well. So it is not obvious to me that a multiply would be more expensive than a table lookup.

Packing a file or diagonal into a byte to save table space, and on retrieval scattering it over a bitboard by a
bb = tab8[key]*CONST64 & MASK64;
might be a very competitive approach if it would reduce cache misses.
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Compact Bitboard Attacks

Postby H.G.Muller » 08 Sep 2006, 15:30

Gerd Isenberg wrote:I somehow like the approaches of Lasse and Pradu to get all rook and bishop attacks with one lookup as well, specially Pradu's 41KByte bishop attacks, but the 800K for rooks may still tend to pollute caches and even tlb for the 4KByte pages, depending on what you do elsewhere...

I suppose that the 800K is from 4 corner squares needing 4K Attacks tables (12-bit key), 24 edge square 2K and the 36 non-edge squares 1K (for a total of 100K, x 8 bytes).

This can be compacted by an indirection: do not store the 100K bitboards (of which there are onlly 70**2 = 4900 different ones, fitting in 40KB), but a 2-byte index. Then you would have 200KB in indices, and ~40KB in bitboards, at the price of an extra indirection.

An alternative would be to access the Attacks table not purely by index (i.e. as Attacks[index[square][key]]) but make it directly aware of the square. (Different squares never share attack boards, if you do file and rank at once.) So: Attacks[index[square][key]][square]. Since there are at most 144 different attack boards for any square, the index table could be a single byte, saving 100KB, at the expense of driving up the requirement for the bitboards to 144 x 64 x 8 = 72KB. A savings of ~67KB total (off the 240KB), so perhaps hardly worth the extra shift on the index.

The reason I want to go in this direction, however, is not so much because of reducing index size from 2 bytes to one, but because I have the feeling that, once the indices no longer explicitly contain the information for which square they are, many squares would be able to share the same index table. A bit comparable to the file attacks, where all files for a given attacker rank basically have the same pattern of blockings, and the file dependency can be incorporated after determining those, by shifting the bitboard.

In the 2-dimensional case there is no such shifting, but there are mirror operations that map different squares on each other. Since the translation index -> Attack board is done through a table lookup, that should be no problem if each square has its own table anyway. So (d4,e4,d5,e5) are all equivalent (two rays of 4, two of 3). If the magics for these squares could be designed to map the bits along the rays-of-4 always to the same key bits (in the correct order), and the same for the rays-of-3, a single index table could be used to 'collapse' the key to an index (essentially by ignoring the occupancy bits of unreachable squares, and remap the remaining patterns to a nice contiguous range of numbers). This representation code of the blocking situation along each ray could than be used to picke the corresponding attack board from each of the 4 tables.

There are 10 such 'equivalence classes' of squares (well known from EGTB compression...): 1 corner, 3 edge and 6 non-edge. That would require 16K (entries) in the index tables, rather than 100K. So even with a 2-byte index space-requirement is already dominated by the bitboards themselves. So we might go for speed rather than space here: do the access to the attack boards Attacks[index[square][key]][square] as Attacks[index[square][key]+square], by premultiplying all the stored indexes by 64. This saves a multiplication/shift on the array-index calculation. (The multiplication by 8 needed for the element size is handled by the addressing mode).

Even with the square number as the lowest-stride index, it is possible to pack the attack boards quite dense. The naive way of storing would require 144*64 entries. For many squares the index would not run that high, however. For b2, for example, the index would only run up to 36 (stored as 64*36). All entries corresponding to higher index would not be used, but they are not contiguous, but scattered over memory with a stride of 64, at locations 9+n*64 (9 being the square number of b2). Nevertheless, this set of holes is suitable to store something else there: Say the attack boards for square d4 (= 27) that would normally have had index 140 (index could run upto 144 for d4) and would map to quite large address 140*64 + 27 the naive way. In stead of tabulating the 140*64 in the index table, we could tabulate 37*64 + (9-27), so that after addition of the square number 27 we would retrieve the attack board from 37*64 + 9, the unused hole. It should be possible to pack the attack boards in a range of 96*64 through this trick (48KB).

We can thus also pre-multiply the index by another factor 8, really making it represent the address offset (which runs upto 48K) without exceeding the range for a (2-byte) unsigned short int (64K). The translation of the AllPieces bitboard and atacker square f to a move board would then go as:

1) 64-bit load of the mask, indexed by f.
2) 64-bit AND
3) 64-bit load of the magic multiplier, indexed by f.
4) 64 bit integer multiply
5) right shift to isolate the key bits
6) 32-bit load of the start address p of the index table for f
7) 16-bit load indexed by p+2*key
8) 64-bit load of the attack board, indexed by key+8*f

The required space would be 32K for the index tables + 48K for the attack boards, plus a~1K for masks and magics. That is 81K in total. Still not completely L1 compatible, but a lot friendlier than 800K. But it depends on the possibility to construct magics that preserve the equivalence between symmetry classes.

This should not be too difficult if I bring to bear my final secret weapon: :idea: more convenient assignment of squares on the board to bits in the bitboard. :idea: The unimaginative a1=0, b1=1, ... is really only useful if we want to apply translations to the board. But in this approach we don't do that anymore! 90-degree rotations are the natural symmetry operation. We could count the squares in such a way that a single-bit shift most closely mimics a 90-deg rotation. That makes occupancy bit-patterns for symmetry-equivalent squares very similar, and should greatly enhance the probability for finding magics that map the occupancy bits in exactly the same way into the key. :mrgreen:
E.g.
Code: Select all
char Sqr2Bit[] = {
63,59,55,51,47,43,39,62,
36,35,31,27,23,19,34,58,
40,16,15,11, 7,14,30,54,
44,20, 4, 3, 2,10,26,50,
48,24, 8, 0, 1, 6,22,46,
52,28,12, 5, 9,13,18,42,
56,32,17,21,25,29,33,38,
60,37,41,45,49,53,57,61
};

BITBOARD Sqr[64]; /* to be filled as Sqr[i] = 1LL<<(Sqr2Bit[i]); */
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Compact Bitboard Attacks

Postby Gerd Isenberg » 08 Sep 2006, 16:43

Yes, very interesting, Who is the bitboard beginner here? ;-)

Pradu's source already addressed the second indirection with short indices - and in my extreme snoob test it was a bit slower than Pradu's MINIMIZE_MAGIC orignal approach, two unfriendly accesses versus one very unfriendly:

Code: Select all
pattern := occupied & rookmask [sq];
address := (pattern * rookMagig[sq]) >> rookShift[sq] // 9..12 bit address
sqBase  := rooksOffset[sq]  // was sq<<12 on Lasse's approach
attacks := rookAttacks[sqBase + address]

http://wbforum.vpittlik.org/viewtopic.php?t=5452
http://www.prism.gatech.edu/~gtg365v/Buzz/

Need some more time to understand your other ideas...
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Compact Bitboard Attacks

Postby H.G.Muller » 08 Sep 2006, 16:47

Gerd Isenberg wrote:Yes, very interesting, Who is the bitboard beginner here? ;-)

Yes, but it helps to be a multiplication expert... :mrgreen:
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Compact Bitboard Attacks

Postby Pradu » 08 Sep 2006, 17:08

Code: Select all
char Sqr2Bit[] = {
63,59,55,51,47,43,39,62,
36,35,31,27,23,19,34,58,
40,16,15,11, 7,14,30,54,
44,20, 4, 3, 2,10,26,50,
48,24, 8, 0, 1, 6,22,46,
52,28,12, 5, 9,13,18,42,
56,32,17,21,25,29,33,38,
60,37,41,45,49,53,57,61
};

BITBOARD Sqr[64]; /* to be filled as Sqr[i] = 1LL<<(Sqr2Bit[i]); */


Yes this is a good idea, have the mapping changed directly without any xors. Can you explain to me why this maping is optimal before I go fiddling around with it? More specifically, how does this help more collisions happen?

For the corner squares in rooks, we have to make 12 occupancy bits collide into the minimal 6 bits for the index....

You will notice all the magics Lasse and I have generated create a pseudo-perfect hash that utilizes some collisions (12 bits in occupancy, 12 bits in index). If we can figure out exactly how the carry works... we'll have this problem solved with some pretty small lookup tables.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: Compact Bitboard Attacks

Postby H.G.Muller » 08 Sep 2006, 18:45

I am not claiming that this is optimal, it was jsut the first pattern that sprung to mind for having a one-bit right shift approximate a +90-deg rotation. I should really study this case where an 11-bit key worked for a corner square, to see its raw mapping of the occupancy bits into the key region (before the addition produces carries). By numbering the squares around the other corners such that their occupancy bits basically reproduce the same pattern (but shifted in the bitboard), you would just use the counter-shifted magic to exactly reproduce the result. (And you would have an 2K table that could be shared by all 4 corners!)

My naive idea about constructing the magics (i.e. the way I would do it by hand) is to make sure that the occupancy bits are scattered over the bitboard spaced as far apart as possible, so that the multiplication can independently shift them above the key location, without bringing neighboring bits into few. If the purpose is just collecting the bits (because the key is as long as there are occupancy bits) that is the most straightforward way to do it. In the scheme above the squares along an edge would be mapped four apart, but a corner position would cover two edges, and that would mean neighboring bits.

So it might be better to represent the 90-degree rotations by larger shifts in the bitboard, perhaps 16-bit shifts. Like this:
Code: Select all
63,62,61,60,59,58,57,47,
 9,56,55,54,53,52,40,46,
10, 4,51,50,49,35,39,45,
11, 5, 1,48,32,34,38,44,
12, 6, 2, 0,16,33,37,43,
13, 7, 3,17,18,19,36,42,
14, 8,20,21,22,23,24,41,
15,25,26,27,28,29,30,31

For corner positions this would bring the occupancy bits nicely in 2 contigous groups of 6 (9-4, 25-30, 41-46 or 57-63), each with at least 10 zeros on either side. So it would be trivial to collect them in a 12-bit key by a magic that contains just two one bits. The mapping would perfectly reflect the symmetry relation, 6 key bits would represent the occupance of a ray traversed left to right (clock-wise on the board), the other right to left (counter-clockwise on the board). They would thus be able to share the index table.

For a Rook on b1 (bit 62) you would have 61-56 and 4-8 as occupancy. For the Rook on a7 you would similarly have 13-8, 20-24. Both have a group of 6 and a group of 5, traversed in the same direction, easy to combine to an 11-bit key that could be shared between b1, h2, g8 and a7. Sharing with the mirror image g1 does not seem to be possible, since in one the group of 6 is traversed ascending direction, and in the other in descending. I don't know a magic that can swap the bit order of contigous bits. But perhaps sharing in groups of 4 is good enough, although for the edge squares, which have a longer key, you would like to have all 8 share an index table.
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Compact Bitboard Attacks

Postby Pradu » 08 Sep 2006, 19:26

I've been thinking a little bit about the problem of big tables. The main problem is having to get something that isn't in the cache. So possilby you can have "better" magic keys that make more collisions happen and have the ones that can't be collided be sparsely populating the database, this way, we can care less about how big the actual table is and only care about how many collisions happen. Mabe it is possible to have 12 bit keys with huge numbers collisions happening and working as efficiently as a minimal index hashing key.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: Compact Bitboard Attacks

Postby H.G.Muller » 09 Sep 2006, 19:11

OK, this is as good as it gets:

At first glance it would seem convenient to assign all squares on an edge to a group of contiguous bits, since they can be brought all at once in the key by a single bit in the magic (either the whole edge appears, or it doesn't appear at all). This would make it possible to sparse out the remaining bits more. We have to be careful, however: to be able to share index tables, it is not only a matter of bringing the bits in the key, but you want corresponding squares of the occupancy set to end up in the same bit of the key. So an edge traversed in the opposite direction would have to need its bits sorted into opposite order, and this is impossible to do for a contiguous set.

To solve this problem, the edges are split in two half-edges of three squares. Each half-edge is assigned a group of 3 contiguous bits, and the two half-edges have their squares numbered in opposite direction (away from the corner). The msb of each group then corresponds to squares that are each image on (horizontal or vertical) reflection. So to get an equivalent representation of an edge traversed in the opposite direction, we only have to swap these blocks. By assigning bits that are sufficiently far apart, the two half-edges can be brought independently to any position in the key, so the swapping is easy.

Groups of 3 bits thus play an important role, and we will divide up the bitboard in 20 groups of 3 bits, plus 4 bits for the corners. If we number the edges 1-4, clockwise, we can assign the groups as:
Code: Select all
board (each letter marks a 3x3 quadrant of inner squares):

   1  1'         1 1'
   ----          ----
4'|A  a| 2    4'|   B| 2
4 |   o| 2'   4 |p  b| 2'
   ----          ----
   3' 3          3' 3

   1  1'         1 1'
   ----          ----
4'|s   | 2    4'|d  t| 2
4 |c  C| 2'   4 |D   | 2'
   ----          ----
   3' 3          3' 3

bitboard (each letter represents 3 bits):
msb  -4dDt3cCs2bBp1aAo4'3'2'1'   lsb

The numbers thus represent the (half-)edges, the letters the inner squares. The corners get assigned to the most-significant 4 bits, since they never occur in any occupancy set, and it is favorable to have all used bits as much to the right as possible (since a multiplication can only shift them to the left).

If the attacker is in a corner, the occupancy set consists entirely of edges, two of them (say 1 and 2). These can be easily moved to the uppermost 12 bits (the key), by a magic with only 3 non-zero bits:
Code: Select all
-. . ......2...1.....2'1' masked
2. . .1.....2'1'          <<28
.. . 1.....2'1'           <<31
.2'1'                     <<55
______________________________ +
22'1'1                    key

- 4.. .........1....4'..1' masked
. 4.. .........1....4'..1' <<1
. .1. ...4'..1'           <<33
4'..1'                    <<52
______________________________ +
4'411'                    key

Note that the exceptional case 1+4, where the primed half-edges did not have bordering bit assignments, does not have the outcome 11'4'4 expected from cyclic permutation. This would require shifting group 4 to the right, and is thus not possible through multiplication. Fortunately the situation is symmetric w.r.t. diagonal mirroring, which maps 4 <-> 1' etc. So all four corners can be mapped to fully equivalent 12-bit keys, which could share the index table.

When the attacker is on an edge square, the occupancy set consists of a single edge (with one square, the one the attacker is on, missing, plus a number of inner squares added). Because the assignment of square numbers is such that a +90-degree rotation simply maps the 1aAo1' bitgroups on the corresponding 2bBp2', we can make equivalent keys for the rotated states simply by shifting 2bBp to where 1aAo was, etc. The challenge, however, is to also find equivalent keys for the mirror-image positions (b1->g1, etc.). For a Rook on edge 2 this will definitely involve swapping 2 and 2'. The other edges will not appear in the occupancy set, but every other bit of the inner-square groups might appear, depending on where exactly the Rook is located on this edge. The two mappings could be:
Code: Select all
case(1):
-..dDt.cCs2bBp.aAo..2'. masked (letter groups sparse!)
2bBp.aAo..2'.           <<30
xxxxxxx                 other shifts to collect inner squares
.2'.                    <<54
______________________________ +
22'Bp                   key

case(2):
-..dDt.cCs2bBp.aAo..2'. masked (letter groups sparse!)
.2bBp.aAo..2'.          <<27
xxxxxxx                 collect other inner squares, in particular:
bBp.aAo..2'.            <<33 collect B
2'.                     <<57
______________________________ +
2'2bB                   key

Note, however, that in case 1 the group b collides with 2'. Bits in b can only be hit, though, if the Rook is in 2'. If the Rook is on 2, its non-edge ray only runs through the quadrant where it can hit B (and a, A, s, d, t), but not b or p, for which it has to be on 2' (where it can't hit B). And if the Rook is on 2', the square of the half-edge 2' it is on is not in the occupancy set, there is a 1-bit hole in 2' after masking. So we can arrange it such that the 3 squares mapping to bits in b in the 3x3 quadrant next to 2' precisely fill that hole. (i.e. put one such square on each ray perpendicular to the 2 edge). 2' and b thus don't really collide, they rather merge perfectly to a combined 3-bit group (2 bits from 2', one from b).

B has to contain the mirror-image squares of b, as B is to 2 what b is to 2'. So in case 2, when the Rook is on 2 and hence 2' is complete, b is clear (so still no collision between 2' and b). In this case we have to merge 2 with B, which we do by intentially shifting B above the appropriate key bits. Note that b and p, which then also map into the key, are clear in this situation. So making use of the knowledge that we use case (1) only if the Rook is on the 2' half-edge, and (2) if it is on 2, the cases look more like
Code: Select all
case(1):
-..dDt.cCs2b.p.aAo..2'. masked (Attacker on 2')
2b.p.aAo..2'.           <<30
xxxxxxx                 other shifts to collect inner squares
.2'.                    <<54
______________________________ +
22'.p                   key

case(2):
-..dDt.cCs2.B..aAo..2'. masked (Attacker on 2)
.2.B..aAo..2'.          <<27
xxxxxxx                 collect other inner squares, in particular:
.B..aAo..2'.            <<33 collect B
2'.                     <<57
______________________________ +
2'2.B                   key

Note that the extreme case of the Rook on edge 4 does not represent any special problems: In case (1), which is used if the Rook is on 4', the group 4' borders on o, rather than on another edge group. But the rays of a Rook on on 4' cannot intersect o. In case (2) 4' has to be shifted to the left of 4, (which itself cannot be right shifted!), but there are 4 empty (corner) bits to accomodate it. So for each pair of mirror-image attacker squares, we can create an equivalent key that has the 6 edge occupancies in the high-order 6 bits, the attacker itself being in the second half-edge, the hole created there filled by the occupancy of an equivalent square in the quadrant next to it. As far as the mapping of these squares into the key is concerned they could thus share an index table between all 8 symmetry-equivalent edge squares. If it can be done in practice also depends on the mappings of the 5 remaining inner squares, which varies case by case (i.e. b1 is different from c1, d1).

An example of an assignment (which, due to the way I assigned the bB bits to squares I refer to as the W system) could be:
Code: Select all
         1        1'
   63,23,22,21, 0, 1, 2,60,
   11,38,16,56,29,19,50,35,
4' 10,55,37,15,18,49,28,34, 2
    9,17,54,36,48,27,20,33,
   57,44,51,24,12,30,41, 3,
4  58,52,25,42,39,13,31, 4, 2'
   59,26,43,53,32,40,14, 5,
   62, 8, 7, 6,45,46,47,61
         3'       3

       1     1'
   - 1 1 1 1'1'1'-
   4's A d B a t 2
4' 4'd s A a t B 2  2
   4'A d s t B a 2
   4 c D p o b C 2'
4  4 D p c C o b 2' 2'
   4 p c D b C o 2'
   - 3'3'3'3 3 3 -
       3'    3

The 3-bit groups are laid out such that there is never more than one square of them on a row or file, although a Rook could of course attack two of them from different directions if it is on an inner square. The occupancy set of a Rook on an inner square contains 10 inner squares out of 12 groups, so less than one per group on the average. In general, the inner-square bits are quite well spread out over the 45 high bits of the word if the Rook is on an inner square, and the unprimed half-edges cause an extra separation distance between the groups that map onto each other on rotation. The outermost groups are flanked by zeroes, so shifts could easily mimic periodic boundary conditions. This should make it very easy to find at least keys that allow sharing of index tables between the four squares related by rotation: the different keys should be cyclic permutations of each other (with width 48 bits). For the mirror images you will depend on clever keys. (The keys that do just one rotation group would be so simple that you could probably make them by hand...)

If you want to apply redundancy, the XOR patterns should similarly be rotated versions of each other, in their low-order 12 bits and the 48 bits above it, in steps of 3 and 12, respectively. In that case you will preserve the rotation symmetry.
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Compact Bitboard Attacks

Postby H.G.Muller » 10 Sep 2006, 11:14

I was wrong! IT COULD GET SIGNIFICANTLY BETTER THAN THIS! Alwas amazing how you have to relentlessly struggle for days before you acquire insight, and once you have the insight the problem you have been struggling with suddenly seems completely trivial... 8-)

The point is that the symmetry operations of the chess board form a group, but it is not a cyclic group. So there is no way to naturally cycle through all symmetry-related squares, you have to throw in an irregular action at some point (e.g. a reflection between the 90-deg rotations). The problem is that a single occupancy set can contain more than one square of a symmetry class (e.g. c2 and f2), for which you would need the irregularity at a different time. So however you number the squares (c2, f2, g3, g6, f7, c7, b6, b3), cycling through them in that order will never transform the entire set undistorted.

The solution is to separate reflections from rotations: divide the squares into two groups, squares that you enumerate clockwise, and the other, (the mirror-images), which you enumerate counter-clockwise. A reflection is than simply swapping the sets. I already had intuitively realized that this separation pays to for edge squares, but it should be applied to all 6 'triangle' squares (b1-d1-d3).

By mapping this triangle and its rotation images in one area of the bitboard, (i.e. to a contiguous roup of bits there), and the mirror image of it in another area, a rotation of the board corresponds to a (cyclic) shift right in one area, and a (cyclic) shift left in the other. And a reflection is swapping the areas, i.e. shifts over a larger range. These shifts are very easy to undo by juggling the 1 bits in the magic multipler. The only thing you have to worry about is edge effects (where the two bitboard areas border).

Due to the cyclic nature of the shifts, the best strategy is to assign the entire triangle to 6 contiguous bits, and cycle that through its rotation images by shifts of 6. Then there is only one point in the area where you have to 'wrap around'. (If you would go for shifts of 1, the wrap-around would occur locally for every nybble, which is not easy to implement by a multiplication.)

The diagnals you could arbitrarily include into one of the groups, but it probably works better to map them to a separate area: otherwise the multiplication would not easily be able to swap the areas in order to map mirror images to the same key.

The previous idea of the groups of 3 can be kept (it remains useful to distinguish edge and inner squares, due to their very clear presence pattern that coorelates with key length). Note that in the previous assignment the groups designated o,p,s,t were diagonal groups, 1,2,3,4 and A,B,C,D edge and triangle groups, with 1',2',3',4' and a,b,c,d their respective mirror images (which we now map to the counter-rotating part of the bitboard).

The new mapping thus should be:

-4d3c2b1aopst3'C4'D1'A2'B

So what's different is that I ordered the primed section oppositely, and squeeze in the ABCD.

We can then play the same tricks: 2b and 2'B can be shifted above each other either as
Code: Select all
. 2b.         2 b..
2'B.          .2'B.

and the trick with the hole still works. The trick is now to neutralize the edge effects for the Rook on edge 1 or 3, when either 1a or 3'C might b perturbed (in a way different from the corresponding sections not at the edge) by the diagonal occupants opst. (Note that a Rook on the edge only has two diagonal squares in its occupancy set), so the problem with collisions in this region might not be very big.)

I still have to make a final assignment. One inconvenient boundary condition is still that right-shifts are not possible throuh multiplication, so yu have to be very careful what to assign in the high 12 bits, because that can never occur in the key at a lower bitnumber. This might interfere with you desire to implement cyclic shifting of the bitboard areas.

The good news for 32-bitters is that I realize now that they can ignore that problem! What you would really like is to have a 128-bit data-type (or at least 60-bit + key length), so that by taking the high-order bits of that you could shift anything anywhere. On 32-it machines such a long multiplication can be just as easily implemented as the standard 64-bit one:

Normally we would do CONST_b0_b31 * BB_b32_b63 + CONST_b32_b63 * BB_b0_b31 to obtain bits 32-63 of the product. You don't need the product of the two low words, since the only purpose of the bitboard into the key region. But the imul instructions that do the 32 x 32 multiply also deliver the high word of that product, i.e. bits 64-95. :idea: By mapping the unused corner squares to the lowest bit of the BB, we can shift any other bit of the BB to any place in the high-order word of the 32-bit product by multiplication with a single 32-bit constant. We even create the freedom to combine the two independently multiplied halves of the BB not by adding, but by OR or XOR, if thar would help, and we create extra edges in the BB over which bits cannot spill, and which could be conveniently used to isolate the rotating and counter-rotating areas. 8-)
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Compact Bitboard Attacks

Postby Gerd Isenberg » 11 Sep 2006, 22:14

The 9.5KByte compromize would look like this, similar to your initial approach and almost equal to the 1.5K approach but saving the second fillup multiplication for the diagonals and the second rotate mul,and for the files. Not sure you can beat that with your freaky square mapping - while this stuff is really fascinating, i have a hard time to understand your math.

Code: Select all
u64 diagonalMask[64];
u64 antidiagMask[64];
u64 fillUpAttacks[64][8];
u64 aFileAttacks[64][8];
u8  firstRankAttacks[64][8];

u64 rankAttacks(u64 occ, u32 sq) {
   u32 f = sq &  7;
   u32 r = sq & ~7; // rank * 8
   u32 o = (u32)(occ >> (r+1)) & 63;
   return (u64) firstRankAttacks[o][f] << r;
}

u64 fileAttacks(u64 occ, u32 sq) {
   u32 f = sq & 7;
   occ   =   0x0101010101010101 & (occ   >> f); // a-file
   u32 o = ( 0x0080402010080400 *  occ ) >> 58;
   return  ( aFileAttacks[o][sq>>3]    ) << f;   
}

u64 diagonalAttacks(u64 occ, u32 sq) {
   u32 f = sq & 7;
   occ   = ( diagonalMask[sq]   &  occ );
   u32 o = ( 0x0202020202020202 *  occ ) >> 58;
   return  ( diagonalMask[sq]   &  fillUpAttacks[o][f] );
}

u64 antiDiagAttacks(u64 occ, u32 sq) {
   u32 f = sq & 7;
   occ   = ( antidiagMask[sq]   &  occ );
   u32 o = ( 0x0202020202020202 *  occ ) >> 58;
   return  ( antidiagMask[sq]   &  fillUpAttacks[o][f] );
}
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Compact Bitboard Attacks

Postby Andreas Guettinger » 12 Sep 2006, 08:56

Hi Gerd,

I'm trying to figure out how to fill aFileAttacks and fillupAttacks at startup. It seems like that

Code: Select all
for (x=0; x<64; x++) {
    fillupAttacks[x][file] = 0x0101010101010101ULL * firstRankAttacks[x][file];
}


Is there a way to calculate aFileAttacks from firstRankAttacks or do I have to generate this the way like the rotated bitboard arrays?
I'm not up to date with this multiplication stuff.
regards,
Andy
Andreas Guettinger
 
Posts: 20
Joined: 26 Aug 2006, 23:37
Location: Bern, Switzerland

Re: Compact Bitboard Attacks

Postby Michael Sherwin » 12 Sep 2006, 13:42

This is for the people with an IQ of 140 or less. IQ 141
to 199 may choose either. IQ 200 and above choose that
which has been developed prior to this post.


Code: Select all
blockers.u = bishopBits[sq] & allPieces; // no edge squares
index = squareHash[sq]
      ^ hashRow2[blockers.row2]
      ^ hashRow3[blockers.row3]
      ^ hashRow4[blockers.row4]
      ^ hashRow5[blockers.row5]
      ^ hashRow6[blockers.row6]
      ^ hashRow7[blockers.row7];
moveBits = hashTbl[index] ^ friendlies;


I do not know how big the hashTbl would have to be.
It would be generated and stored to a file that
would be loaded at start up. Not only would the hash
table hold the move bits it would also hold the mobility.

Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: Compact Bitboard Attacks

Postby Gerd Isenberg » 12 Sep 2006, 19:06

Andreas Guettinger wrote:Hi Gerd,

I'm trying to figure out how to fill aFileAttacks and fillupAttacks at startup. It seems like that

Code: Select all
for (x=0; x<64; x++) {
    fillupAttacks[x][file] = 0x0101010101010101ULL * firstRankAttacks[x][file];
}


Is there a way to calculate aFileAttacks from firstRankAttacks or do I have to generate this the way like the rotated bitboard arrays?
I'm not up to date with this multiplication stuff.


Hi Andy,

yes, the fillupAttacks is already correct. For the files it should be that way:
Code: Select all
aFileAttacks[x][file^7] = ((0x8040201008040201ULL * firstRankAttacks[x][file]) & 0x8080808080808080ULL) >> 7;


Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Compact Bitboard Attacks

Postby Andreas Guettinger » 12 Sep 2006, 19:18

Gerd Isenberg wrote:
Andreas Guettinger wrote:Hi Gerd,

I'm trying to figure out how to fill aFileAttacks and fillupAttacks at startup. It seems like that

Code: Select all
for (x=0; x<64; x++) {
    fillupAttacks[x][file] = 0x0101010101010101ULL * firstRankAttacks[x][file];
}


Is there a way to calculate aFileAttacks from firstRankAttacks or do I have to generate this the way like the rotated bitboard arrays?
I'm not up to date with this multiplication stuff.


Hi Andy,

yes, the fillupAttacks is already correct. For the files i should be that way:
Code: Select all
aFileAttacks[x][file^7] = ((0x8040201008040201ULL * firstRankAttacks[x][file]) & 0x8080808080808080ULL) >> 7;


Gerd


Many thanks for the help, Gerd. I will try this approach too for my move generator, and compare it to the 1.5kb and rotated version on my ppc64. I specially like the clean and simple implementation of this bitboard versions, so I will probably drop the rotated bitboards.
regards,
Andy
Andreas Guettinger
 
Posts: 20
Joined: 26 Aug 2006, 23:37
Location: Bern, Switzerland

PreviousNext

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 3 guests

cron