Winboard Forum

by **Lasse Hansen** » 02 Aug 2006, 16:16

Hi again!

Since my old approach with bitboard move generator used pretty much memory (more than 2Mbyte), with possible cache pollution as a result, I have now made another one that uses ~10kbytes and should pollute the cache much less.

Code: Select all: TBitBoard MaskFile[8]; 64 TBitBoard MaskRank[8]; 64 TBitBoard MaskH8A1[15]; 120 TBitBoard MaskH1A8[15]; 120 U8 DiagH8A1Shift[15]; 15 U8 DiagH1A8Shift_R[15]; 15 U8 DiagH1A8Shift_L[15]; 15 U8 RookAttacksFile[8][64]; 512 U64 HashKeyRank[8]; 64 U64 HashKeyH8A1[8]; 64 U64 HashKeyH1A8[8]; 64 TBitBoard RookAttacksRank[8][32]; 2048 TBitBoard BishopAttacksH8A1[8][32]; 2048 TBitBoard BishopAttacksH1A8[8][32]; 2048 U8 RayH8A1[64]; 64 U8 RayH1A8[64]; 64 U8 RayH8A1Pos[64]; 64 U8 RayH1A8Pos[64]; 64 TBitBoard KingAttacks[64]; 512 TBitBoard KnightAttacks[64]; 512 TBitBoard WPawnAttacks[64]; 512 TBitBoard BPawnAttacks[64]; 512 Total: 9565 bytes

As one can see, it is still a non-rotated approach using hash-keys. Here are some of my results with perft and quasi-legal moves:

Code: Select all: pos 0:"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w QKqk -" pos 1:"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w QKqk -" pos 2:"q3k3/6q1/4q3/2q5/5Q2/3Q4/1Q6/3K3Q w - -" New Old Diff pos 0 d7: 72.3MNps 90.1MNps 24.6% N=3347724941 pos 1 d6: 74.8MNps 100.8MNps 34.8% N=8427118359 Pos 2 d5: 124.7MNps 137.3MNps 10.1% N=2000148357

So, its still pretty fast, but legality check is more expensive. Here is the code for Queens:

Code: Select all: Pcs=WQueens; while (Pcs) { From=FirstOne2(C=Pcs&-Pcs); File = From>>3; Rank = From&7; // File Attacks Pattern = ((AllWPieces|AllBPieces)&MaskFile[File])>>((File<<3)+1); A = (U64)RookAttacksFile[Rank][Pattern]<<(File<<3); // Rank Attacks Pattern = ((AllWPieces | AllBPieces) & MaskRank[Rank])>>Rank; Address = (Pattern*HashKeyRank[File])>>(64-RAT_RANK_BITS); A |= RookAttacksRank[File][Address]<<Rank; // H8A1 Attacks Diag = RayH8A1[From]; RayPos = RayH8A1Pos[From]; Pattern=((AllWPieces|AllBPieces)&MaskH8A1[Diag]&~EDGES) >>DiagH8A1Shift[Diag]; Address=(Pattern*HashKeyH8A1[RayPos])>>(64-BAT_H8A1_BITS); A |= (BishopAttacksH8A1[RayPos][Address]<<DiagH8A1Shift[Diag]) &MaskH8A1[Diag]; // H1A8 Attacks Diag = RayH1A8[From]; RayPos = RayH1A8Pos[From]; Pattern=(((AllWPieces|AllBPieces)&MaskH1A8[Diag]&~EDGES) >>DiagH1A8Shift_R[Diag])<<DiagH1A8Shift_L[Diag]; Address=(Pattern*HashKeyH1A8[RayPos])>>(64-BAT_H1A8_BITS); Pattern = A |= ((BishopAttacksH1A8[RayPos][Address]<<DiagH1A8Shift_R[Diag]) >>DiagH1A8Shift_L[Diag]) & MaskH1A8[Diag]; A &= AllBPieces; if (A & Kings) return -1; From = (From<<6) + (WQueen<<12); while (A) { To=FirstOne2(B=A&-A); *M++=From+To+(Board[To]<<16); A ^= B; } A=Pattern; A ^= A & (AllWPieces|AllBPieces); while (A) { To=FirstOne2(B=A&-A); *M++=From+To; A ^= B; } Pcs ^= C; }

For now I do both captures and non-captures in the same function, saving some overhead. They can easily be put in separate move lists.

I havent done mobility stuff yet, but this would be slower than crafty (except for files), because it would need hashing as well, and some 3-6k more mem usage.

Cheers
Lasse

by **Pradu** » 02 Aug 2006, 18:54

I guess this method without rotated bitboards will make makemoves faster but I guess faster generation of moves is more important because it will be used a lot in eval. I believe your old move generator is a lot better in this respect. All we need is just a better set of magic keys which could easily bring down the size of the databases.

by **Gerd Isenberg** » 03 Aug 2006, 18:45

Andrew Fan wrote:
One more thing, 64-bit multiplications do not sound very attractive for a 32-bit CPU/OS.

Andrew

Hi Andrew,

yes, 64-bit imul is an expensive call with up to three 32-bit imuls, iirc. For K8 32-bit mode it is quite effective, to do fill- and rotate-multiplication 32-bit wise. 32-bit imul is 3-cycles direct path, so similar expensive as a read from L1-cache.

Therefor i finally updated the "Compact Bitboard Attacks" thread with a complete version intended for 32-bit of the 0.5+1KByte lookup approach. The computation to get the six-bit occupied-state per ray by fill- or rotate-mul can be separated and may be combined with classical lookup approaches as well.

Gerd

by **Lasse Hansen** » 04 Aug 2006, 08:33

Andrew Fan wrote:
Pradu wrote:One more thing, 64-bit multiplications do not sound very attractive for a 32-bit CPU/OS.

It is possible to use 32 bit 'magic' number for hashing too, but I guess by using 64 bit numbers one may hash larger data sets. Here's an example of making a perfect hash for FirstOne() using 32 bit key on 32 bit numbers.

Code: Select all: #include <stdio.h> typedef unsigned int U32; #define TABLE_BITS 5 // means 32 elements int FirstOne[1<<TABLE_BITS]; // this is the hash table I want to make U32 MagicNumber; int main() { U32 Address, Pattern, N=0, SetBit[32], Random32(); int i,success,n,best=0; SetBit[0]=1; // constructing the set of elements I want to hash for (i=1;i<32;i++) SetBit[i]=SetBit[i-1]<<1; do { // searching for a MagicNumber that fits my needs for (i=0;i<(1<<TABLE_BITS);i++) FirstOne[i]=-1; MagicNumber=Random32(); success=1; n=0; for (i=0;i<32;i++) { // trying all elements Pattern = SetBit[i]; Address = (Pattern*MagicNumber)>>(32-TABLE_BITS); if (FirstOne[Address] != -1) { success=0; if (n>best) { best=n; printf("best=%d\n",best);fflush(stdout); } break; } else { FirstOne[Address] = i; n++; } } N++; if (!(N&((1<<24)-1))) { printf("."); fflush(stdout); } } while (!success); printf("MagicNumber 0x%x found in %u iterations\n",MagicNumber,(unsigned int)N); for (i=0;i<32;i++) { // testing the result Pattern = SetBit[i]; Address = (Pattern*MagicNumber)>>(32-TABLE_BITS); printf("i=%d FirstOne[%d]=%d\n",i,(int)Address,FirstOne[Address]); } return 0; } // The following is taken from crafty /* A 32 bit random number generator. An implementation in C of the algorithm given by Knuth, the art of computer programming, vol. 2, pp. 26-27. We use e=32, so we have to evaluate y(n) = y(n - 24) + y(n - 55) mod 2^32, which is implicitly done by unsigned arithmetic. */ unsigned int Random32(void) { /* random numbers from Mathematica 2.0. SeedRandom = 1; Table[Random[Integer, {0, 2^32 - 1}] */ static const unsigned long x[55] = { 1410651636UL, 3012776752UL, 3497475623UL, 2892145026UL, 1571949714UL, 3253082284UL, 3489895018UL, 387949491UL, 2597396737UL, 1981903553UL, 3160251843UL, 129444464UL, 1851443344UL, 4156445905UL, 224604922UL, 1455067070UL, 3953493484UL, 1460937157UL, 2528362617UL, 317430674UL, 3229354360UL, 117491133UL, 832845075UL, 1961600170UL, 1321557429UL, 747750121UL, 545747446UL, 810476036UL, 503334515UL, 4088144633UL, 2824216555UL, 3738252341UL, 3493754131UL, 3672533954UL, 29494241UL, 1180928407UL, 4213624418UL, 33062851UL, 3221315737UL, 1145213552UL, 2957984897UL, 4078668503UL, 2262661702UL, 65478801UL, 2527208841UL, 1960622036UL, 315685891UL, 1196037864UL, 804614524UL, 1421733266UL, 2017105031UL, 3882325900UL, 810735053UL, 384606609UL, 2393861397UL }; static int init = 1; static unsigned long y[55]; static int j, k; unsigned long ul; if (init) { int i; init = 0; for (i = 0; i < 55; i++) y[i] = x[i]; j = 24 - 1; k = 55 - 1; } ul = (y[k] += y[j]); if (--j < 0) j = 55 - 1; if (--k < 0) k = 55 - 1; return ((unsigned int) ul); }

The magic number 0xe89b2be is found in ~1.2M iterations. 32 bit keys should easily apply to the hashing done in this thread, because of the small data sets.

Lasse

by **Lasse Hansen** » 04 Aug 2006, 09:04

Hm, maybe my last post were unclear.. Well, here's the prototype for FirstOne32() with example.

Code: Select all: #include <stdio.h> int MagicTable[32]={ 31, 0, 9, 1,10,20,13, 2, 7,11,21,23,17,14, 3,25, 30, 8,19,12, 6,22,16,24, 29,18, 5,15,28, 4,27,26 }; int FirstOne32(unsigned int x) { return MagicTable[((x&-x)*0xe89b2be)>>27]; } int main() { int i; unsigned int SetBit[32]; SetBit[0]=1; for (i=1;i<32;i++) SetBit[i]=SetBit[i-1]<<1; for (i=0;i<32;i++) printf("i=%d %d\n",i,FirstOne32(SetBit[i])); return 0; }

by **Lasse Hansen** » 04 Aug 2006, 15:17

Edited: (it's real hot here ...)
Would'nt this be pretty fast on 32 bit machines?

Code: Select all: int MagicTable[32]={ 31, 0, 9, 1,10,20,13, 2, 7,11,21,23,17,14, 3,25, 30, 8,19,12, 6,22,16,24, 29,18, 5,15,28, 4,27,26 }; int FirstOne2(x) // Assumes 1 and only 1 bit set in x { return MagicTable[(x*0xe89b2be)>>27]; } A=AttackBoardLo; // Lower DWord of bitboard while (A) { To=FirstOne2(B=A&-A); *M++=<whatever>; A ^= B; } A=AttackBoardHi; // Higher DWord of bitboard while (A) { To=FirstOne2(B=A&-A)+32; *M++=<whatever>; A ^= B; }

by **Leen Ammeraal** » 05 Aug 2006, 06:58

Lasse Hansen wrote:Edited: (it's real hot here ...)
Would'nt this be pretty fast on 32 bit machines?
...

On the basis of the above code of Lasse Hansen, I wrote the
following function for getting the position of the least significant
1-bit of a (nonzero) bitboard. On my system (Athlon 64, 3000+,
Windows XP, VC++2005) and for my chess engine (Queen) this
function was faster than all others that I tried in the past. It was
even slightly faster than a similar one based on code by Gerd
Isenberg.
Unfortunately, when making my engine, say, 2% faster, I do not
observe that it gets any stronger in real play.
Anyway, thanks, Lasse!
Leen Ammeraal

Code: Select all: // Based on code by Lasse Hansen: const int MagicTable[32]= { 31, 0, 9, 1,10,20,13, 2, 7,11,21,23,17,14, 3,25, 30, 8,19,12, 6,22,16,24, 29,18, 5,15,28, 4,27,26 }; inline int getlsbit(unsigned __int64 bb) // bb != 0 { unsigned a = (unsigned)bb; return a ? MagicTable[((a & -(int)a) * 0xe89b2be) >> 27] : (a = (unsigned)(bb >> 32), MagicTable[((a & -(int)a) * 0xe89b2be) >> 27] + 32); }

by **Lasse Hansen** » 05 Aug 2006, 08:38

Nice that it was fast!

But my main point of my last post was that it does not seem necessary to use a 64 bit FirstOne function always. At least not when extracting moves in the move generator most inner loops for sliding pieces. Everything can be done using 32 bit numbers, and doubling code. Or, if one dont want to duplicate the code, maybe this would work:

Code: Select all: int MagicTable[32]={ 31, 0, 9, 1,10,20,13, 2, 7,11,21,23,17,14, 3,25, 30, 8,19,12, 6,22,16,24, 29,18, 5,15,28, 4,27,26 }; U64 AttackTable; U32 *A,B,temp,temp2,To; Make AttackTable; // 64 bit bitboard // __Pure 32 bit from here__ temp=From|(Piece<<16); temp2=0; A=&LODWORD(AttackTable); do { while (*A) { To=temp2+MagicTable[((B=*A&-*A)*0xe89b2be)>>27]; *M++=temp|(To<<6)... *A ^= B; // Clear bit } temp2=32; A=&HIDWORD(AttackTable); while (*A);

Lasse

by **Gerd Isenberg** » 05 Aug 2006, 08:51

Leen Ammeraal wrote:On the basis of the above code of Lasse Hansen, I wrote the
following function for getting the position of the least significant
1-bit of a (nonzero) bitboard. On my system (Athlon 64, 3000+,
Windows XP, VC++2005) and for my chess engine (Queen) this

function was faster than all others that I tried in the past. It was
even slightly faster than a similar one based on code by Gerd
Isenberg.
Unfortunately, when making my engine, say, 2% faster, I do not
observe that it gets any stronger in real play.
Anyway, thanks, Lasse!
Leen Ammeraal

Wow - a conditional 32-bit De Bruijn multiplication is faster than Matt Taylor's folding approach! I wouldn't have expected it.

Code: Select all: /** * @author Matt Taylor * @return index 0..63 * @param bb a 64-bit word to bitscan, should not be zero */ static int foldedTable[64] = { 63,30, 3,32,59,14,11,33, 60,24,50, 9,55,19,21,34, 61,29, 2,53,51,23,41,18, 56,28, 1,43,46,27, 0,35, 62,31,58, 4, 5,49,54, 6, 15,52,12,40, 7,42,45,16, 25,57,48,13,10,39, 8,44, 20,47,38,22,17,37,36,26, }; int bitScanForward(u64 b) { b ^= (b - 1); unsigned int folded = ((int)b) ^ ((int)(b >> 32)); return foldedTable[(folded * 0x78291ACF) >> 26]; }

Yes, the register pressure is higher in Matt's routine, due to the
b ^= (b - 1) statement which requires four 32-bit registers, to finally get the folded 32-bit word into one register. You need only one register.

Additionally, the size of the lookup-table requires four cachelines instead of two in your conditional routine, probably negligible, but one point comes to the other.

A small drawback with your routine - but of course empirical prove is right - you might have misspredictions up and then and even with very good predicted branches, each conditional jump of each inlined getlsbit incarnation, requires a btb-entry, which is a shared processor ressource and might pollute if you exceed some threshold with number of branches. The cmov-solution otoh doesn't look so attractive as well considering the registers used.

I will try your routine today in my 32-bit program and will report later, until now Matt's routine was the fastest bitscan on my amd64 box.

Btw. even the MIT guys didn't make optimal like you, with their HALF-DeBruijn in their paper, they first extract the ls1b 64-bit wise before they branch for a 32-bit multiplication.

Using de Bruijn Sequences to Index a 1 in a Computer Word
http://supertech.csail.mit.edu/papers/debruijn.pdf
Page 5

"After extracting a single 1 bit, we determine which half-word contains the 1, and the use the 32-bit DeBruijn algorithm on that half word."

Thanks,
Gerd

by **Gerd Isenberg** » 05 Aug 2006, 09:17

Hi Lasse,

yes, from the classical loop hoisting point of view, to keep conditions outside loop bodies, you are right. Duplicate 32-bit code to traverse low and high half-bitboards is indeed the fastest approach for 32-bit mode and was discussed a lot in former CCC already years ago.

The drawback - a lot of source code becomes 32/64-bit dependent - not only a few central and easy to maintain routines from some machine dependent headers.

If you manage maintability with some clever traverse-bitboard macros or inline routines with temp2-integer template parameter - why not. But 64-bit more and more becomes mainstream, so most don't care so much anymore on 32-bit, but are happy with a huge 64-bit mode improvement like Vas with Rybka ;-)

Cheers,
Gerd

by **Tord Romstad** » 05 Aug 2006, 09:35

Hello guys,

Although Glaurung currently doesn't use bitboards, I follow these threads with interest. I have made numerous experiments with bitboards over the last year, but so far I have never been happy with the results. My current mailbox board representation seems to be as fast or faster than rotated bitboards for everything I do (this is on a 32-bit CPU). How do the various non-rotated bitboard techniques compare to rotated bitboards on 32-bit computers?

At the moment, my main problems with bitboards are the following:

Most of the research seems to concentrate on the problem of quickly generating a bitboard of all squares attacked by a sliding piece. This looks nice in theory, but in practise I am almost never interested in finding all the squares a piece attack. Most often, what I want to know is simply whether the piece on square x attacks square y. I still haven't found a way to do this with bitboards which is as fast as my current mailbox code (which consists of a table lookup in a 512-byte table followed by a very short and simple loop). What's the best way to do this with bitboards (rotated or not)?
I often have difficulties writing generic code for all piece types with bitboards. I end up writing separate code for each piece type (and sometimes even colour), which makes it difficult to achieve the terse and minimalistic programming style I prefer.
I detest magic and clever tricks, which annoyingly often seem to be necessary to make bitboards fast.

Tord

by **Gerd Isenberg** » 05 Aug 2006, 11:09

Tord Romstad wrote:Hello guys,

Although Glaurung currently doesn't use bitboards, I follow these threads with interest. I have made numerous experiments with bitboards over the last year, but so far I have never been happy with the results. My current mailbox board representation seems to be as fast or faster than rotated bitboards for everything I do (this is on a 32-bit CPU). How do the various non-rotated bitboard techniques compare to rotated bitboards on 32-bit computers?

At the moment, my main problems with bitboards are the following:
Most of the research seems to concentrate on the problem of quickly generating a bitboard of all squares attacked by a sliding piece. This looks nice in theory, but in practise I am almost never interested in finding all the squares a piece attack. Most often, what I want to know is simply whether the piece on square x attacks square y. I still haven't found a way to do this with bitboards which is as fast as my current mailbox code (which consists of a table lookup in a 512-byte table followed by a very short and simple loop). What's the best way to do this with bitboards (rotated or not)?
I often have difficulties writing generic code for all piece types with bitboards. I end up writing separate code for each piece type (and sometimes even colour), which makes it difficult to achieve the terse and minimalistic programming style I prefer.
I detest magic and clever tricks, which annoyingly often seem to be necessary to make bitboards fast.
Tord

Hi Tord,

you address all the drawbacks with bitboards.
With bitboards you have different code for every piece-kind, since you don't use precalculated nextsquare,nextdir tables to generalize the attack/move abilities of all the pieces. So we do more switches by piece, or indexed jumps - and traverse pieces piecewise. If you like, a kind of loop-hoisting with the immanent knowledge inside that code, what kind of piece we are processing.

On answering the question, for instance does the bishop on b2 attacks g7? Ok, you may use an awfull table[b2][g7]-lookup with precalculated sets of the squares "between"[sq1][sq2], to do a intersetion with the occupied board.

The other option is to call the diagonal attack getter here, either of b2 or g7. Also in the fillbased world on would call one Kogge-Stone occluded fill. If you keep all directions attacks in L1 (shared by all nodes) for some time per node to get pinned pieces, remove checkers, all attacks for SEE and SME information, for eval as well for move-generation, those questions are easy to answer.

I still believe the bitboard future is fillbased, almost branchless pure register calculations for several pieces and probably directions and branchless serialisation of directed attack sets. Otherwise, with our classical approaches, we will rarely peak the maximum ipc on future processors, but stay with almost 1.something instructions per cycle with all those branches and stalls.

In my current 32-bit program, dominated by mmx-fillstuff, i use the ressource friendly 1.5Byte approach additionally on the fly - since i like to waste my cachelines elsewhere ;-)

Cheers,
Gerd

by **Tord Romstad** » 05 Aug 2006, 12:48

Hello Gerd,

Thanks for your reply.

Gerd Isenberg wrote:Hi Tord,

you address all the drawbacks with bitboards.
With bitboards you have different code for every piece-kind, since you don't use precalculated nextsquare,nextdir tables to generalize the attack/move abilities of all the pieces. So we do more switches by piece, or indexed jumps - and traverse pieces piecewise. If you like, a kind of loop-hoisting with the immanent knowledge inside that code, what kind of piece we are processing.

Yes, that's what I usually end up doing (BTW I don't use nextsquare/nextdir tables in my mailbox program, but that's beside the point).

On answering the question, for instance does the bishop on b2 attacks g7? Ok, you may use an awfull table[b2][g7]-lookup with precalculated sets of the squares "between"[sq1][sq2], to do a intersetion with the occupied board.

The other option is to call the diagonal attack getter here, either of b2 or g7.

I have tried both of these, but neither seem to work very well. I currently use the second. The code is similar to this (this is not actual code from my program, so bugs are possible):

Code: Select all: bool piece_attacks_square(const position_t *pos, int from, int to) { int piece = pos->board[from]; int direction; bitboard_t b = SetMaskBB(to); switch(TypeOfPiece(piece)) { case PAWN: case KNIGHT: case KING: if(StepAttackBB[piece][from] & b) return true; else return false; case BISHOP: direction = SlideDirection[from][to]; if(IsBishopSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; case ROOK: direction = SlideDirection[from][to]; if(IsRookSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; case QUEEN: direction = SlideDirection[from][to]; if(IsQueenSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; } return false; }

The corresponding non-bitboard code would look like this (again, bugs are possible):

Code: Select all: bool piece_attacks_square(const position_t *pos, int from, int to) { attack_data_t *a = AttackData - to; int piece = pos->board[from], step, sq; if(a[from].may_attack & PieceMask[piece]) { if(PieceIsSlider(piece)) { step = a[from].step; for(sq = from + step; pos->board[sq] == EMPTY && sq != to; sq += step); if(to == sq) return true; } else return true; } return false; }

Also in the fillbased world on would call one Kogge-Stone occluded fill. If you keep all directions attacks in L1 (shared by all nodes) for some time per node to get pinned pieces, remove checkers, all attacks for SEE and SME information, for eval as well for move-generation, those questions are easy to answer.

Clearly I need to study this Kogge-Stone fill stuff better.

Tord

by **Sune Fischer** » 05 Aug 2006, 13:29

Tord Romstad wrote:
Code: Select all
bool piece_attacks_square(const position_t *pos, int from, int to) { int piece = pos->board[from]; int direction; bitboard_t b = SetMaskBB(to); switch(TypeOfPiece(piece)) { case PAWN: case KNIGHT: case KING: if(StepAttackBB[piece][from] & b) return true; else return false; case BISHOP: direction = SlideDirection[from][to]; if(IsBishopSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; case ROOK: direction = SlideDirection[from][to]; if(IsRookSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; case QUEEN: direction = SlideDirection[from][to]; if(IsQueenSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; } return false; }

The corresponding non-bitboard code would look like this (again, bugs are possible):
Code: Select all
bool piece_attacks_square(const position_t *pos, int from, int to) { attack_data_t *a = AttackData - to; int piece = pos->board[from], step, sq; if(a[from].may_attack & PieceMask[piece]) { if(PieceIsSlider(piece)) { step = a[from].step; for(sq = from + step; pos->board[sq] == EMPTY && sq != to; sq += step); if(to == sq) return true; } else return true; } return false; }

Not sure I follow your bitboard implementation...?

Why not do the simple way:

Code: Select all: bool piece_attacks_square(const position_t *pos, int from, int to) { int piece = pos->board[from]; bitboard_t b = SetMaskBB(to); switch(TypeOfPiece(piece)) { case PAWN: return (b & PawnAttacks(side, from)) != 0; case KNIGHT: return (b & KnightAttacks(from)) != 0; case KING: return (b & KingAttacks(from)) != 0; case BISHOP: return (b & BishopAttacks(from)) != 0; case ROOK: return (b & RookAttacks(from)) != 0; case QUEEN: return (b & QueenAttacks(from)) != 0; } return false; }

It should be comparable in simplicity to the loop method (if not simpler)

You can see a for-loop has been replaced by a lookup table
(the macros "RookAttacks()" would serve to index this table).

But the real strength in bitboards is that you can generate the opposite attacks!

Instead of asking the question:
"Is this piece on square x attacking square y?"
you should ask
"Is this square attacked by any pieces of this type?"

-S

by **Tord Romstad** » 05 Aug 2006, 14:03

Sune Fischer wrote:
Tord Romstad wrote:
Code: Select all
bool piece_attacks_square(const position_t *pos, int from, int to) { int piece = pos->board[from]; int direction; bitboard_t b = SetMaskBB(to); switch(TypeOfPiece(piece)) { case PAWN: case KNIGHT: case KING: if(StepAttackBB[piece][from] & b) return true; else return false; case BISHOP: direction = SlideDirection[from][to]; if(IsBishopSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; case ROOK: direction = SlideDirection[from][to]; if(IsRookSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; case QUEEN: direction = SlideDirection[from][to]; if(IsQueenSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false; } return false; }

The corresponding non-bitboard code would look like this (again, bugs are possible):
Code: Select all
bool piece_attacks_square(const position_t *pos, int from, int to) { attack_data_t *a = AttackData - to; int piece = pos->board[from], step, sq; if(a[from].may_attack & PieceMask[piece]) { if(PieceIsSlider(piece)) { step = a[from].step; for(sq = from + step; pos->board[sq] == EMPTY && sq != to; sq += step); if(to == sq) return true; } else return true; } return false; }

Not sure I follow your bitboard implementation...?

Why not do the simple way:

Code: Select all
bool piece_attacks_square(const position_t *pos, int from, int to) { int piece = pos->board[from]; bitboard_t b = SetMaskBB(to); switch(TypeOfPiece(piece)) { case PAWN: return (b & PawnAttacks(side, from)) != 0; case KNIGHT: return (b & KnightAttacks(from)) != 0; case KING: return (b & KingAttacks(from)) != 0; case BISHOP: return (b & BishopAttacks(from)) != 0; case ROOK: return (b & RookAttacks(from)) != 0; case QUEEN: return (b & QueenAttacks(from)) != 0; } return false; }

It should be comparable in simplicity to the loop method (if not simpler)

This is simpler, but much slower than my bitboard code above. Your code is essentially the same as mine, except that I try not to generate any unnecessary attacks. For the sliding pieces, I generate attacks in at most one direction, in order to save some effort. Consider the QUEEN case of my code:

Code: Select all: case QUEEN: direction = SlideDirection[from][to]; if(IsQueenSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false;

If I am lucky, I can avoid generating any attacks altogether, because IsQueenSlidingDirection(direction) returns false. And even if it does return true, I can generate sliding attacks only along this single direction, instead of generating attacks in all four directions and ORing them together.

This was part of the point I was trying to make. Bitboards are probably neat if I want to quickly find all the squares a queen attacks. Unfortunately, I almost never want this.

But the real strength in bitboards is that you can generate the opposite attacks!

Instead of asking the question:
"Is this piece on square x attacking square y?"
you should ask
"Is this square attacked by any pieces of this type?"

That depends on what I want to know. If I want to know whether a given move checks the opponent, for instance, I can't easily replace the first question with the second (without losing lots of speed).

But even for answering your second question, I have difficulties making bitboards faster than mailbox. In my program, the code would look like this with bitboards:

Code: Select all: bool square_is_attacked_by_piece_of_type(const position_t *pos, int sq, int piece) { bitboard_t b = PieceBB(pos, piece); switch(piece) { case WP: return BlackPawnAttackBB(pos, sq) & b; case WN: return KnightAttackBB(pos, sq) & b; case WB: return BishopAttackBB(pos, sq) & b; // .... } return false; }

With mailbox, it would look like this:

Code: Select all: bool square_is_attacked_by_piece_of_type(const position_t *pos, int sq, int piece) { int from, to, step, mask = PieceMask[piece]; attack_data_t *a = AttackData - sq; for(from=PieceListStart(pos, piece); from<=H8; from=NextPiece(pos, from)) if(a[from].may_attack & mask) { if(PieceIsSlider(piece)) { step = a[from].step; for(to = from + step; pos->board[to] == EMPTY && to != sq; to += step); if(to == sq) return true; } else return true; } return false; }

In my experience, the bitboard code is slightly faster for non-sliding pieces, while the mailbox code is slightly faster for sliding pieces. Overall it doesn't seem to make any noticable difference. My SEE, for instance, runs slightly faster with mailbox than with bitboards. Finding the attackers is roughly equally expensive with both methods, but extracting the nonzero bits from the attack bitboards takes some time.

I should once again point out that I use a 32-bit CPU. Perhaps everything would have been very different on 64-bit.

Tord

by **Sune Fischer** » 05 Aug 2006, 15:11

Tord Romstad wrote:This is simpler, but much slower than my bitboard code above. Your code is essentially the same as mine, except that I try not to generate any unnecessary attacks. For the sliding pieces, I generate attacks in at most one direction, in order to save some effort. Consider the QUEEN case of my code:
Code: Select all
case QUEEN: direction = SlideDirection[from][to]; if(IsQueenSlidingDirection(direction) && (SlidingAttackBB(pos, from, direction) & b)) return true; else return false;

If I am lucky, I can avoid generating any attacks altogether, because IsQueenSlidingDirection(direction) returns false. And even if it does return true, I can generate sliding attacks only along this single direction, instead of generating attacks in all four directions and ORing them together.

Really? How much slower is it?

I really don't like all the branches and tables and complexity it introduces.

That's one of the reasons bitboards are so simple, you don't need to worry about directions in this way.

Tord Romstad wrote:That depends on what I want to know. If I want to know whether a given move checks the opponent, for instance, I can't easily replace the first question with the second (without losing lots of speed).

Heh of course, it doesn't apply to all situations

But I think the most common question would be, for instance
"Are there any knights attacking this square?"

And now it applies, because instead of looping through your
knight piece list and checking one by one if they attack, you can
answer the question by a simple attack lookup and a mask.

Another very frequent thing to ask is:
"does this piece attack any of those squares?"

Again very simple to answer with bitboards, but how do you do this fast without bitboards?

Generally you might say, that bitboards have the edge whenever
you are dealing with a set of squares or pieces instead of just one.

All your examples are related to square x versus square y,
so yes it is tough for bitboards to compete speed wise.

Do the whole program though and it might not be so slow..

Tord Romstad wrote:I should once again point out that I use a 32-bit CPU. Perhaps everything would have been very different on 64-bit.
Tord

You bring up an interesting point

-S

by **Gerd Isenberg** » 05 Aug 2006, 18:28

Gerd Isenberg wrote:I will try your routine today in my 32-bit program and will report later, until now Matt's routine was the fastest bitscan on my amd64 box.

I tried it - unclear. Register pressure is lower, it takes two registers (not one of course) instead of four. But may be we need two other scratch regsiters inside the loop anyway. It depends, there are positions, where it is slightly faster, there are some where it is slower. My loop bodies are usually quite small, so that a loop is likely register stuff while pushing moves on a stack.

Gerd

by **Leen Ammeraal** » 05 Aug 2006, 19:06

Gerd, I tried your version (which is different from what I used
before Lasse's routine). It took me a while to get it right, because
of the sign extension when right-shifting type int on my compiler,
so I inserted the cast to unsigned below in your program line

Code: Select all: return foldedTable[(unsigned)(folded * 0x78291ACF) >> 26];

to get it right. It was indeed slightly slower, as I reported earlier.
This is the way I tested. I tested two versions of my engine with
epd2wb for a given position (wac163) and looked how long it took
to reach a certain depth (17).
I always do this test several times, to eliminate 'noise',
and I also check if the number of search tree nodes are exactly the
same for both program versions.
This was typically 26800 ms with your routine and 25700 ms
with the one I wrote (based on Lasse's function).
Leen

Winboard Forum

Pretty fast compact bitboard move generator

Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Re: Pretty fast compact bitboard move generator

Who is online