Moderator: Andres Valverde
AttackRook = ((AllWpieces|AllBpieces)&RookMask[sq]) * RookMagicTable[sq];
Edsel,
i suggest before you post your proposals - to prove at least the simple case, where ((AllWpieces|AllBpieces)&RookMask[sq]) becomes zero. May be you need some additional xor with a supermagic
Gerd
occupied ^ (occupied -2*rooks)
Gerd Isenberg wrote:Hi Edsel,
no problem
The fact to consider with integer multiplication - there is no way to shift some bits right. ....
Gerd
newMask=(oldMask & ~bitsToShift)|((bitsToShift & oldMask) >>shiftAmount)
Lasse Hansen wrote:I have not focused on finding 2048 rook keys, so I havent found any new ones. Instead I have tried to improve move generation, and am currently implementing a faster check detection routine. The old routine is dead slow, because its testing -all- possible checks for -every- move.
One nice improvement I got by using an idea from Gerd, when using deBruin key/table for FirstOne, I use the (A&-A) result to clear the bit in the attack board. This gave a quite remarkable speed improvement for making quasi-legal moves.
Pos1:"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w QKqk -"
Pos2:"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w QKqk -"
Pos3:"q3k3/6q1/4q3/2q5/5Q2/3Q4/1Q6/3K3Q w - -"
Pos1: from 50Mnps to 71Mnps perft 6ply
Pos2: from 68Mnps to 86Mnps perft 5ply
Pos3: from 83Mnps to 124Mnps perft 5ply
Lasse
; input rdx - the target bitboard to serialize b
; ecx - the source square
; rdi - pointer to the move buffer
; direction flag
mov rax, rdx ; b
neg rax ; -b
jz done
shl ecx, 6 ; from << 6
lea ebx, [magicLookup]
mov rsi, 0x03... ; De Bruijn factor
align 16
tloop:
and rax, rdx ; b & -b
xor rdx, rax ; reset found bit
imul rax, rsi
shr rax, 58
mov eax, [rbx+4*rax] ; table lookup -> to square
add eax, ecx ; (from << 6) + to
mov [rdi], eax ; store 32-bit move *
mov rax, rdx ; b
add rdi, 4 ; pointer++ *
neg rax ; -b
jnz tloop
done:
Gerd Isenberg wrote:I guess your bitboard traversals are really a hotspot in your perft application, most critcal inner loops. How did you reset the bits before? Xor by Lookup with powOf2[foundIndex]?
Pcs=WRooks; // WRooks
while (Pcs) {
From=FirstOne2(C=Pcs&-Pcs);
Pattern = (AllWPieces | AllBPieces) & MaskRook[From];
Address = (Pattern*HashKeyRook[From])>>(64-RAT_BITS);
A = AllBPieces & RookAttacks[From][Address];
while (A) {
To=FirstOne2(B=A&-A);
if ((CaptPiece=Board[To])==BKing) return -1;
*M++=From+(To<<6)+(WRook<<12)+(CaptPiece<<16);
A &= ~B;
}
Pcs &= ~C;
}
Lasse Hansen wrote:I reset the bits using A&=ClrBit[square]; where Clrbit[.]=~SetBit[.]. So the cache have an easier time . Here's a code example:
- Code: Select all
Pcs=WRooks; // WRooks
while (Pcs) {
From=FirstOne2(C=Pcs&-Pcs);
Pattern = (AllWPieces | AllBPieces) & MaskRook[From];
Address = (Pattern*HashKeyRook[From])>>(64-RAT_BITS);
A = AllBPieces & RookAttacks[From][Address];
while (A) {
To=FirstOne2(B=A&-A);
if ((CaptPiece=Board[To])==BKing) return -1;
*M++=From+(To<<6)+(WRook<<12)+(CaptPiece<<16);
A &= ~B;
}
Pcs &= ~C;
}
I also modified FirstOne() to not do (A&-A). The assembly code generated by gcc with full optimisation is so dirty that I wont post it here.
Lasse
Pcs=WRooks; // WRooks
while (Pcs) {
From=FirstOne2(C=Pcs&-Pcs);
Pcs ^= C;
Pattern = (AllWPieces | AllBPieces) & MaskRook[From];
Address = (Pattern*HashKeyRook[From])>>(64-RAT_BITS);
A = AllBPieces & RookAttacks[From][Address];
if ( A & Kings ) return -1;
while (A) {
To=FirstOne2(B=A&-A);
A ^= B;
*M++=From+(To<<6)+(WRook<<12)+(Board[To]<<16);
}
}
Lasse Hansen wrote:I have tried those ideas Gerd gave.
1. Having an own Kings bitmap to check king captured outside inner loop improved the speed by ~1Mnps (quasi-legal).
2. Moving the bit-clearing to after FirstOne -redused- the speed by ~4Mnps.
3. Using XOR instead of &~ increased the speed by ~0.5 Mnps, but -only- in the capture move generator. When i tried in non-capture gen. it slowed down. It might have something to do with the inner loop code size (?).
Gcc acts strange..., even by adding comments to the source code I -might- get reduced speed.
Lasse
// pragma I LOVE GCC
Pcs = WRooks;
while ( Pcs ) {
A = Pcs;
Pcs &= Pcs-1; // reset found rook
From=FirstOne2(Pcs ^ A);
Pattern = (AllWPieces | AllBPieces) & MaskRook[From];
Address = (Pattern*HashKeyRook[From])>>(64-RAT_BITS);
A = AllBPieces & RookAttacks[From][Address];
if ( A & Kings ) return -1;
if ( A ) {
From += (WRook<<12);
do {
B = A;
A&= A-1;
To = FirstOne2(A ^ B);
*M++ = From + (To<<6) + (Board[To]<<16);
} while ( A );
}
}
Gerd Isenberg wrote:I see, already a symptom of cache pollution. Some profiler that reports L1-cache misses would be a nice tool.
Lasse Hansen wrote:I just had another idea of using hash keys to find moves. Instead of looking up an attack board, why not replace tha attack bitboard with a list of To-squares? A piece can never capture more than 8 pieces at once, and by using 6 bits per To-square, one will have 64-48=16 bits to tell how many genuine moves there are in the 64 bit word. This replaces the FirstOne with a shift+and. For non-captures, the queen might attack 27 squares, this could fit in 3x 64 bit words.
Lasse
queenPattern = allPieces & queenMask[queenSquare];
queenAddress = (queenPattern * queenMagic[queenSquare]) >> QUAT_BITS;
...
Return to Programming and Technical Discussions
Users browsing this forum: No registered users and 20 guests