Moderator: Andres Valverde
I use a pop-count for mobility as well because like you said it'll be more accurate because you can modify the attacks bitboards and count different types of moves separately.Edsel Apostol wrote:I have tried it also using the kindergarten bitboards approach and based on my tests, I doubt if the speed-up compensates for the inaccuracy of computing the mobility. I am using now the one with popcount.
int moveList = rMagic(square);
for (i = 0; i < 32; i = i + 8) {
moves = (moveList >> i) & 255;
switch (i >> 3) {
case 0:
for (j = 1; j < 8; j++) {
if (moves & (1 << j))
storeMove(rook, from, from + j);
else
break;
}
break;
case 1:
for (j = 1; j < 8; j++) {
if (moves & (1 << j))
storeMove(rook, from, from - j);
else
break;
}
break;
case 2:
for (j = 1; j < 8; j++) {
if (moves & (1 << j))
storeMove(rook, from, from + (j << 3));
else
break;
}
break;
case 3:
for (j = 1; j < 8; j++) {
if (moves & (1 << j))
storeMove(rook, from, from - (j << 3));
else
break;
}
break;
}
}
file = from & 7;
rank = from >> 3;
for (j = file + 1; j < 8; j++) {
if (rankMoves & bit[j])
storeMove(rook, from, 8*rank + j);
else
break;
}
for (j = file - 1; j >= 0; j--) {
if (rankMoves & bit[j])
storeMove(rook, from, 8*rank + j);
else
break;
}
for (j = rank + 1; j < 8; j++) {
if (fileMoves & bit[j])
storeMove(rook, from, 8*j + file);
else
break;
}
for (j = rank - 1; j >= 0; j--) {
if (fileMoves & bit[j])
storeMove(rook, from, 8*j + file);
else
break;
}
Edsel Apostol wrote:The new version of Glaurung is already using a pre-computed table for computing mobility. It is using magic method for indexing. He is using the same index but accessing a different table for mobility.
Onno Garms wrote:The Glaurung website is down and this seems to last longer.
Edit: I found Glaurung source, no need to email.
http://packages.qa.debian.org/g/glaurung.html
Edit2: For sliders, Glaurung seems to count only non-captruring moves.
Tord Romstad wrote:Mobility evaluation seems to be one of the areas where bitboards are still behind other board representations.
Future processors will also have the SSE4 POPCNT(pdf-page 161) instruction. Apparently AMD's K10 will also have popcnt but I'm not too sure on this one.Tord Romstad wrote:Mobility evaluation seems to be one of the areas where bitboards are still behind other board representations.
Edsel Apostol wrote:Is the routine "bswap" included in the C libraries? If not, how am I going to access it. I can create a simple routine in C equivalent to this bswap but I doubt if it would be fast.
Edsel Apostol wrote:Hi Gerd,
Can you elaborate further with your method? I am searching for a faster way to evaluate mobility and it seems your implementation is always the best when it comes to anything related to bitboards.
This is a little out of topic, but I just want to insert it here as I don't want to create another thread. Is the routine "bswap" included in the C libraries? If not, how am I going to access it. I can create a simple routine in C equivalent to this bswap but I doubt if it would be fast.
int dotProductRotated(u64 bb, u8 *rorWeights)
{
static const u64 CACHE_ALIGN masks[8] =
{
0x0101010101010101, 0x0202020202020202,
0x0404040404040404, 0x0808080808080808,
0x1010101010101010, 0x2020202020202020,
0x4040404040404040, 0x8080808080808080,
};
__asm
{
movq xmm0, [bb]
lea edx, [masks]
mov eax, [rorWeights]
// extend bits to bytes
punpcklqdq xmm0, xmm0
pxor xmm4, xmm4 ; zero
movdqa xmm1, xmm0
movdqa xmm2, xmm0
movdqa xmm3, xmm0
pandn xmm0, [edx+0*16]
pandn xmm1, [edx+1*16]
pandn xmm2, [edx+2*16]
pandn xmm3, [edx+3*16]
pcmpeqb xmm0, xmm4
pcmpeqb xmm1, xmm4
pcmpeqb xmm2, xmm4
pcmpeqb xmm3, xmm4
// multiply by "and" with -1 or 0
pand xmm0, [eax+0*16]
pand xmm1, [eax+1*16]
pand xmm2, [eax+2*16]
pand xmm3, [eax+3*16]
// add all bytes (with saturation)
paddusb xmm0, xmm1
paddusb xmm0, xmm2
paddusb xmm0, xmm3
psadbw xmm0, xmm4 ; horizontal add 2 * 8 byte
pextrw edx, xmm0, 4 ; extract both intermediate sums to gp
movd eax, xmm0
add eax, edx ; final add
}
}
#ifdef ..VC
__forceinline u64 flip(u64 b) {return _byteswap_uint64(b);}
#else
inline u64 flip(u64 b) {return bswap_64(b);}
#endif
u64 diagonalAttacks(u64 occ, u32 sq) {
u64 forward, reverse;
forward = occ & smsk[sq].diagonalMaskEx;
reverse = flip(forward);
forward -= smsk[sq].bitMask;
reverse -= flip(smsk[sq].bitMask);
forward ^= flip(reverse);
forward &= smsk[sq].diagonalMaskEx;
return forward;
}
u64 antiDiagAttacks(u64 occ, u32 sq) {
u64 forward, reverse;
forward = occ & smsk[sq].antidiagMaskEx;
reverse = flip(forward);
forward -= smsk[sq].bitMask;
reverse -= flip(smsk[sq].bitMask);
forward ^= flip(reverse);
forward &= smsk[sq].antidiagMaskEx;
return forward;
}
u64 bishopAttacks(u64 occ, u32 sq) {
return diagonalAttacks (occ, sq)
+ antiDiagAttacks (occ, sq);
}
Gerd Isenberg wrote:Hi Tord,
how do other board representations compute mobility more efficiently?
Bitboard b = pos.rook_attacks(sq);
mob = count_1s(b & (pos.empty_squares() | pos.pieces_of_color(them)));
mob -= count_1s(b & pos.pawns_of_color(us));
My favorite technique is still based on bit[64]*byte[64] simd-dot-product. bit[64] is an masked attack-set (e.g. no opponent pawn-attacks and no immobile own pawns) and byte[64] is a weight matrix or attack-square-table for various pieces. One may even use an array of precalculated weight-matrices, indexed by king-placements and even by some pawn-structure enums from pawn-hash. Not to mention fill-stuff to get trapped pieces or progressive mobility.
Pradu wrote:Future processors will also have the SSE4 POPCNT(pdf-page 161) instruction.Tord Romstad wrote:Mobility evaluation seems to be one of the areas where bitboards are still behind other board representations.
I see - tiny loops versus getting attack-sets and popcount aka dot-product. And if the popcount is implemented loopwise - I agree about the efficiency. That is why a native popcnt instruction is so desired. In your sample - you really don't need two population counts, but only one with the "right" set (Edit: oups that is wrong of course, two disjount sets to count) - another idea is to count attacks of all bishops/knights with the odd/major trick.Tord Romstad wrote:With my old mailbox board representation, I simply looped over all attacked squares, while counting all empty squares and squares occupied by enemy pieces. At the end, I subtracted the number of attacked squares occupied by friendly pawns. This was quite fast. The best way I have found to compute this with bitboards is something like this:
- Code: Select all
Bitboard b = pos.rook_attacks(sq);
mob = count_1s(b & (pos.empty_squares() | pos.pieces_of_color(them)));
mob -= count_1s(b & pos.pawns_of_color(us));
This is very slow, because of the need for sliding attack generation and to calls to count_1s().
For that sort of stuff, bitboards may indeed be advantageous -- but I am too stupid to make effective use of such complicated evaluation terms (I know this from experience; I used to do things like what you describe back when my program was 500 Elo points weaker than today). I wouldn't use a very complicated mobility evaluation even if I could get it with zero CPU time. Keeping it simple and straightforward works best for me.
// we will compute mobility thru our own pieces but not thru our pawns
// and not thru king so trapped rook does not think it has some mobility
bitboard xrayblocker = pcon->pieces[PAWN|side] | pcon->pieces[KING|side];
// - Rook -
for (pc = pcon->pieces[ROOK|side]; pc; ) {
sqr = PopLSB(&pc);
moves = Rmagic(sqr, pcon->pieces[B_OCCU|xside]|xrayblocker) &~ xrayblocker;
// dynamicVal.attacks |= moves;
dynamicVal.c_move += popCount(moves);
// dynamicVal.c_offense += popCount(moves & offenseMap);
// dynamicVal.c_king += popCount(moves & kingSquare);
}
Harald Johnsen wrote:Do you really need the 2 popcount ?
HJ.
Return to Programming and Technical Discussions
Users browsing this forum: No registered users and 7 guests