Tord Romstad wrote: It's been a very busy year, and only now, almost one year after I started this thread, have I started experimenting with bitboards on the ARM. The results so far are disappointing: when I use the magic bitboard implementation found in the public version of Glaurung, the speed drops by a factor of 100 on the 412 MHz ARM compared to running on a single core of a 2.8 GHz Core 2 Duo. Most non-bitboard programs I have tried are only about 30-50 times slower on the ARM. Switching from magic bitboards to hyperbolic quintessence (HQ) makes it even worse: the speed drops by about another 20%. In 64-bit mode on the Core 2 Duo, on the other hand, HQ runs at almost exactly the same speed as magic bitboards.
Using the ARM's clz instruction rather than a standard folded bitscan for bitscanning does not improve performance measurably.
I now think the best approach on the ARM would be to get rid of the bitboards entirely.
Here is the code I use for sliding attack generation. If someone spots any obvious mistakes, please point them out:
Tord
Hi Tord,
It seems that the ARM1176 is an in-order processor without any parallel decoding/execution. Execution is so slow that memory access becomes relatively fast, even with huge tables (hence the better magic performance). Additionally, the rev instruction may be slow: since it is usually not as performance critical as bitwise boolean or arithmetic instructions, it might be implemented with some internal macro-program. Are there any papers with instruction latencies?
Also, 32-bit mode doesn't allow keeping much in registers, so 64-bit arithmetic (i.e. 2*32 bits) really becomes a bottleneck due to the additional use of locals on the stack rather than in registers. Maybe the compiler is also weak at optimizing 64-bit code. Could you provide some generated assembly for bishop_attacks_bb?
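To make the 2*32-bit cost concrete, here is a rough sketch of what the two 64-bit operations used by HQ (byte swap and subtraction) decompose into when only 32-bit registers are available. The names U64Pair, rev32, bswap_pair, sub_pair, split and join are made up for this illustration; it is not what any particular compiler emits, only the shape of the work it has to do.

```cpp
#include <cstdint>

// Hypothetical helper type: one 64-bit value held as two 32-bit halves,
// the way a 32-bit target has to keep it in registers or on the stack.
struct U64Pair {
    std::uint32_t lo;
    std::uint32_t hi;
};

inline U64Pair split(std::uint64_t x) {
    return U64Pair{static_cast<std::uint32_t>(x),
                   static_cast<std::uint32_t>(x >> 32)};
}

inline std::uint64_t join(U64Pair p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}

// Byte-reverse one 32-bit word (the job a single 32-bit rev would do).
inline std::uint32_t rev32(std::uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// 64-bit byte swap: reverse each half, then exchange the halves.
inline U64Pair bswap_pair(U64Pair v) {
    return U64Pair{rev32(v.hi), rev32(v.lo)};
}

// 64-bit subtraction: low halves first, then high halves minus the borrow.
inline U64Pair sub_pair(U64Pair a, U64Pair b) {
    U64Pair r;
    r.lo = a.lo - b.lo;
    std::uint32_t borrow = (a.lo < b.lo) ? 1u : 0u;
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```

Every 64-bit bswap therefore costs two word reversals, and every subtraction two dependent operations, each with four live 32-bit values, which is where the register pressure comes from.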
You may try to save one bswap, either by indexing with xor 56 or by using one additional structure element with precalculated flipped bits. But I fear that will not help that much.
// Variant 1: take the flipped bit mask from the mirrored square (s ^ 56).
inline Bitboard file_attacks_bb(Square s, Bitboard b) {
  b &= SqData[s].fMask;
  return ((b - SqData[s].bMask) ^ bswap(bswap(b) - SqData[s^56].bMask))
         & SqData[s].fMask;
}

// Variant 2: store the flipped bit mask as an extra structure element.
inline Bitboard file_attacks_bb(Square s, Bitboard b) {
  b &= SqData[s].fMask;
  return ((b - SqData[s].bMask) ^ bswap(bswap(b) - SqData[s].revbMask))
         & SqData[s].fMask;
}
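For anyone who wants to check the xor 56 trick in isolation, here is a minimal self-contained sketch. It assumes the usual a1 = 0, h8 = 63 square mapping and computes the masks on the fly instead of reading them from SqData; bit_mask, file_mask_ex, file_attacks_hq and file_attacks_slow are names invented for this example, and the portable bswap only stands in for whatever intrinsic is really used.

```cpp
#include <cstdint>

typedef std::uint64_t Bitboard;

// Portable 64-bit byte swap (stand-in for the bswap used above).
inline Bitboard bswap(Bitboard b) {
    b = ((b >> 8) & 0x00FF00FF00FF00FFULL) | ((b & 0x00FF00FF00FF00FFULL) << 8);
    b = ((b >> 16) & 0x0000FFFF0000FFFFULL) | ((b & 0x0000FFFF0000FFFFULL) << 16);
    return (b >> 32) | (b << 32);
}

// Assumed square mapping: a1 = 0, h8 = 63.
inline Bitboard bit_mask(int s) { return 1ULL << s; }

// File of square s, excluding s itself (plays the role of fMask).
inline Bitboard file_mask_ex(int s) {
    return (0x0101010101010101ULL << (s & 7)) & ~bit_mask(s);
}

// HQ file attacks; bit_mask(s ^ 56) equals bswap(bit_mask(s)),
// so no byte swap of the slider bit is needed.
inline Bitboard file_attacks_hq(int s, Bitboard occ) {
    Bitboard b = occ & file_mask_ex(s);
    return ((b - bit_mask(s)) ^ bswap(bswap(b) - bit_mask(s ^ 56)))
           & file_mask_ex(s);
}

// Reference implementation: walk the file in both directions,
// stopping at (and including) the first occupied square.
inline Bitboard file_attacks_slow(int s, Bitboard occ) {
    Bitboard att = 0;
    for (int t = s + 8; t < 64; t += 8) {
        att |= bit_mask(t);
        if (occ & bit_mask(t)) break;
    }
    for (int t = s - 8; t >= 0; t -= 8) {
        att |= bit_mask(t);
        if (occ & bit_mask(t)) break;
    }
    return att;
}
```

Comparing file_attacks_hq against file_attacks_slow over all squares and a handful of occupancies is a cheap way to confirm that a precalculated table of flipped masks has been filled in correctly.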
Have you tried not inlining the piece attack getters?
Gerd