Gerd,
Can you also show us how to avoid bitboards being a factor of 3 or so slower when we want to use more chess knowledge than just a simple summation count or a simplistic vector multiply?
Just simple logic for each square, also using attack tables and especially how many times a square gets attacked.
A crucial thing all humans always take into account.
Note that even the really old gnuchess 4.0 (non-bitboard) allows this.
Ed's data structure also allows it, more or less, in his 8-bit attack table.
Please show me this in bitboards, in generic code that also works on processors other than the Opteron.
I'm not using SSE2 assembly code in DIEP because I hate to be dependent on running only with a certain Windows compiler on A64s.
DIEP also runs under Linux and on processors other than A64s.
Please show something generic that also works on processors other than the A64.
And now the advantages for bitboards that I can see:
(a) Faster evaluation. Calculating whether a pawn is passed or not requires just one and.
Difficult to read evaluation in bitboards. Every bitmask you must check and recheck 10 times. Calculation of any complex loop is dead slow.
Slow vector instructions just kill your processor's IPC.
Slower evaluation than is possible in non-bitboards because you only have 1 bit of information. To do any sort of 'count' is very hard.
Try to get the total count out of 8 different bitboards at position square_c3.
In my "straightforward" data structure I just do count = x[i]&15;
And i have that number.
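To make the comparison concrete, here is a minimal sketch of both ways to get that count. The layout of the 8 bitboards (one attack set per attacker) and the function names are assumptions for illustration; only the x[i]&15 lookup comes from the text above, and what the upper bits of x[i] hold is left open.

```c
#include <stdint.h>

typedef uint64_t BITBOARD;

/* Bitboard side: each attack set carries one bit per square, so the
   count for a square has to be collected from all 8 sets.
   (Hypothetical layout: one attack set per attacker.) */
static int count_attackers_bb(const BITBOARD attacks[8], int sq)
{
    int count = 0;
    for (int i = 0; i < 8; i++)
        count += (int)((attacks[i] >> sq) & 1);
    return count;
}

/* Array side: one entry per square, low 4 bits = number of attackers.
   The remaining bits are free for other information (not specified here). */
static int count_attackers_table(const unsigned x[64], int sq)
{
    return (int)(x[sq] & 15);
}
```

With a1 = 0 as square numbering, c3 is square 18, so the two calls would be count_attackers_bb(attacks, 18) and count_attackers_table(x, 18).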
(b) No need to mess around with piece lists and such, bitboards already do the job well.
In bitboards you need to write everything out separately for black and white, see the Crafty code. If you argue that this is not needed in bitboards, then please raise your hand high and tell us whether you wrote generic code that works for both sides and for several pieces.
What qualifies is if you use this in the searching engine :
BITBOARD piecelist[2][6];
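A minimal sketch of what code written against such an array could look like: one loop that serves both sides and all six piece types. The enum names, the piece values and the function names are assumptions made for the example, and the population count is a plain C loop so that it runs on any processor.

```c
#include <stdint.h>

typedef uint64_t BITBOARD;
enum { WHITE, BLACK };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

/* Portable population count: clears the lowest set bit per iteration,
   no intrinsics or inline assembly. */
static int popcount64(BITBOARD b)
{
    int n = 0;
    while (b) {
        b &= b - 1;
        n++;
    }
    return n;
}

/* Illustrative piece values in centipawns (assumed, not from any engine). */
static const int piece_value[6] = { 100, 300, 300, 500, 900, 0 };

/* Material balance from White's point of view. The same loop body is
   used for both sides and every piece type. */
static int material(BITBOARD piecelist[2][6])
{
    int score[2] = { 0, 0 };
    for (int side = WHITE; side <= BLACK; side++)
        for (int piece = PAWN; piece <= KING; piece++)
            score[side] += piece_value[piece] * popcount64(piecelist[side][piece]);
    return score[WHITE] - score[BLACK];
}
```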
(c) No need to scan the empty square when generating captures.
In short all you can do in bitboards is captures?
Excuse me, if there are 40 semi-legal moves, I want to generate *all* of them. There might be some move in there that raises my score a lot.
This is where the lightning-fast speed to scan squares gets used: for my mobility, activity, and scans throughout the evaluation (which eats all system time). That code is 2.2 times faster than 32-bit bitboards and about 1.5 times faster than 64-bit bitboards.
(d) No need to test for the end of the board.
With precomputed tables there is no need to do that either to generate.
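As an illustration of that point, a sketch of a non-bitboard knight move generator on a 64-square array. The table layout, the board encoding and all names are assumptions for the example, not DIEP's actual code. The board-edge test runs once while the table is built; the generator itself never tests for the edge.

```c
#include <stdint.h>

/* For every square: list of knight target squares, terminated by -1. */
static int8_t knight_targets[64][9];

static void init_knight_targets(void)
{
    static const int df[8] = { 1, 2, 2, 1, -1, -2, -2, -1 };
    static const int dr[8] = { 2, 1, -1, -2, -2, -1, 1, 2 };
    for (int sq = 0; sq < 64; sq++) {
        int n = 0;
        for (int i = 0; i < 8; i++) {
            int f = (sq & 7) + df[i];
            int r = (sq >> 3) + dr[i];
            /* The only edge-of-board test, done once at startup. */
            if (f >= 0 && f < 8 && r >= 0 && r < 8)
                knight_targets[sq][n++] = (int8_t)(r * 8 + f);
        }
        knight_targets[sq][n] = -1;
    }
}

/* board[sq]: 0 = empty, > 0 = own piece, < 0 = enemy piece (assumed encoding).
   Stores the target squares in moves[] and returns how many there are. */
static int gen_knight_moves(const int board[64], int from, int moves[8])
{
    int n = 0;
    for (const int8_t *t = knight_targets[from]; *t >= 0; t++)
        if (board[*t] <= 0)      /* empty square or capture */
            moves[n++] = *t;
    return n;
}
```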
(e) Future proof: Bitboards will have a great speed advantage when all the PCs become 64 bit!
As you can measure, the move from 32 to 64 bits speeds you up by around 30% (Crafty). So why a data structure that is a factor 2.2 slower and completely incapable of handling what today's strong programs need, instead of an easy data structure that is fast and leaves little room for bugs?
(f) bitboard programs loaded with bugs because of mistakes in conversion between 32 and 64 bits.
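A typical example of such a conversion mistake, given here as an assumption about what goes wrong in practice rather than taken from any particular engine, is building a single-bit mask from a 32-bit constant. The function names are made up for illustration.

```c
#include <stdint.h>

typedef uint64_t BITBOARD;

/* The classic mistake is:  BITBOARD b = 1 << sq;
   The literal 1 is an int, so where int has 32 bits the shift is
   undefined for sq >= 32, whatever the width of b. */
static BITBOARD square_bit(int sq)
{
    return (BITBOARD)1 << sq;    /* widen to 64 bits first, then shift */
}

/* Splitting a bitboard for 32-bit code: shift in 64 bits, then truncate. */
static uint32_t low_half(BITBOARD b)
{
    return (uint32_t)b;
}

static uint32_t high_half(BITBOARD b)
{
    return (uint32_t)(b >> 32);
}
```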
(g) very difficult to add more knowledge in bitboards. only the KISS principle gets used for knowledge. Complex knowledge you won't find there.
(h) the Hyatt argument: bitboards are very useful when you don't have a fast L1 cache in your processor and then search for a solution to avoid array lookups, as those go straight to memory.
Good news for you, your processor DOES have an L1 cache, and a FAST one !!
(i) As a bitboarder, if you lose a game you can complain that it is not your fault but the compiler builder's
Depending on the design. For rotated bitboards not that much between x86-32 versus x86-64 - maybe also because the compilers are still too "new" and the compiler guys have a lot to improve. For fill based bitboards as
Of course, compilers are made by machines who are 100% bug-free coders. In reality it is some stupidity either from you or from the compiler guys in 32/64 bits. It doesn't matter who made the mistake. What matters is that you have to write code that works bug-free. Generic code usually does that. If you lose because of a compiler bug or because of some dumb casting bug you overlooked, that doesn't matter to your opponent. He'll take the point.
(i) in bitboards in the year 2005 you are still busy with hexadecimal values. Now not 8 bits long like they were in the 80s, but a confusing 64 bits in bitboards.
If you write some chess knowledge in a normal data structure, it requires relatively easy debugging to get the thing working correctly. In bitboards you really must recheck things 10 times. Masks look real funny, you know:
0x000290000000001000
Do you know whether the above has the right mask selected for your bitboard pattern?
Easy checking in bitboards!
In non-bitboards things are just so much easier to write generically, and if you know the coordinate system in chess pretty well, like where a2 is and that a2 != a3, then the odds of mistakes there are far smaller. If there is a problem, it's identified quicker!
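To check a mask like the one quoted above you end up decoding it bit by bit. A minimal sketch of such a decoder follows, assuming bit 0 is a1, bit 7 is h1 and bit 63 is h8; the post does not say which mapping is meant, and the function name is made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t BITBOARD;

/* Writes the set squares of mask as "e2 e7 ..." into out and returns
   how many bits are set. Assumed mapping: bit 0 = a1, bit 63 = h8. */
static int mask_to_squares(BITBOARD mask, char *out, size_t outlen)
{
    int count = 0;
    size_t used = 0;
    if (outlen > 0)
        out[0] = '\0';
    for (int sq = 0; sq < 64; sq++) {
        if (!((mask >> sq) & 1))
            continue;
        /* Each entry needs at most 3 characters plus the terminator. */
        if (used + 4 <= outlen)
            used += (size_t)snprintf(out + used, outlen - used, "%s%c%c",
                                     count ? " " : "",
                                     'a' + (sq & 7), '1' + (sq >> 3));
        count++;
    }
    return count;
}
```

Under that mapping the mask above, written with 16 hex digits as 0x0290000000001000, decodes to e2 e7 h7 b8. Whether that is the pattern you intended is exactly the thing you have to recheck.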
(j) As a bitboarder in the year 2005 you are still busy with (in)line assembly
Way to go. How to *ever* expand your program when all you are busy with is finding a clever way to do some utmost tiny thing 1 cycle faster in SSE2?
(k) As a bitboarder you just know for sure that you'll face opponents running on more processors than you do. The majority of parallel engines are *not* bitboard engines. How many parallel bitboard engines are there actually?
In order of strength: Zappa, Crafty, ... ?
Imho important is the immanent parallelism by working with sets.
Faster than being busy with parallelism within the processor is parallelism between processors!
That brings you a factor 3.x speedup on a dual Opteron dual core.
(l) the Frans Morsch passer argument
At how many spots in your eval do you need to know whether something is a passer? If more than 2, why not keep a bitboard array containing all the passers you've got?
BITBOARD passers[2];
Real easy.
Hey, that's even faster than calculating them!
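A minimal sketch of that idea, assuming bit 0 is a1 and White moves towards higher ranks. The function names are made up for the example, and the front-span mask is computed on the fly to keep the code short; a real program would more likely take it from a precomputed table so that the passer test itself is the single and.

```c
#include <stdint.h>

typedef uint64_t BITBOARD;
enum { WHITE = 0, BLACK = 1 };

/* Squares that must be free of enemy pawns for a pawn of side on sq
   to be passed: its own file and both neighbours, all ranks in front. */
static BITBOARD front_span(int side, int sq)
{
    int file = sq & 7, rank = sq >> 3;
    BITBOARD mask = 0;
    for (int f = file - 1; f <= file + 1; f++) {
        if (f < 0 || f > 7)
            continue;
        if (side == WHITE)
            for (int r = rank + 1; r <= 7; r++)
                mask |= (BITBOARD)1 << (r * 8 + f);
        else
            for (int r = rank - 1; r >= 0; r--)
                mask |= (BITBOARD)1 << (r * 8 + f);
    }
    return mask;
}

/* Filled once per pawn structure, then read wherever the eval needs it. */
static BITBOARD passers[2];

static void update_passers(const BITBOARD pawns[2])
{
    for (int side = WHITE; side <= BLACK; side++) {
        BITBOARD result = 0;
        for (int sq = 0; sq < 64; sq++) {
            BITBOARD bit = (BITBOARD)1 << sq;
            /* The passer test: one and of span against enemy pawns. */
            if ((pawns[side] & bit) &&
                (front_span(side, sq) & pawns[side ^ 1]) == 0)
                result |= bit;
        }
        passers[side] = result;
    }
}
```

After update_passers(pawns), every later question of the form "is the pawn on sq a passer" is a test of one bit in passers[side].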
Let me quote again why I definitely do NOT want to think about dicking with bitboards, as I like to be busy with somewhat more high-level things that contain more chess knowledge than a vector multiply gives:
int dotProduct64(BitBoard bb, BYTE weights[] /* XMM_ALIGN */)
{
    static const BitBoard XMM_ALIGN sbitmask[2] = {
        0x8040201008040201,
        0x8040201008040201
    };
    __m128i x0, x1, x2, x3, bm;
    bm = _mm_load_si128 ( (__m128i*) sbitmask);
    x0 = _mm_loadl_epi64 ( (__m128i*) &bb);
    // extend bits to bytes
    x0 = _mm_unpacklo_epi8 (x0, x0);
    x2 = _mm_unpackhi_epi16 (x0, x0);
    x0 = _mm_unpacklo_epi16 (x0, x0);
    x1 = _mm_unpackhi_epi32 (x0, x0);
    x0 = _mm_unpacklo_epi32 (x0, x0);
    x3 = _mm_unpackhi_epi32 (x2, x2);
    x2 = _mm_unpacklo_epi32 (x2, x2);
    x0 = _mm_cmpeq_epi8 (_mm_and_si128 (x0, bm), bm);
    x1 = _mm_cmpeq_epi8 (_mm_and_si128 (x1, bm), bm);
    x2 = _mm_cmpeq_epi8 (_mm_and_si128 (x2, bm), bm);
    x3 = _mm_cmpeq_epi8 (_mm_and_si128 (x3, bm), bm);
    // multiply by "and" with -1 or 0
    __m128i* pw = (__m128i*) weights;
    x0 = _mm_and_si128 (x0, pw[0]);
    x1 = _mm_and_si128 (x1, pw[1]);
    x2 = _mm_and_si128 (x2, pw[2]);
    x3 = _mm_and_si128 (x3, pw[3]);
    // add all bytes (with saturation)
    x0 = _mm_adds_epu8 (x0, x1);
    x0 = _mm_adds_epu8 (x0, x2);
    x0 = _mm_adds_epu8 (x0, x3);
    x0 = _mm_sad_epu8 (x0, _mm_setzero_si128 ());
    return _mm_extract_epi16 (x0, 0)
         + _mm_extract_epi16 (x0, 4);
}
- Gerd
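For comparison, the kind of generic code asked for earlier in this post could be a plain C loop like the sketch below. It is not Gerd's code and the function name is made up. It gives the same result as the SSE2 routine only as long as none of the byte-wise saturating adds there hits its limit of 255, for example when every weight is at most 63.

```c
#include <stdint.h>

typedef uint64_t BitBoard;
typedef unsigned char BYTE;

/* Sum of weights[sq] over all set bits sq of bb, without saturation.
   Runs on any processor with a C compiler; no alignment requirement. */
static int dotProduct64_portable(BitBoard bb, const BYTE weights[64])
{
    int sum = 0;
    for (int sq = 0; bb != 0; sq++, bb >>= 1)
        if (bb & 1)
            sum += weights[sq];
    return sum;
}
```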
Very clever, the above, Gerd, that you manage to do a vector operation in SSE2 of an array times a bitboard. Now the most important question is of course: does being busy for a full year with a few tiny optimizations also bring 50 rating points for IsiChess?
It's obvious that each year chess software has to improve in order to keep its status in the world top. Whether that's 50 rating points or less is not the relevant question here. But it is *some* sort of improvement.
What becomes world champion in 2003 won't win in 2004. What won in 2004 won't win in 2005. Progress is needed because competitors also improve each year.
How are bitboard engines going to manage that?
(w) Being a bitboarder makes it a real challenge to win the world title; after the 80s a bitboard engine never won the world title again
-----
Vincent