Grant Osborne wrote:Hi all
For those of you that are using Pradu's magic move bitboard generator (or understand how it works) and have a 32-bit computer, you'll be using code that looks something like this to retrieve your attackboards:-
- Code: Select all
U64 Bmagic(unsigned int square) {
return *(bIndecies[square] + (((Occupied & bishopMask[square]) * bishopMagic[square]) >> bishopShift[square]));
}
U64 Rmagic(unsigned int square) {
return *(rIndecies[square] + (((Occupied & rookMask[square]) * rookMagic[square]) >> rookShift[square]));
}
I picked up on a recent post by H.G Muller and substituted the code above with:-
- Code: Select all
U64 Bmagic(unsigned int square) {
unsigned int lo = bishopMagicLo[square] * (int) Occupied;
unsigned int hi = bishopMagicHi[square] * (int) (Occupied>>32);
return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
}
U64 Rmagic(unsigned int square) {
unsigned int lo = rookMagicLo[square] * (int) Occupied;
unsigned int hi = rookMagicHi[square] * (int) (Occupied>>32);
return *(rIndecies[square] + ((lo ^ hi) >> rookShift[square]));
}
This modification requires a new set of two 32-bit keys per square (or like me you could join them into one 64-bit key) and only uses two 32-bit multiplies instead of three, and a 32-bit shift, and is probably more suited to a 32-bit machine.
I have implemented this into my program and, for me, I got an increased perft speed of 1.5 million nps.
My thanks to H.G Muller
Grant
Hi Grant,
Nice demonstration and very good news for 32 bit hardware owners, especially for AMD.
However, and this is not a negative reply in any way, I just want to bring to light
that there was more happening in the other thread than just wether two 32 bit
multiplications was faster than three. There was also two points of contention as well.
The first point of contention was if shifting to create an index could be faster than
multiplication. All optimization text that I have read say that it is. It is better to
do multiple shifts and adds rather than one multiplication.
There is a problem with the the shifting method though, and that is for the bishops
(not the rooks) the index is not compact and uses more memory than the Lasse/Pradu
magic bitboards. This is solved by the second point of contension.
The second point of contention is if it is worth splitting the index so the total table
sizes will only be a very small fraction of what is needed by the Lasse/Pradu method and
then combining the results of two lookups.
My contention is that shifting is faster than multipication and that the smaller tables
will be very easy on the cash. Both methods combined IMHO will gain in speed over
Lasse/Pradu much more than the 10 to 20 percent that your modification has. Even for 64
bit I expect a substantial speedup.
Okay, "prove it" some may say! Well I am working to do just that. Just do not expect the
results anytime soon. I am single and have to do all the house work and yard work myself,
as well as earn a living and take care of aged parents and do all their yard work and some
of their house work. And I am not a very fast programmer to begin with. So, if anyone is
interested in having these answers before six months from now then I hope that someone
else could make some of these test.
Mike
Edit: I can present some subjective evidence.
Harald Luben made a test comparison between several bitboard methods at talkchess.com. Here are the results.
- Code: Select all
+----------------------------+-----------------------+--------------+-----------+
| Name | Inventor or Supporter | Table Size | Test-Time |
+----------------------------+-----------------------+--------------+-----------+
| Rotated Bitboards (naive) | Robert Hyatt | 77.2 KByte | 87.78 s |
| Rotated Bitboards (aligned)| ? | 35.0 KByte | 86.65 s |
| Rotated Bitboards (switch) | (Harald Lüßen?) | 14.9 KByte | 88.91 s |
| Rotated Indices | Alessandro Damiani | 128.3 KByte | 78.93 s |
| Exploding Bitboards | Harald Lüßen | 3.5 KByte | 101.79 s |
| Magic Multiplication | Gerd Isenberg | 9.7 KByte | 91.87 s |
| Magic Multiplication 32 bit| Gerd Isenberg | 9.7 KByte | 81.37 s |
| Sherwin Index | Michael Sherwin | 1402.4 KByte | 90.03 s |
| Pradu Minimize Magic | Pradyumna Kannan | 844.0 KByte | 81.42 s |
| Pradu Perfect Magic Hash | and | 627.4 KByte | 82.09 s |
| Pradu Big | Lasse Hansen | 2306.0 KByte | 82.33 s |
| Kogge-Stone | Steffan Westcott | 0.0 KByte | 99.32 s |
| Simple Shift | -- | 0.0 KByte | 110.25 s |
| Naive Shift | -- | 0.0 KByte | 111.69 s |
+----------------------------+-----------------------+--------------+-----------+
The 32 bit unfriendly Sherwin Index tested faster than the the 32 bit unfriendly Magic Multiplication of Gerd Isenberg. Gerd's Magic Multiplication 32 bit (friendly) version did as well or better than Lasse/Pradu. I claim that the Sherwin Index was even more 32 bit unfriendly than Gerd's 32 bit unfriendly version, which would make the 32 bit friendly version of the Sherwin index (had it been tested)the fastest of the three on 32 bit machines.
The Sherwin Index above was done using the Sherwin Index 1 in wich 6 lookups were used to form the index for the bishop and 8 lookups for the rook (sounds strange, dosn't it). Sherwin Index 2 shifts the bits down to two lookups for the bishop and two for the rook.
So, I just think that the Sherwin Index 1 & 2 deserve a closer look before passing jugement. BTW I did not name it the "Sherwin Index'--Harald did.