Winboard Forum

by **Gerd Isenberg** » 17 Dec 2006, 18:55

It might be about some noise, but i like the idea to get all stuff per square from one instead of four chachelines. The other (minor) point is that for a price of a base address calculation, further six instructions with memory operand are 3-4 bytes shorter opcodes. I even put the masks inside the struct, 24 bytes per entry.

Code: Select all: struct SMagic { unsigned int bishopMaskLo; unsigned int bishopMaskHi; unsigned int bishopMagicLo; unsigned int bishopMagicHi; unsigned int bishopShift; unsigned int*bPointer; } sm[64]; U64 Bmagic(unsigned int square) { unsigned int lo = (int) Occupied; unsigned int hi = (int)(Occupied>>32); lo &= sm[square].bishopMaskLo; hi &= sm[square].bishopMaskHi; lo *= sm[square].bishopMagicLo; hi *= sm[square].bishopMagicHi; return *(sm[square].bPointer + ((lo ^ hi) >> sm[square].bishopShift)); }

with following intended assembly in mind:

Code: Select all: lea esi, [2*esi + esi] ; 3*square lea esi, sm[8*esi] ; sm+24*square mov eax, dword ptr [occupied] mov edx, dword ptr [occupied+4] and eax, [esi].bishopMaskLo and edx, [esi].bishopMaskHi imul eax, [esi].bishopMagicLo imul edx, [esi].bishopMagicHi mov ecx, [esi].bishopShift xor edx, eax mov esi, [esi].bPointer shr edx, cl mov eax, [esi+8*edx] ; get the final attack board mov edx, [esi+8*edx+4]

by **Pradu** » 17 Dec 2006, 22:29

Tord Romstad wrote:Generating the keys for 32-bit multiplications is even easier than for 64-bit multiplications. I modified my simple magic number generator to be able to generate both (just change the #define at the top). With 64-bit multiplications, generating all rook (10, 11, 12-bit) and bishop (5, 6, 7, 9-bit) numbers for the whole board takes about 10 seconds on my computer. With 32-bit multiplications, it takes only 2 seconds. The code is totally unoptimised, and it wouldn't suprise me at all if it is possible to improve the speed by a factor of 10. With just a little effort, it should be possible to calculate the magic numbers during program initialisation instead of including them as a constant array in the source code.

If somebody would like to use my code to hunt for better magics (like 11-bit rook magics for the corners, or 10-bit rook magics for the edges), you are of course more than welcome to do so, but I think the chances of success would be better by using a more intelligent approach than mine (my program just tries random numbers until it finds one which works).

Tord

I have a rather fast parallel magic generator with enhancements to get rid of redundant bits-both lower and higher and with occupancies ordered in such a way that a bad magics will be detected quicker, but when someone else has already done the hard work generating the magics I don't see why one must do the same work again. :mrgreen:

A distributed magic project is a good idea but I think there are many many more enhancmenets that can be done to magic generation before we go overkill and go distributed. So perhaps we should wait a bit longer for some new techniques to be developed and implemented. The toughest problem is predicting how the carry will act to change index mappings of a particular bit to minimize the number of bits beyond just 1-1 bit to index mapping, perhaps an analysis of Grant's 11 bit rook keys will give some insight. A most promising technique is a way making sure of the uniqueness of index mappings and all possible combinations and going backwards from a good index mapping to the magic. We'll see if it works out.

by **Michael Sherwin** » 18 Dec 2006, 01:49

It might be about some noise

Hmmm, what a strange comment to make? If I were a paranoid person, I might think that that comment was directed at me for the small amount of clammor that I made a few post back. Well since I am not a paranoid person I really do not think that at all! :twisted:

However, I am wasting much valuable time here rather than doing what I need to get done, so I plan on being very, very quiet here. I hope that will not break anyones heart! :wink:

by **Gerd Isenberg** » 18 Dec 2006, 07:07

Michael Sherwin wrote:
It might be about some noise

Hmmm, what a strange comment to make? If I were a paranoid person, I might think that that comment was directed at me for the small amount of clammor that I made a few post back. Well since I am not a paranoid person I really do not think that at all! However, I am wasting much valuable time here rather than doing what I need to get done, so I plan on being very, very quiet here. I hope that will not break anyones heart!

Noisy was meant in the sense that those changes may behave different, dependent on on lot of other isssues. Sometimes they slow you down, sometimes they speed you up. If your L1 is not polluted, it probably doesn't matter whether you get some data from one or four cachelines. There is the additional cost to delegate agu-operations aka scaling the address operand [indexreg*1,2,4,8] to an additional alu-operation, otoh smaller code size may win a cycle up and then...

by **smcracraft** » 24 Dec 2006, 19:50

Does anyone have a copy of this revision they can send me?

I am currently using Pradu's checksummed as:

$ sum magicmoves.c magicmoves.h
63614 20 magicmoves.c
20393 11 magicmoves.h

and byte-sized as

$ ls -l magic*
-rw-r--r-- 1 scracra mkgroup 20459 Dec 13 08:59 magicmoves.c
-rw-r--r-- 1 scracra mkgroup 10726 Dec 13 08:59 magicmoves.h

I am interested in trying out potentially faster versions.

Stuart

by **Grant Osborne** » 26 Dec 2006, 15:30

Hi all

I've managed to reduce the table sizes to under 818K by finding better magics.
So here's the new keys and some code.

Code: Select all: U64 Bdb[4909]; U64 Rdb[99768]; const U64 bishopMagic[64] = { 0x080600D0080E5F02, 0x4732008400AF0FD2, 0x1298420200840103, 0x0000000000082481, 0x0000000000021210, 0xF7D118CCA75DCB7F, 0xB18910070003FF48, 0x442425100A51F157, 0x01A102140200420F, 0xA080928714954FAC, 0x95044F804D4D84C7, 0xDE983BDD629E420B, 0x9B080F2020001245, 0xC2E8076ABDFB746B, 0x8C0388A586C0F533, 0xC6224289FCBCD2A1, 0xF4EBD610AE88F586, 0xA539F530EF057C82, 0xB634161080018408, 0x2502300048022104, 0x0231000000022404, 0x4CD529F458840C22, 0x0151E44209184002, 0xA0D4EA5220102112, 0x29422204008B4210, 0x1091B90048016862, 0x0890684020024050, 0x8504020280100C80, 0x0308104208210040, 0x48818096F4022418, 0x2129F42C80108098, 0x009509F100D0B027, 0x80220B0212022504, 0x2062E50F00408215, 0x0010402F20402498, 0x2014404088148808, 0x0020200E0084008A, 0x2CAA207000416121, 0x0044A4EE00E600A1, 0x0031189A1010814A, 0x890604183F7A33CF, 0x4443220486820301, 0x020004431CF084C4, 0x422060404E200061, 0x010020B200488821, 0x102001008B1405D8, 0xE24008800C0C4188, 0xF240118110240084, 0xB433122800008421, 0x091A2C4000AB0041, 0xD704009E238B7C55, 0x0605C08000410501, 0x056102702101005E, 0x5BD5C00F4202B046, 0x840BB49112E20421, 0x0B2081001902024A, 0x21D6F2C040020121, 0x07A333113508420D, 0x006C0B6048220315, 0x06104C0800000000, 0x3097434700000000, 0x3F34F24A15A98871, 0x8EA38EC007404872, 0x1FFF966AC3E87030 }; const U64 bMask[64] = { 0x0040201008040200, 0x0000402010080400, 0x0000004020100A00, 0x0000000040221400, 0x0000000002442800, 0x0000000204085000, 0x0000020408102000, 0x0002040810204000, 0x0020100804020000, 0x0040201008040000, 0x00004020100A0000, 0x0000004022140000, 0x0000000244280000, 0x0000020408500000, 0x0002040810200000, 0x0004081020400000, 0x0010080402000200, 0x0020100804000400, 0x004020100A000A00, 0x0000402214001400, 0x0000024428002800, 0x0002040850005000, 0x0004081020002000, 0x0008102040004000, 0x0008040200020400, 0x0010080400040800, 0x0020100A000A1000, 0x0040221400142200, 0x0002442800284400, 0x0004085000500800, 0x0008102000201000, 0x0010204000402000, 0x0004020002040800, 0x0008040004081000, 0x00100A000A102000, 0x0022140014224000, 0x0044280028440200, 0x0008500050080400, 0x0010200020100800, 0x0020400040201000, 0x0002000204081000, 0x0004000408102000, 0x000A000A10204000, 0x0014001422400000, 0x0028002844020000, 0x0050005008040200, 0x0020002010080400, 0x0040004020100800, 0x0000020408102000, 0x0000040810204000, 0x00000A1020400000, 0x0000142240000000, 0x0000284402000000, 0x0000500804020000, 0x0000201008040200, 0x0000402010080400, 0x0002040810204000, 0x0004081020400000, 0x000A102040000000, 0x0014224000000000, 0x0028440200000000, 0x0050080402000000, 0x0020100804020000, 0x0040201008040200 }; const U64 rookMagic[64] = { 0xA001140800200860, 0x5008001000103004, 0x0280500000800E30, 0x7080300C11011010, 0x0041260080412802, 0x287001888040020C, 0x43400E0044080111, 0xA080003F00200145, 0x82C00024A0020102, 0x2008040008003060, 0x238C005800004013, 0x209400400000440C, 0x3A00408185004206, 0x2020850040002103, 0xC003006811004181, 0x0481401E00001023, 0x080304EA01311002, 0x1E00214A40004041, 0x844E0028A0808401, 0x0804403208220204, 0x00700B4001A40101, 0x4780104C80408202, 0x404B000B20008041, 0x900F408040008101, 0x0080005054902D41, 0x48841520020040B1, 0x620A40F260072082, 0x0F4A200CC2001602, 0x0A160004A4300A22, 0x80840412104C0B01, 0x0012828310100B04, 0x010160940100006E, 0x1010204B4412A008, 0x390040A808080029, 0x8C010040C210A022, 0x8808008882B12038, 0x11000B0405391128, 0x0100080282110085, 0x810104210042001B, 0x10200840C0316601, 0x91B0040001800248, 0x09E8101004004006, 0x22004040040C0012, 0x2900400B10110046, 0xC180800E18F90006, 0x022080020018E10C, 0x8810104400002E11, 0x002201010200E005, 0x9010002000C200B0, 0xE038002010880020, 0x180A00208A040068, 0x4C0400205000F402, 0x4702002002807A01, 0x030100200000D825, 0x0301804043084001, 0xE41108040800A007, 0x001B1041000028C0, 0x91004182C4042813, 0x000F416A04001022, 0x000418610009C210, 0x1003090BC0000408, 0x840228012021F004, 0x40022C01400011E2, 0x090026AF00200061 }; const U64 rMask[64] = { 0x000101010101017E, 0x000202020202027C, 0x000404040404047A, 0x0008080808080876, 0x001010101010106E, 0x002020202020205E, 0x004040404040403E, 0x008080808080807E, 0x0001010101017E00, 0x0002020202027C00, 0x0004040404047A00, 0x0008080808087600, 0x0010101010106E00, 0x0020202020205E00, 0x0040404040403E00, 0x0080808080807E00, 0x00010101017E0100, 0x00020202027C0200, 0x00040404047A0400, 0x0008080808760800, 0x00101010106E1000, 0x00202020205E2000, 0x00404040403E4000, 0x00808080807E8000, 0x000101017E010100, 0x000202027C020200, 0x000404047A040400, 0x0008080876080800, 0x001010106E101000, 0x002020205E202000, 0x004040403E404000, 0x008080807E808000, 0x0001017E01010100, 0x0002027C02020200, 0x0004047A04040400, 0x0008087608080800, 0x0010106E10101000, 0x0020205E20202000, 0x0040403E40404000, 0x0080807E80808000, 0x00017E0101010100, 0x00027C0202020200, 0x00047A0404040400, 0x0008760808080800, 0x00106E1010101000, 0x00205E2020202000, 0x00403E4040404000, 0x00807E8080808000, 0x007E010101010100, 0x007C020202020200, 0x007A040404040400, 0x0076080808080800, 0x006E101010101000, 0x005E202020202000, 0x003E404040404000, 0x007E808080808000, 0x7E01010101010100, 0x7C02020202020200, 0x7A04040404040400, 0x7608080808080800, 0x6E10101010101000, 0x5E20202020202000, 0x3E40404040404000, 0x7E80808080808000 }; U64* bIndecies[64] = { Bdb+4781, Bdb+2503, Bdb+236, Bdb+856, Bdb+1233, Bdb+1611, Bdb+4637, Bdb+4845, Bdb+2447, Bdb+2519, Bdb+264, Bdb+888, Bdb+1265, Bdb+1627, Bdb+4653, Bdb+4725, Bdb+0, Bdb+116, Bdb+292, Bdb+916, Bdb+1295, Bdb+1655, Bdb+2206, Bdb+2326, Bdb+28, Bdb+144, Bdb+418, Bdb+2591, Bdb+3614, Bdb+1781, Bdb+2236, Bdb+2356, Bdb+58, Bdb+174, Bdb+544, Bdb+3102, Bdb+4125, Bdb+1907, Bdb+2264, Bdb+2384, Bdb+88, Bdb+204, Bdb+671, Bdb+1044, Bdb+1423, Bdb+2034, Bdb+2294, Bdb+2415, Bdb+2475, Bdb+2547, Bdb+799, Bdb+1172, Bdb+1551, Bdb+2162, Bdb+4681, Bdb+4753, Bdb+4813, Bdb+2575, Bdb+827, Bdb+1201, Bdb+1580, Bdb+2190, Bdb+4709, Bdb+4877 }; U64* rIndecies[64] = { Rdb+83947, Rdb+71951, Rdb+35958, Rdb+41914, Rdb+45998, Rdb+50085, Rdb+75783, Rdb+91619, Rdb+67985, Rdb+32142, Rdb+38002, Rdb+10200, Rdb+14007, Rdb+52125, Rdb+55957, Rdb+79867, Rdb+23953, Rdb+33038, Rdb+6137, Rdb+11096, Rdb+14903, Rdb+17815, Rdb+56853, Rdb+59794, Rdb+26000, Rdb+4094, Rdb+7160, Rdb+0, Rdb+2047, Rdb+18838, Rdb+21907, Rdb+61842, Rdb+28048, Rdb+5117, Rdb+8184, Rdb+1023, Rdb+3070, Rdb+19861, Rdb+22930, Rdb+63890, Rdb+30096, Rdb+34054, Rdb+9208, Rdb+12119, Rdb+15927, Rdb+20885, Rdb+57876, Rdb+65938, Rdb+70031, Rdb+35062, Rdb+38978, Rdb+13111, Rdb+16919, Rdb+53021, Rdb+58898, Rdb+81903, Rdb+87531, Rdb+73743, Rdb+39874, Rdb+43958, Rdb+48045, Rdb+53917, Rdb+77827, Rdb+95673 }; const unsigned int bishopShift[64] = { 27, 28, 27, 27, 27, 28, 28, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 25, 25, 25, 25, 27, 27, 27, 27, 25, 23, 23, 25, 27, 27, 27, 27, 25, 23, 23, 25, 27, 27, 27, 27, 25, 25, 25, 25, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 28, 27, 27, 27, 28, 28, 27 }; const unsigned int rookShift[64] = { 20, 21, 21, 21, 21, 21, 21, 20, 21, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 22, 21, 20, 21, 21, 21, 21, 21, 21, 20 }; void setup(void) { int aa, bb, squares[64]; unsigned int hi, lo; U64 bit, temp, pattern; int bitpos[64] = { 63, 0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61, 51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62, 57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56, 45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5 }; for (aa = 0; aa < 64; aa++) { bb = 0; temp = bMask[aa]; while (temp) { bit = temp & -temp; squares[bb++] = bitpos[(bit * 0x07EDD5E59A4E28C2) >> 58]; temp ^= bit; } for (temp = 0; temp < (((U64)(1)) << bb); temp++) { pattern = initOccupation(squares, bb, temp); lo = (int)(pattern) * (int)(bishopMagic[aa]); hi = (int)(pattern >> 32) * (int)(bishopMagic[aa] >> 32); *(bIndecies[aa] + ((lo ^ hi) >> bishopShift[aa])) = initBmoves(aa, pattern); } } for (aa = 0; aa < 64; aa++) { bb = 0; temp = rMask[aa]; while (temp) { bit = temp & -temp; squares[bb++] = bitpos[(bit * 0x07EDD5E59A4E28C2) >> 58]; temp ^= bit; } for (temp = 0; temp < (((U64)(1)) << bb); temp++) { pattern = initOccupation(squares, bb, temp); lo = (int)(pattern) * (int)(rookMagic[aa]); hi = (int)(pattern >> 32) * (int)(rookMagic[aa] >> 32); *(rIndecies[aa] + ((lo ^ hi) >> rookShift[aa])) = initRmoves(aa, pattern); } } } U64 Bmagic(unsigned int square) { U64 occ; unsigned int hi, lo; occ = Occupied & bMask[square]; lo = (unsigned int)(occ) * (unsigned int)(bishopMagic[square]); hi = (unsigned int)(occ >> 32) * (unsigned int)(bishopMagic[square] >> 32); return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square])); } U64 Rmagic(unsigned int square) { U64 occ; unsigned int hi, lo; occ = Occupied & rMask[square]; lo = (unsigned int)(occ) * (unsigned int)(rookMagic[square]); hi = (unsigned int)(occ >> 32) * (unsigned int)(rookMagic[square] >> 32); return *(rIndecies[square] + ((lo ^ hi) >> rookShift[square])); } U64 initOccupation(const int* squares, const int numSquares, const U64 linocc) { int aa; U64 temp = 0; for (aa = 0; aa < numSquares; aa++) if (linocc & (((U64)(1)) << aa)) temp |= (((U64)(1)) << squares[aa]); return temp; } U64 initRmoves(const int square, const U64 occ) { U64 temp, bit, rowbits; temp = 0; rowbits = (((U64)0xFF) << (8 * (square / 8))); bit = (((U64)(1)) << square); do { bit <<= 8; temp |= bit; } while (bit && !(bit & occ)); bit = (((U64)(1)) << square); do { bit >>= 8; temp |= bit; } while (bit && !(bit & occ)); bit = (((U64)(1)) << square); do { bit <<= 1; if (bit & rowbits) temp |= bit; else break; } while (!(bit & occ)); bit = (((U64)(1)) << square); do { bit >>= 1; if (bit & rowbits) temp |= bit; else break; } while (!(bit & occ)); return temp; } U64 initBmoves(const int square, const U64 occ) { U64 temp, bit, bit2, rowbits; temp = 0; rowbits = (((U64)0xFF) << (8 * (square / 8))); bit = (((U64)(1)) << square); bit2 = bit; do { bit <<= 8 - 1; bit2 >>= 1; if (bit2 & rowbits) temp |= bit; else break; } while (bit && !(bit & occ)); bit = (((U64)(1)) << square); bit2 = bit; do { bit <<= 8 + 1; bit2 <<= 1; if (bit2 & rowbits) temp |= bit; else break; } while (bit && !(bit & occ)); bit = (((U64)(1)) << square); bit2 = bit; do { bit >>= 8 - 1; bit2 <<= 1; if (bit2 & rowbits) temp |= bit; else break; } while (bit && !(bit & occ)); bit = (((U64)(1)) << square); bit2 = bit; do { bit >>= 8+1; bit2 >>= 1; if (bit2 & rowbits) temp |= bit; else break; } while (bit && !(bit & occ)); return temp; }

Grant

by **smcracraft** » 27 Dec 2006, 02:22

Hi - I integrated this in as GRANTMAGIC #ifdef to my code
and tried it out. It failed to pass the 126 position qamg+kiwipete
depth 1 test (I usually test to 6).

Also, it resulted in a Black Bishop at Black's e8 initial king
location after an 8 ply search.

Just to be sure, I commented out GRANTMAGIC and left
regular MAGIC (which is Pradu's code I referenced in an earlier
message), and it passed both.

Something is amiss.

My integration is solid and I do not believe that is what is
happening.

by **Grant Osborne** » 27 Dec 2006, 12:34

Hi Stuart

Which of the tests did it fail? I would need a specific position so I can investigate.

Grant

by **smcracraft** » 27 Dec 2006, 16:03

Here is the run. Note e8 where
a BK is before the run and where
the BB is after the run:

! i saw: bd
BR BN BB BQ BK BB BN BR
BP BP BP BP BP BP BP BP
-- ** -- ** -- ** -- **
** -- ** -- ** -- ** --
-- ** -- ** -- ** -- **
** -- ** -- ** -- ** --
WP WP WP WP WP WP WP WP
WR WN WB WQ WK WB WN WR

white to move, castle = KQkq, nominal depth = 99, hashkey=62c10916a2b7f5d, phashkey=76a4bd4cd57cf4c0
! i saw: depth 8
i scanned maxdepth = 8
! i saw: move
maxdepth = 8, maxtime = infinite
1. Pa2a3* Pb2b3* Pc2c3* Pd2d3* Pe2e3* Pd2d4* Pe2e4* Nb1c3* ; sc=199 ti=0.00 no=55 pv=Nb1c3
2. Nb1c3* ; sc=0 ti=0.01 no=282 pv=Nb1c3 Nb8c6
3. Nb1c3* ; sc=199 ti=0.01 no=571 pv=Nb1c3 Nb8c6 Ng1f3
4. Nb1c3* ; sc=0 ti=0.06 no=6719 pv=Nb1c3 Nb8c6 Ng1f3 Ng8f6
5. Nb1c3* Ng1f3* ; sc=114 ti=0.27 no=30492 pv=Ng1f3 Ng8f6 Nb1c3 Nb8c6 Pe2e4
6. Ng1f3* ; sc=27 ti=1.29 no=124464 pv=Ng1f3 Nb8c6 Nb1c3 Ng8f6 Pd2d4 Nc6b4
7. Ng1f3* ; sc=137 ti=3.32 no=324684 pv=Ng1f3 Ng8f6 Pd2d4 Pd7d6 Nb1c3 Nb8c6 Pd4d5
8. Ng1f3* Nb1c3* ; sc=71 ti=19.65 no=1829832 pv=Nb1c3 Nb8c6 Pd2d4 Pd7d5 Ng1f3 Ng8f6 Bc1f4 Bc8f5
Ha=36.73%/92.79% NPS=93036 Sc=71 Ti=19.6680 No=1829832 pv=Nb1c3 Nb8c6 Pd2d4 Pd7d5 Ng1f3 Ng8f6 Bc1f4 Bc8f5
Ext: Check=671 NullThreat=0 Promote=6 Pawn7th=0 Recapture=571 Single=32

I moved: Nb1c3
BR BN BB BQ BB BB BN BR
BP BP BP BP BP BP BP BP
-- ** -- ** -- ** -- **
** -- ** -- ** -- ** --
-- ** -- ** -- ** -- **
** -- WN -- ** -- ** --
WP WP WP WP WP WP WP WP
WR -- WB WQ WK WB WN WR

black to move, castle = KQkq, nominal depth = 8, hashkey=6fa63784cdc49ff6, phashkey=41d4fcfe2211754f
! i saw: quit
%

--Stuart

by **Grant Osborne** » 27 Dec 2006, 16:55

Stuart

I hand typed each of your 126 test positions into my program and each one tested O.K.

Can you do a perft to depth 8 and arrive at 84998978956 nodes?

Grant

by **smcracraft** » 06 Jan 2007, 01:36

It's a little deep for my program. I don't hash in emp/perft.
Depth is limited as a result.

Here is a result that failed at emp(4).

! depth 8
i saw: depth 8
i scanned maxdepth = 8
! emp
i saw: emp
emp(1) totalnodes = 20 expected = 20 off-by-= 0.0000% OK
emp(2) totalnodes = 400 expected = 400 off-by-= 0.0000% OK
emp(3) totalnodes = 8902 expected = 8902 off-by-= 0.0000% OK
emp(4) totalnodes = 197732 expected = 197281 off-by-= 2.2809% !BAD!
emp(5) totalnodes = 4896339 expected = 4865609 off-by-= 6.2761% !BAD!

If you'd like a binary of the program for the PC, I can send it
along with its single library.

Stuart

by **Tony van Roon-Werten** » 06 Jan 2007, 10:43

Hi Stuart,

just for some nitpicking, your percentages are off by a factor 10

Cheers,

Tony

by **smcracraft** » 08 Jan 2007, 03:38

Tony van Roon-Werten wrote:Hi Stuart,

just for some nitpicking, your percentages are off by a factor 10

Cheers,

Tony

Thanks - updated. Hope this is better:

! depth 4
i saw: depth 4
i scanned maxdepth = 4
! emp
i saw: emp
emp(1) totalnodes = 20 expected = 20 off-by-= 0.0000% OK
emp(2) totalnodes = 400 expected = 400 off-by-= 0.0000% OK
emp(3) totalnodes = 8902 expected = 8902 off-by-= 0.0000% OK
emp(4) totalnodes = 197732 expected = 197281 off-by-= 0.2281% !BAD!

by **Grant Osborne** » 08 Jan 2007, 14:18

Stuart

If you compare GRANTMAGIC with MAGIC in your test, you should be able to identify at which node the two differ and therefore the move that is causing a problem

Grant

by **Onno Garms** » 15 Mar 2009, 02:16

Grant Osborne wrote:Hi all

For those of you that are using Pradu's magic move bitboard generator (or understand how it works) and have a 32-bit computer,
I picked up on a recent post by H.G Muller and substituted the code above with:-

Code: Select all
U64 Bmagic(unsigned int square) { U64 occ = Occupied & bishopMask[square]; unsigned int lo = (int)(occ) * (int)(bishopMagic[square]); unsigned int hi = (int)(occ >> 32) * (int)(bishopMagic[square] >> 32); return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square])); } U64 Rmagic(unsigned int square) { U64 occ = Occupied & rookMask[square]; unsigned int lo = (int)(occ) * (int)(rookMagic[square]); unsigned int hi = (int)(occ >> 32) * (int)(rookMagic[square] >> 32); return *(rIndecies[square] + ((lo ^ hi) >> rookShift[square]));

This modification requires a new set of two 32-bit keys per square (or like me you could join them into one 64-bit key)

Strangely my unit tests claim that your 32 bit rook multiplier for square number 27 (d4 in my numbering, which numbers a1, b1, c1 with 0,1,2) is broken. They say that the sets d2 and d2+d7 both map to the same index. Your multiplier is 0x0F4A2006C2001602. A correct multiplier is 0x0021000800001001. Number 27 is the only broken multiplier among your 128 multipliers.

Some more remarks:

1. I think you should do the encoding into a 64 bit multiplier the other way round:
- Put the multiplier that is used for the low dword of the bitboard in the high dword of the 64 bit multiplier and
- put the multiplier that is used for the high dword of the bitboard in the low dword of the 64 multiplier
This might allow to find common multipliers for 32 and 64 bit versions.

2. For aesthetical reasons I like sparse multipliers (i.e. only few bits set) better. They allow you to understand how the action on the bitboards works and can be calculated with a pencil and a sheet of paper instead of some brute force search program.

3.

and only uses two 32-bit multiplies instead of three,

Aren't even four multiplications needed? How can you compute a 64 bit product with 3 multiplications @ 32 bit?

4.

and a 32-bit shift, and is probably more suited to a 32-bit machine.

Thank you very much for the great idea!

by **H.G.Muller** » 15 Mar 2009, 09:57

Onno Garms wrote:3.
and only uses two 32-bit multiplies instead of three,

Aren't even four multiplications needed? How can you compute a 64 bit product with 3 multiplications @ 32 bit?

That was the trick which sparked this thread: You cannot do a 64-bit multiply, with two 32-bit multiplies, but there is no need to do a 64-bit multiply, and so you don't do one.

In magic bitboards, the multiply instruction is merely abused for the purpose of gathering bits scattered over a 64-bit word into the high-order few bits. An 'incomplete' multiplication which multiplies each 32-bit half of the board with a 32-bit magic, and adds the two, performs this function just as well, although in another way (i.e. you need other magics to get the desired effect as with true multiplication).

by **Onno Garms** » 15 Mar 2009, 16:10

H.G.Muller wrote:That was the trick which sparked this thread: You cannot do a 64-bit multiply, with two 32-bit multiplies, but there is no need to do a 64-bit multiply, and so you don't do one.

Yes, I am aware that Grant's code only needs two multiplications and that that is sufficient.
I wanted to ask if the statement should not be "two 32-bit multiplies instead of four". How can a 64-bit multiplication be done with three 32-bit-multiplications in case you need a 64-bit multiplication?

Anyway,my main reason for reactivating the thread was the broken multiplier for d4. Any comments on that?

(i.e. you need other magics to get the desired effect as with true multiplication).

There are multipliers that work only for 64 bit multiplication, e.g. 0x0000080080100080 for d4.
There are multipliers that work only for 32 bit tricks.
If you swap the dwords in the multiplier as I suggested, there are also multipliers that work for both, e.g. 0x0000100100210008 for d4.

by **Zach Wegner** » 15 Mar 2009, 19:17

Onno Garms wrote:How can a 64-bit multiplication be done with three 32-bit-multiplications in case you need a 64-bit multiplication?

BB1 = (BB1H<<32)+BB1L
BB2 = (BB2H<<32)+BB2L
BB1*BB2 = BB1L*BB2L + (BB1L*BB1H<<32) + (BB1H*BB2L<<32) + (BB1H*BB2H<<64);

He was probably referring to the fact that the last term can be omitted by the compiler. Native 64 bit multiplication has a 128 bit result (and 32 has 64, etc.). But when the multiplication is being emulated, the compiler can probably see that the top 64 bits are not used and save that multiply (as well as a couple adds by not using the top 32 bits of the two middle terms).

by **Onno Garms** » 15 Mar 2009, 22:00

Zach Wegner wrote:BB1 = (BB1H<<32)+BB1L
BB2 = (BB2H<<32)+BB2L
BB1*BB2 = BB1L*BB2L + (BB1L*BB1H<<32) + (BB1H*BB2L<<32) + (BB1H*BB2H<<64);
He was probably referring to the fact that the last term can be omitted by the compiler.

Thanx. I must have been hit with chess-blindness in programming.

by **Grant Osborne** » 16 Mar 2009, 16:43

Hi Onno

Are you still having a problem with your rook multiplier for square number 27 or is it fixed now?
Since I discovered that the shift can be incorporated in the multiplier, I set about finding new magics. Square 27 is now 0xB74A200CC2001602.

Code: Select all: U64 Bmagic(unsigned int square) { U64 occ; unsigned int hi, lo; occ = Occupied & bMask[square]; lo = (unsigned int)(occ) * (unsigned int)(bishopMagic[square]); hi = (unsigned int)(occ >> 32) * (unsigned int)(bishopMagic[square] >> 32); return *(bIndecies[square] + ((lo ^ hi) >> (bishopMagic[square] >> 59))); } U64 BmagicAlt(unsigned int square, U64 occupancy) { U64 occ; unsigned int hi, lo; occ = occupancy & bMask[square]; lo = (unsigned int)(occ) * (unsigned int)(bishopMagic[square]); hi = (unsigned int)(occ >> 32) * (unsigned int)(bishopMagic[square] >> 32); return *(bIndecies[square] + ((lo ^ hi) >> (bishopMagic[square] >> 59))); }

I found that the better magics are the more dense ones for a smaller table.

In fact, I have removed quite a lot of this magic move generation stuff only using it where I have to. I found something a little bit faster.

Grant

Winboard Forum

A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Re: A Faster Magic Move Bitboard Generator?

Who is online