Page 1 of 3

Low memory usage attack bitboard generation

PostPosted: 06 Oct 2011, 17:54
by crystalclear
I think I may have a novel approach to attack bitboards.
My idea is not fully formuated and it is likely to be slow compared to existing methods like Magic Numbers.

I know magic numbers seems to be the way to go for fast generation of attack bitboards. It seems very memory intensive.

My idea came after reading a discussion of the amount of memory required by these attack bitboards.
Given that memory is cheap, I don't really see memory as a problem, unless someone tried to create queen magic
rather than using a union of rook and bishop attach bitboards. That's because the queen can move to very many squares.

Still, I was thinking about attack bitboards, their tables and hashing, and this is what I came up with.

Hashing
---------
A bishop or rook can move in 4 directions along two lines.
Queen attack bitboard are usually created as the union of rook and bishop attack bitboards.
Similarly, at the cost of some processor time rook and bishop bitboard can be created as a union of bitboards for two lines.
With good hashing, the indices for lines are likely to be at most 8 bits - not very memory intensive.
So I don't see any need to subdivide and go smaller still, although my first attempt at a chess program did/does
what I call shadow removal (regarding the active piece as a light source), on a direction by direction basis.
It calculated 4 times, N,S,E,W for a rook; this proposes we calculate NS together, and then EW.

If you logically AND a piece occupancy bitboard with a bitboard for the line, you can create a distrubuted 8 bit bit pattern.
This bitpattern then needs to be hashed into a small number of consecutive bits if it is to be easily used to index an array.
This is where magic numbers come in useful, but they are not part of this discussion - they not only permute the order of the bits in
an index, but also muddle values. This is appears to be one reason a lot of memory is required with the magic number approach.

So for hashing, I decided to use modular arithmetic. I don't propose to discuss it in detail, but basically, all I want
is a method to compress a bit pattern like 'a00000000b00000000c00000000d00000000e00000000f00000000g00000000h'
into a bit pattern like 'abcdefgh'.
Now that is precisly what modular arithmetic does. Working modulo 255, 256=1.
So bits could be 1 bit apart (rook moves) or an extra 8 bits (multiple of 256, ie a multiple of 1 modulo 255) forwards or backward (bishop moves)
and they still appear as adjacent when hashed using modulo arithmetic. [The precice choice of modulus chosen may depend on the board layout used etc.]

Anyway, a relatively simple calculation will compress the piece line occupancy bits into an 8 bit value without fundametally changing the bit order or muddling the bit values: it should simply close up the bits.

Recreating a bitboard
-------------------------
Scattering the bits abcdefgh back along the line '1000000001000000001000000001000000001000000001000000001000000001'
is just a simple matter of ANDing a multiple of the bitpattern and the bits for the line, eg :-

Code: Select all
1000000001000000001000000001000000001000000001000000001000000001&
abcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefgh =
a00000000b00000000c00000000d00000000e00000000f00000000g00000000h


Motion and Attack
---------------------
There are 256 ways pieces can be distributed along a line. (Treat short bishop digonals as longer lines with some empty locations.)
There seem to be about 37 motion combinations and about 30 attack possibilities. (These are shown below.)
And there are 8 ways the piece under consideration (doing the moving or attacking)
can be distributed along a line.

So to generate motion and attack bitboards, I initially thought I might need an array of size [256][8]
for the various locations of the moving piece and scattering of the other pieces that may or may not obstruct its path.

The MOTION array contains a string of contiguous 1s indicating the reachable squares for the active piece moving backwards or forwards along the line.
The ATTACK array typically contains a pair of bits indicating the position of the first piece that would be hit by the active piece moving either backwards or forwards from its current location. It may contain less bits, eg a bishop at H8 will never hit another piece by moving in the direction SE/NW.

Code: Select all
int motion[37] =
{   0, //  0 00000000
    1, //  1 00000001
    2, //  2 00000010
    3, //  3 00000011
    4, //  4 00000100
    6, //  5 00000110
    7, //  6 00000111
    8, //  7 00001000
   12, //  8 00001100
   14, //  9 00001110
   15, // 10 00001111
   16, // 11 00010000
   24, // 12 00011000
   28, // 13 00011100
   30, // 14 00011110
   31, // 15 00011111
   32, // 16 00100000
   48, // 17 00110000
   56, // 18 00111000
   60, // 19 00111100
   62, // 20 00111110
   63, // 21 00111111
   64, // 22 01000000
   96, // 23 01100000
  112, // 24 01110000
  120, // 25 01111000
  124, // 26 01111100
  126, // 27 01111110
  127, // 28 01111111
  128, // 29 10000000
  192, // 30 11000000
  224, // 31 11100000
  240, // 32 11110000
  248, // 33 11111000
  252, // 34 11111100
  254, // 35 11111110
  255  // 36 11111111
};
int attacks[30] =
{   0, //  0 00000000
    1, //  1 00000001
    2, //  2 00000010
    4, //  3 00000100
    5, //  4 00000101
    8, //  5 00001000
   10, //  6 00001010
    9, //  7 00001001
   16, //  8 00010000
   20, //  9 00010100
   18, // 10 00010010
   17, // 11 00010001
   32, // 12 00100000
   40, // 13 00101000
   36, // 14 00100100
   34, // 15 00100010
   33, // 16 00100001
   64, // 17 01000000
   80, // 18 01010000
   72, // 19 01001000
   68, // 20 01000100
   66, // 21 01000010
   65, // 22 01000001
  128, // 23 10000000
  160, // 24 10100000
  144, // 25 10010000
  136, // 26 10001000
  132, // 27 10000100
  130, // 28 10000010
  129  // 29 10000001
};


Generating Attack Data from Line Data
----------------------------------------------
It then occured to me that 30 (37 is too) is a small number, less than 64.
This means the possible attack arrays for a piece position along the line (eg A-H) can be bitmapped,
and the possible attack arrays for pieces scattered along a line (0-255) can be bitmapped.
What I mean here, is bit 0 set if motion[0] is possible, and bit 1 set if motion[1] is possible,
bit 2 set if motion[2] is possible, etc.
So rather than require an array [256][8], we just need an array [256] and another [8]
and we AND their contents. The first array with 256 elements tells us which attack are possible for a particular piece occupancy of a line
and the second array tells us the attacks possible for an active piece at a given location in its line.
The result of the AND operation indicates possible attacks
with pieces scattered along the line a certain way and with the active piece at the specified point
along the line.

Summary
-----------
I feel I have found a way to generate attack or motion bitboards using a handful of arrays, none of which contain more than 256 elements.

Speed and applications
---------------------------
I am not thinking that this will be useful for a normal chess playing engine, so please don't tell me that magic is faster, better, etc.

I would like an application on a smartphone that understands chess and yet CANNOT play the game.
I play games over the board and sometimes want to write down the moves. I'd like to have a visual chess board on my phone
and to drag and drop the pieces as I move them, rather as if I were playing chess online, but having to move both white and black pieces.
At the end of the game, it'd be great if the phone could then store the game in a file in portable game notation (PGN) format.
The program can be as slow as it likes - its aim would be simply to register the moves played - an electronic version of
a game book or score sheet. It must NOT be able to play chess, to avoid possible accusations of cheating.
That is the sort of application that might benefit from a low-memory algorithm for generating attack bitboards.

I have glossed over many things here, for these reasons.
1. I haven't actually done it yet :shock: I may add sample code for the [256] and [8] arrays later.
2. There are some intricacies to modular arithmetic bit hashing best discussed elsewhere.

Re: Low memory usage attack bitboard generation

PostPosted: 07 Oct 2011, 07:08
by Aleks Peshkov
There are methods without any memory arrays at all. Take a look on http://chessprogramming.wikispaces.com/Kogge-Stone+Algorithm.

Re: Low memory usage attack bitboard generation

PostPosted: 07 Oct 2011, 12:58
by crystalclear
Aleks Peshkov wrote:There are methods without any memory arrays at all. Take a look on http://chessprogramming.wikispaces.com/Kogge-Stone+Algorithm.


Okay, thanks.
I'll take a look.

Re: Low memory usage attack bitboard generation

PostPosted: 07 Oct 2011, 21:34
by Gerd Isenberg
Interesting approach and good explanation, thanks. Casting out was already described by Fenner and Levenne in 2008, see Congruent Modulo Bitboards in cpw. Similar issue with bitscan hashing by mod 65 versus De Bruijn multiplication.

Gerd

Re: Low memory usage attack bitboard generation

PostPosted: 09 Oct 2011, 19:21
by crystalclear


I had a quick look, but haven't understood it yet.

Fenner and Levene use masked line modulo 514 for the diagonals, modulo 257 for the anti-diagonals and mod 258 for files, to calculate the occupied index, but they didn't consider the inner six bits for a denser lookup.


I don't feel comfortable yet using a modulus that isn't coprime with 2, eg 514 or 258, and I'd rather use a power of 2 minus 1 as the modulus so that the power of two is equivalent to 1 modulo my chosen number. That way, I can partly visualise that a bit somewhere is equivalent to another bit being set somewhere else.

Code: Select all
     1 |  2 |  4 |  8 |  16 | 32 | 64 | 128 |
     1 |  2 |  4 |  8 |  16 | 32 | 64 | 128 |
     1 |  2 |  4 |  8 |  16 | 32 | 64 | 128 |
     1 |  2 |  4 |  8 |  16 | 32 | 64 | 128 |
     1 |  2 |  4 |  8 |  16 | 32 | 64 | 128 |
     1 |  2 |  4 |  8 |  16 | 32 | 64 | 128 |
     1 |  2 |  4 |  8 |  16 | 32 | 64 | 128 |
     1 |  2 |  4 |  8 |  16 | 32 | 64 | 128 |

Modulo 255, 256=1 and so the powers of two layed out in an 8x8 pattern look like the picture above.

So taking bits from any row from that should give me an an index that I can use.
For bishops I can just take a diagonal, slopping upwards or downwards

Code: Select all
     - |  - |  - |  - |  -  | 32 |  - |  -  |
     - |  - |  - |  - |  16 | -  |  - |  -  |
     - |  - |  - |  8 |  -  | -  |  - |  -  |
     - |  - |  4 |  - |  -  | -  |  - |  -  |
     - |  2 |  - |  - |  -  | -  |  - |  -  |
     1 |  - |  - |  - |  -  | -  |  - |  -  |
     - |  - |  - |  - |  -  | -  |  - |  -  |
     - |  - |  - |  - |  -  | -  |  - |  -  |

So I think I can do most of what I want calculating molulo 255.
However I cannot extract column indicies that way.

Code: Select all
     | 1  | 2  | 4  | 8  | 16 | 32 | 64 | 128
     | 256| 1  | 2  | 4  | 8  | 16 | 32 | 64
     | 128| 256| 1  | 2  | 4  | 8  | 16 | 32
     | 64 | 128| 256| 1  | 2  | 4  | 8  | 16
     | 32 | 64 | 128| 256| 1  | 2  | 4  | 8
     | 16 | 32 | 64 | 128| 256| 1  | 2  | 4
     | 8  | 16 | 32 | 64 | 128| 256| 1  | 2
     | 4  | 8  | 16 | 32 | 64 | 128| 256| 1

These are values for the powers of two modulo 511.
Looking at the last column, I can make a nice index using the last column.
Bit values for the last column will be preserved, they will be in (reverse?) order, and
a value of 1 representing the end of a line.

To get a completely logical index from another column, I will have to multiply by a constant.
For example for the 4th column, I can multiply by 16 before doing my MOD 511 calculation.

Code: Select all
     | 16 | 32 | 64 | 128| 256| 1  | 2  | 4
     | 8  | 16 | 32 | 64 | 128| 256| 1  | 2
     | 4  | 8  | 16 | 32 | 64 | 128| 256| 1
     | 2  | 4  | 8  | 16 | 32 | 64 | 128| 256
     | 1  | 2  | 4  | 8  | 16 | 32 | 64 | 128
     | 256| 1  | 2  | 4  | 8  | 16 | 32 | 64
     | 128| 256| 1  | 2  | 4  | 8  | 16 | 32
     | 64 | 128| 256| 1  | 2  | 4  | 8  | 16

My index will always be a bit pattern that actually represents the layout of the pieces.
So if some other system of index calculation comes along that I find better, I should be able to swap
without it affecting the rest of the attack bitboard stuff.

I write a few lines of BASIC to demonstate how I intend to generate my indicies.
The example uses a 3x3 array of decimal digits instead of an 8x8 array of binary digits, but the principle is the same.
I populated my example 3x3 array with my best recollection (probably wrong) of the digits of Pi.

Code: Select all
      *SPOOL EXAMPLE.TXT
      PRINT 314
      PRINT 159
      PRINT 268
     
      PRINT 268159314 " IS THE INTEGER REPRESENTATION"
     
      PRINT "EXTRACT DIAGONALS"
     
      PRINT 200050004 " MOD " 999 " IS " 200050004 MOD 999
      PRINT 008050300 " MOD " 999 " IS " 008050300 MOD 999
     
      PRINT 060100000 " MOD " 999 " IS " 060100000 MOD 999
     
      PRINT "EXTRACT ROWS"
     
     
      PRINT       314 " MOD " 999 " IS "       314 MOD 999
      PRINT    159000 " MOD " 999 " IS "    159000 MOD 999
      PRINT 268000000 " MOD " 999 " IS " 268000000 MOD 999
     
      PRINT "EXTRACT COLUMNS"
     
      PRINT 008009004 " *100 MOD " 9999 " IS " 008009004 *100 MOD 9999
      PRINT 060050010 " * 10 MOD " 9999 " IS " 060050010 * 10 MOD 9999
      PRINT 200100300 " *  1 MOD " 9999 " IS " 200100300 *  1 MOD 9999
      *SPOOL



This created the following output ...
Code: Select all
       314
       159
       268
 268159314 IS THE INTEGER REPRESENTATION
EXTRACT DIAGONALS
 200050004 MOD        999 IS        254
   8050300 MOD        999 IS        358
  60100000 MOD        999 IS        160
EXTRACT ROWS
       314 MOD        999 IS        314
    159000 MOD        999 IS        159
 268000000 MOD        999 IS        268
EXTRACT COLUMNS
   8009004 *100 MOD       9999 IS        498
  60050010 * 10 MOD       9999 IS        156
 200100300 *  1 MOD       9999 IS        312


I have convinced myself that 2 moduli are enough, one modulus (255) for bishop diagonals and maybe ranks,
and another modulus (511, used with a multiplier) for files.

There is one potential problem working modulo 255 and that is that it gives the same value if all squares in the line are occupied and if they are all empty.
That won't be a problem for me as I'll not include the moving piece in the calculation.

Re: Low memory usage attack bitboard generation

PostPosted: 09 Oct 2011, 21:50
by Gerd Isenberg
Exactly, casting out 255 (ranks, diagonals, antidiagonals) or 511 for files seems much more intuitive than the Fenner and Levene modulo constants, assuming the sliding piece clear to make 255 an impossible index. However, their mod 258 for files is a bit denser. I would still prefer Kindergarten multiplication with much less instructions than modulo, which is just too expensive instruction despite done by reciprocal multiplication. Thus instead mod 255, * 0x0101010101010101 shift right 56.

Re: Low memory usage attack bitboard generation

PostPosted: 10 Oct 2011, 21:54
by crystalclear
MOVE GENERATION WITH PERFECT HASH FUNCTIONS
Trevor Fenner Mark Levene1
London, U.K.


I read the mathematic article which is the basis for the webpage you gave me.

I don't understand their stuff - I didn't wade through the maths, but I read this for example ...
We use the hash function h2 defined in (10), so
the modulus is 257. It follows from (13) that there is a single unused address at 172. The possible moves
of the Bishop may be found using the method described above.


We use the hash function h1 defined in (3), so the modulus is 258. It follows from (6) and (8) that there
is a single gap of length 2 at the values 86 and 87.


I really don't like the sound of that at all. I want the bit pattern in my index to be the bits in the line, with the waste zeros from the bitboard (that do not form part of the line) to be cast out. Zeros that represent empty squares on the line should of course stay and form part of the index. I also want the bits to remain in the order they occur in the line. What they seem to be doing is hashing. What I want to do is simply extract the bit pattern for the line.

Code: Select all
However, their mod 258 for files is a bit denser.


I'm not sure I understand you here. Don't be confused between the modulus I use for a caclulation and the range of the values I find as a result of the calculation.
In my example of a 3*3 square of decimals, I found I could extract the contents of diagonals, rows and colums of decimal digits using a modulus of 999 most of the time and occasionally 9999. When I used modulus 9999 I was still able to get a three digit result (not 4 digits) that corresponded the contents of the column, with the digits in (reverse?) order.

Code: Select all
       314
       159
       268

      PRINT "EXTRACT COLUMNS"
      PRINT 008009004 " *100 MOD " 9999 " IS " 008009004 *100 MOD 9999
      PRINT 060050010 " * 10 MOD " 9999 " IS " 060050010 * 10 MOD 9999
      PRINT 200100300 " *  1 MOD " 9999 " IS " 200100300 *  1 MOD 9999
 
EXTRACT COLUMNS
   8009004 *100 MOD       9999 IS        498
  60050010 * 10 MOD       9999 IS        156
200100300 *  1 MOD       9999 IS        312

So I am happy with what I am doing at the moment. I'm still on a sort of proof of concept stage with my attack bitboards, but it is becoming clearer in my mind.
I am going to use clean indicies that correspond to the bit pattern of pieces on the line, and that way the rest will be sort of clean too, and I should be able to change the way indicies are calculated without it having a knock on effect on the rest.

Last night I tested out some software for indicies for rooks.
For files, rather than use modulo 255 I used a simple shift, since my bits were contiguous.

int f;
x &= File[f=file(square)];
x >>= (8*f);
x &= 255;
return x;

For the ranks I shifted the bits to get a pattern
0000000a 0000000b 0000000c 0000000d 0000000e 0000000f 0000000g 0000000h
multiplied by 255 to get
aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff gggggggg hhhhhhhh
ANDed it with
10000000 01000000 00100000 00010000 00001000 00000100 00000010 00000001
to get
a0000000 0b000000 00c00000 000d0000 0000e000 00000f00 000000g0 0000000h
and took the answer modulo 255 to get
00000000 00000000 00000000 00000000 00000000 00000000 00000000 abcdefgh

Code: Select all
        int r;
        x &= Rank[r=rank(square)];
        x >>= r;
        x *= 255ULL;
        x &= 0x8040201008040201ULL;
        x = x % 255;
        return x;


It isn't the modulo 511 calculation I described ealier - It's another way to extract the bits that are scattered 8 apart
but gives me them in the order abcdefgh rather than hgfedcba.
It's only a few instructions in a high level language, but I read last night that some C compilers generate fairly slow and complex code for the remainder 255 calculation.
However I first want to get some chess software written and debugged, before I think about going for speed.

Although I haven't made much progress on getting things written yet, its becoming clear to me that I want to write a few seperate items.
A. Index generation that simply casts out the unwanted zeros from lines.
B. Line attack values
C. Bitmaps [256] for possible attacks with a line layout.
D. Bitmaps [8] for possible attacks for the piece position along a line.
E. AND them to find the unique attack possibility for active piece and line layout.
F. Bit scattering to create final answer.

Re: Low memory usage attack bitboard generation

PostPosted: 11 Oct 2011, 20:19
by Gerd Isenberg
crystalclear wrote:
MOVE GENERATION WITH PERFECT HASH FUNCTIONS
Trevor Fenner Mark Levene1
London, U.K.


I read the mathematic article which is the basis for the webpage you gave me.

I don't understand their stuff - I didn't wade through the maths, but I read this for example ...
We use the hash function h2 defined in (10), so
the modulus is 257. It follows from (13) that there is a single unused address at 172. The possible moves
of the Bishop may be found using the method described above.


We use the hash function h1 defined in (3), so the modulus is 258. It follows from (6) and (8) that there
is a single gap of length 2 at the values 86 and 87.


I really don't like the sound of that at all. I want the bit pattern in my index to be the bits in the line, with the waste zeros from the bitboard (that do not form part of the line) to be cast out. Zeros that represent empty squares on the line should of course stay and form part of the index. I also want the bits to remain in the order they occur in the line. What they seem to be doing is hashing. What I want to do is simply extract the bit pattern for the line.

We agree on the Fenner/Levene paper.
crystalclear wrote:
Code: Select all
However, their mod 258 for files is a bit denser.


I'm not sure I understand you here. Don't be confused between the modulus I use for a caclulation and the range of the values I find as a result of the calculation.

I meant that mod 258 leaves a denser range than mod 511, which I assumed you may use.
crystalclear wrote:In my example of a 3*3 square of decimals, I found I could extract the contents of diagonals, rows and colums of decimal digits using a modulus of 999 most of the time and occasionally 9999. When I used modulus 9999 I was still able to get a three digit result (not 4 digits) that corresponded the contents of the column, with the digits in (reverse?) order.

Code: Select all
       314
       159
       268

      PRINT "EXTRACT COLUMNS"
      PRINT 008009004 " *100 MOD " 9999 " IS " 008009004 *100 MOD 9999
      PRINT 060050010 " * 10 MOD " 9999 " IS " 060050010 * 10 MOD 9999
      PRINT 200100300 " *  1 MOD " 9999 " IS " 200100300 *  1 MOD 9999
 
EXTRACT COLUMNS
   8009004 *100 MOD       9999 IS        498
  60050010 * 10 MOD       9999 IS        156
200100300 *  1 MOD       9999 IS        312

So I am happy with what I am doing at the moment. I'm still on a sort of proof of concept stage with my attack bitboards, but it is becoming clearer in my mind.
I am going to use clean indicies that correspond to the bit pattern of pieces on the line, and that way the rest will be sort of clean too, and I should be able to change the way indicies are calculated without it having a knock on effect on the rest.

Last night I tested out some software for indicies for rooks.
For files, rather than use modulo 255 I used a simple shift, since my bits were contiguous.

Guess you mean ranks? Yes, this shift below is fine.
crystalclear wrote:
int f;
x &= File[f=file(square)];
x >>= (8*f);
x &= 255;
return x;

For the ranks I shifted the bits to get a pattern
0000000a 0000000b 0000000c 0000000d 0000000e 0000000f 0000000g 0000000h
multiplied by 255 to get
aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff gggggggg hhhhhhhh
ANDed it with
10000000 01000000 00100000 00010000 00001000 00000100 00000010 00000001
to get
a0000000 0b000000 00c00000 000d0000 0000e000 00000f00 000000g0 0000000h
and took the answer modulo 255 to get
00000000 00000000 00000000 00000000 00000000 00000000 00000000 abcdefgh

Code: Select all
        int r;
        x &= Rank[r=rank(square)];
        x >>= r;
        x *= 255ULL;
        x &= 0x8040201008040201ULL;
        x = x % 255;
        return x;


instead of the above you may simply multiply the masked A-File with the main-diagoal + shift right 56
https://chessprogramming.wikispaces.com ... 20a%20Rank
Code: Select all
masked bits                             mapped to 8th rank
bits on A-file   *  main-diagonal    =  with garbage     -> 1st rank
A . . . . . . .     . . . . . . . 1     A B C D E F G H     . . . . . . . .
B . . . . . . .     . . . . . . 1 .     B C D E F G H .     . . . . . . . .
C . . . . . . .     . . . . . 1 . .     C D E F G H . .     . . . . . . . .
D . . . . . . .     . . . . 1 . . .     D E F G H . . .  >> . . . . . . . .
E . . . . . . .  *  . . . 1 . . . .  =  E F G H . . . .  56 . . . . . . . .
F . . . . . . .     . . 1 . . . . .     F G H . . . . .     . . . . . . . .
G . . . . . . .     . 1 . . . . . .     G H . . . . . .     . . . . . . . .
H . . . . . . .     1 . . . . . . .     H . . . . . . .     A B C D E F G H

crystalclear wrote:It isn't the modulo 511 calculation I described ealier - It's another way to extract the bits that are scattered 8 apart
but gives me them in the order abcdefgh rather than hgfedcba.
It's only a few instructions in a high level language, but I read last night that some C compilers generate fairly slow and complex code for the remainder 255 calculation.

Because div/mod is an expensive instruction so far, with > 23 cycles throughput (latency > 45) or so on modern x86-CPUs, and compiler correctly apply reciprocal mul, which is still too slow, since one 64*64=128bit mul, and one further 32-bit mul and sub is needed for mod by constant.

http://www.intel.com/content/www/us/en/ ... anual.html
9.2.4 Replace 128-bit Integer Division with 128-bit Multiplies
Table C-15. General Purpose Instructions (Contd.) on DIV - Latencies and Throughput

While I like the mod 255 casting out idea, my point is there are faster alternatives for a 1:1 mapping from scattered occupied bits along a diagonal to one index-byte.

https://chessprogramming.wikispaces.com ... f+any+Line
https://chessprogramming.wikispaces.com ... to%20Ranks

1. Parallel prefix shift/mask (for cpu < 2core with slow mul)
Code: Select all
int collapsedFilesIndex(U64 b) {
   b |= b >> 32;
   b |= b >> 16;
   b |= b >>  8;
   return b & 0xFF;
}

2. Multiplication by A-File
Code: Select all
int collapsedFilesIndex(U64 b) {
   const U64 aFile  = C64(0x0101010101010101);
   return (b * aFile) >> 56;
}

with this bitboard diagrams
Code: Select all
masked diagonal  *  A-file              mapped
                                        to 8th rank      ->  1st rank
. . . . . . . H     1 . . . . . . .     A B C D E F G H     . . . . . . . .
. . . . . . G .     1 . . . . . . .     A B C D E F G .     . . . . . . . .
. . . . . F . .     1 . . . . . . .     A B C D E F . .  >> . . . . . . . .
. . . . E . . .     1 . . . . . . .     A B C D E . . .  56 . . . . . . . .
. . . D . . . .  *  1 . . . . . . .  =  A B C D . . . .     . . . . . . . .
. . C . . . . .     1 . . . . . . .     A B C . . . . .     . . . . . . . .
. B . . . . . .     1 . . . . . . .     A B . . . . . .     . . . . . . . .
A . . . . . . .     1 . . . . . . .     A . . . . . . .     A B C D E F G H

crystalclear wrote:However I first want to get some chess software written and debugged, before I think about going for speed.
Although I haven't made much progress on getting things written yet, its becoming clear to me that I want to write a few seperate items.
A. Index generation that simply casts out the unwanted zeros from lines.
B. Line attack values
C. Bitmaps [256] for possible attacks with a line layout.
D. Bitmaps [8] for possible attacks for the piece position along a line.
E. AND them to find the unique attack possibility for active piece and line layout.
F. Bit scattering to create final answer.


Good luck and fun with your bitboard approach!

Gerd

Re: Low memory usage attack bitboard generation

PostPosted: 11 Oct 2011, 21:26
by crystalclear
I meant that mod 258 leaves a denser range than mod 511, which I assumed you may use.


If the 8 individual bits modulo 511 calculate to [1], [2], [4],[8],[16],[32],[64], and [128]
then the calculated index will always be an integer in the range 0-255.

The idea was analagous to my example extracting columns of decimal digits from a 3*3 array using moulus 9999.
The range of values generated is 0-999 and not 0-9999.

To make sure the theory is correct, I just changed the numbers in my table to the digits from the maths number e instead of pi ...

Code: Select all
      PRINT 271
      PRINT 828
      PRINT 182
     
      PRINT 182828271 " IS THE INTEGER REPRESENTATION"
     
      PRINT "EXTRACT COLUMNS"
     
      PRINT 002008001 " *100 MOD " 9999 " IS " 002008001 *100 MOD 9999
      PRINT 080020070 " * 10 MOD " 9999 " IS " 080020070 * 10 MOD 9999
      PRINT 100800200 " *  1 MOD " 9999 " IS " 100800200 *  1 MOD 9999


This produces the output

Code: Select all
       271
       828
       182
 182828271 IS THE INTEGER REPRESENTATION
EXTRACT COLUMNS
   2008001 *100 MOD       9999 IS        182
  80020070 * 10 MOD       9999 IS        728
 100800200 *  1 MOD       9999 IS        281


So using modulus 511 for my calculations gives me 0-255 as my range of indicies.

We use the hash function h1 defined in (3), so the modulus is 258. It follows from (6) and (8) that there
is a single gap of length 2 at the values 86 and 87.


The way I see it, that is denser than a slightly larger range (0-257) that has a couple of holes.
But one or two words is neither here nor there, compared to arrays

Re: Low memory usage attack bitboard generation

PostPosted: 11 Oct 2011, 21:35
by crystalclear
Code: Select all
Guess you mean ranks? Yes, this shift below is fine.


The bit layout in my chess program is such that a white pawn moves from square n to square n+1.
I believe that lots of people have a pawn moving from square n to square n+8.
That makes ranks in one program behave like files in another for the purpose of this sort of calculation.

I don't really see the logic for having a pawn move 8 squares at a time, but if that's what most people choose to do, so be it.

I think that explains your comment - if not, let me know.

Re: Low memory usage attack bitboard generation

PostPosted: 11 Oct 2011, 22:22
by Gerd Isenberg
crystalclear wrote:If the 8 individual bits modulo 511 calculate to [1], [2], [4],[8],[16],[32],[64], and [128]
then the calculated index will always be an integer in the range 0-255.
...
So using modulus 511 for my calculations gives me 0-255 as my range of indicies.

The way I see it, that is denser than a slightly larger range (0-257) that has a couple of holes.
But one or two words is neither here nor there, compared to arrays

Ahh, yes I understand. You are right.

Re: Low memory usage attack bitboard generation

PostPosted: 11 Oct 2011, 22:27
by Gerd Isenberg
crystalclear wrote:
Code: Select all
Guess you mean ranks? Yes, this shift below is fine.


The bit layout in my chess program is such that a white pawn moves from square n to square n+1.
I believe that lots of people have a pawn moving from square n to square n+8.
That makes ranks in one program behave like files in another for the purpose of this sort of calculation.

I don't really see the logic for having a pawn move 8 squares at a time, but if that's what most people choose to do, so be it.

I think that explains your comment - if not, let me know.

Oups, didn't get that first. Different mapping is always worth some confusion ;-)
Your mapping is quite common as well and safes some wrap ands for pawn attacks.

Re: Low memory usage attack bitboard generation

PostPosted: 18 Oct 2011, 22:40
by crystalclear
Well, I have written as much attack bitboard software as I want for the time being.

It seems I have shown to myself that what I had been planning is possible.
I could lookup the bitpattern that I wanted along a line from a table of 36 possibilites, and then I'd scatter the bits back along the line in the 8x8 board.

I created tables with 36 entries for (A) the positions of the blockers, (B) motion upto the blockers, and (C) motion up to and including the blockers. eg
Code: Select all
A. 00100010
B. 00011100
C. 00111110
I don't know which would be the most practical to use.

I used triangular numbers which I'll write as T(x) to create the information in the tables.
T(1)=1, T(2)=3, T(3)=6, etc.
36 is a triangular number. The triangular root (something like a square root but for triangles) of 36 is 8.
That's the reason I had 36 entires in my table.

For each number 0-35, I used the integer triangular root and the remainder.
For example using the number 18, I'd calculate that 18 is T(5)+3: the 5th triangular number plus 3.
The remainder is never bigger than the triagular root, for if it were larger, you could make a bigger triangle. (Eg T(5)+6=15+6=21 which is already T(6).)
So for each number 0-35 I'd get a pair of integers both in the range 0-7, a larger number and a smaller number. The "small" number could be the same but not bigger than the "large" number.

The small number would indicate how the piece can move along the line in one direction and the larger number how far the piece can move along the line the other way, before hitting a blocker.
So for example the 18th entry on my table (T(5)+3) would contain information for a piece that could move between location 3 and location 5 (blocked at 2 and 6).

Code: Select all
   01234567
A. 00100010
B. 00011100
C. 00111110


That gave me a logical way to populate my tables and to verify their contents.
My software would then scatter the bits found in the table back along the line.

In the same way that we discussed casting out, to generate a compact 8 bit occupancy number 0-255 from the bits on a line, I needed to scatter 8 bits back along ranks, files and diagonals. Three of the cases seemed fairly simple. But scattering bits 8 apart for rooks didn't seem so easy. I used a table of 256 entries but was wondering if there is a simple and elegant way of doing the same thing.

Re: Low memory usage attack bitboard generation

PostPosted: 19 Oct 2011, 20:36
by Gerd Isenberg
crystalclear wrote:...
In the same way that we discussed casting out, to generate a compact 8 bit occupancy number 0-255 from the bits on a line, I needed to scatter 8 bits back along ranks, files and diagonals. Three of the cases seemed fairly simple. But scattering bits 8 apart for rooks didn't seem so easy. I used a table of 256 entries but was wondering if there is a simple and elegant way of doing the same thing.

Lookup[256] is fine. Otherwise multiplication of the byte with a diagonal (9 bits apart) and some shifing/anding.
Code: Select all
. . . . . . . .     . . . . . . . 1     C D E F G H . A     A . . . . . . .
. . . . . . . .     . . . . . . 1 .     D E F G H . A B     B . . . . . . .
. . . . . . . .     . . . . . 1 . .     E F G H . A B C  >> C . . . . . . .
. . . . . . . .     . . . . 1 . . .     F G H . A B C D  7  D . . . . . . .
. . . . . . . .  *  . . . 1 . . . .  =  G H . A B C D E  &  E . . . . . . .
. . . . . . . .     . . 1 . . . . .     H . A B C D E F  A  F . . . . . . .
. . . . . . . .     . 1 . . . . . .     . A B C D E F G     G . . . . . . .
A B C D E F G H     1 . . . . . . .     A B C D E F G H     H . . . . . . .

Re: Low memory usage attack bitboard generation

PostPosted: 20 Oct 2011, 22:09
by crystalclear
Thanks Gerd

It isn't the modulo 511 calculation I described ealier - It's another way to extract the bits that are scattered 8 apart
but gives me them in the order abcdefgh rather than hgfedcba.


I think I'm making this unecessarily difficult for myself. First I reversed the order of the bits when I extracted them because I wasn't happy with the order.
(I viewed it as correcting the order!)
And now I am complaining again because I want to reverse them as I scatter them.
It serves me right! LOL.

Code: Select all
A . . . . . . .    H . . . . . . .
B . . . . . . .    G . . . . . . .
C . . . . . . .    F . . . . . . .
D . . . . . . .    E . . . . . . .
E . . . . . . .    D . . . . . . .
F . . . . . . .    C . . . . . . .
G . . . . . . .    B . . . . . . .
H . . . . . . .    A . . . . . . .


Anyway, I know how to reverse the order of the bits simply, and you have shown me how to scatter them without a table, so I can scatter them in the order I want now, and I'll keep my eyes open for something that does it in one go, as one never knows when it will be useful.

I don't actually have a proper use for attack bitboards yet. My move generators work without them.
I'm thinking along the lines that my simplistic approach to attack bitboards might best be used by me for two purposes.

1. To populate larger more complicated attack tables.
2. To verify correct operation of the those attack tables.
3. Just to get me started with Static Exchange Evaluation (SEE).

For SEE I might want to consider for example the rooks and queens attacking a particular square along that square's rank and file.
It seems to me that I need to take the non-grid-sliders bitboard and find the table index for the pattern of blockers.
Then look up the inner motion pattern table using that index, ie motion upto and not including the blockers, and AND that with the bitboard for grid-sliders.
That should then give me the 'friendly and enemy' 'rooks and queens' defending and attacking the square that is being SEE evaluated.

I'm a bit unsure how to use that information once I've got it.
With knights, I can count bits to determine the number of attacking and defending knights.
Thinking of knights alone ....
With an excess of knights I think you can capture the enemy piece.
With a balance or deficit of knights, you can trade a knight for the piece, and
With no knights, you can't do anything.

I'm wonder how I can generalise that sort of idea to do a form of static exchange evaluation without generating or executing any moves.
It'd be nice to simply weigh up the balance of power using the data found with the attack tables.
As someone who often ends a chess exchange a piece down it'd be nice to start calculating exchanges more precisely.
But it seems to me that a chess program uses SEE to have a good idea whether to consider a capture early or late in its alpha beta search
and so a bad SEE analysis only affects the final move decision in the sense that it wastes thinking time, and not in the sense that it causes bad decisions.
Eg.
P x unprotected queen - generally worth immediate analysis, even if occasionally allows you to be mated
Q x protected pawn - generally not a good move, though sacrifices work at times.

Oops, I am waffling - I guess there will be a SEE thread somewhere!

Re: Low memory usage attack bitboard generation

PostPosted: 09 Mar 2012, 16:17
by crystalclear
I started looking at attack bit-board generation using magic numbers.

What are magic numbers?
------------------------------
The idea is that to create an index for an attack array, you need to collate the bits of interest which are scattered in the bit-board.
Here is a simplified example.
Suppose you want to collate bit 0 and bit 2 from a word on a 4 bit computer.
The word with the bits looks like this ....
-a-b,
where the hyphens represent don't care bits.
We want to come up with a number like 3 in this example that can be used as a multiplicative constant
that will help us to collate the bits.
Let's assume the bits which are not of interest are set to 0, but I'll draw that as a letter 'o' to distinguish them from zeros which are of interest to us.
o0o0 * 3 = 0000
o0o1 * 3 = 0011
o1o0 * 3 = 1100
o1o1 * 3 = 1111

That helps us collate the bits we wanted, as we can now see they are together in the middle of the result of the multiplication.

o0o0 * 3 = 0000 = -00-
o0o1 * 3 = 0011 = -01-
o1o0 * 3 = 1100 = -10-
o1o1 * 3 = 1111 = -11-

It's now clear that if we had used a multiplicative constant that shifts the bits we want left a little, we could have collated the bits together at the top of the word, as shown here, using 6 instead of 3 as our multiplier. (6 is 3 shifted left one bit.)

o0o0 * 6 = 0000 = 00--
o0o1 * 6 = 0110 = 01--
o1o0 * 6 = (1)1100 = 10--
o1o1 * 6 = (1)1110 = 11--

To obtain the bits we want, we now just shift the result right by "the computer word size - the number of bits we are collecting".

o0o0 * 6 >> 2 = 0000 >> 2 = 00
o0o1 * 6 >> 2 = 0110 >> 2 = 01
o1o0 * 6 >> 2 = 1100 >> 2 = 10
o1o1 * 6 >> 2 = 1110 >> 2 = 11

Excuse the strange mixture of decimal and binary notation. I just noticed what I've been doing :x
So that's magic numbers. 6 is a magic number for collating bits 0 and 2 in a 4 bit register, in the example used.

Back to chess and my experiences with generating magic numbers.

1. I went for 14 bit rook magics.
--------------------------------------
I wanted to generate magic numbers to find attack indexes. (The spelling corrector doesn't like indicies!) I decided to try to generate 14 bit indexes for my rook attack indexes. The number 14 comes from a rook having 7 squares on the same rank and 7 squares on the same file. I wanted to use these to find indexes into my attack arrays so that the attack array would tell me which pieces can be captured.

This might have been a bad choice. The alternative is smaller indexes. The square at the end of a rook's travel never blocks it from traveling any further so some people leave these 'end bits' out of the calculation of the attack index. That gives them a smaller index and maybe smaller attack arrays. I didn't want to do that.
If my attack data says a rook on e4 can take something on e8, I didn't want to then have to look if there is anything on e8. I had thought that generating attack indexes using less bits had benefits that came with this hidden penalty.

People will argue that this hidden penalty is fictitious. When a rook moves, both white and black pieces can block its motion, so attack arrays calculations will usually be performed using a bit-board representing pieces if both colors. So when the attack data says that my example rook on e4 can take something on e8, ordinarily, it will be necessary to test that the piece on e8 is an enemy piece. But I have plans to use my attack arrays for some sort of static exchange evaluation where pieces of both colors are of interest, so I wanted 14 bit indexes so that attack data could give more accurate capture information.

2. I didn't find many rook magics.
---------------------------------------
I didn't find many magics numbers for rooks, and without all 64 (one for each square) they are useless to me.
I did find a complete set of magic numbers for bishops, and without much difficulty.
It could be that my software for finding magics wasn't great, but I think there's another explanation.

Rook moves in one direction form consecutive bits in my bit-boards. The string of consecutive bits actually has a hole for the square occupied by the rook.
However for some squares the rook is at the edge of the board and that creates a string of 7 consecutive bits to index, for movement along one line.
And then a bit for motion in the other direction can touch that string and make 8 consecutive bits (stuvwxyz say).

The other bits for rook movement create bits that are 8 apart ...
000a0000000b0000000c0000000d0000000e00
something like that.

Now when you try to shift bits around to create an index
Code: Select all
                              *
                       stuvwxyz
   000a0000000b0000000c0000000d0000000e00

you can see that no matter how you shift the string of bits 8 apart, it is not going to be easy to avoid a collision eg bit z with bit d in the diagram above
of bit b with bit s if we try moving bit c so that it is clear.
Code: Select all
                       *
                       stuvwxyz
    000a0000000b0000000c0000000d0000000e00


3. Bishop magics didn't give me a problem.
--------------------------------------------------
Bishop motion doesn't give strings of consecutive bits. It is maybe much easier to get the bits that are spread apart to interleave to collate them.

4. Small bit rook magics don't seem to be a problem.
--------------------------------------------------------------
Code: Select all
                      !       !
                       uvwxyz
   000a0000000b0000000c0000000d0000000e00

This illustration is an attempt to show that it is far easier to shift bits to avoid collisions when strings of consecutive bits are shorter.
I think that is one reason that other people have some very fast magic number generators for their uses.

5. Creating magics by hand.
--------------------------------
I created some magics by hand to try to understand why using random numbers was taking so long in particular cases.
There are various techniques you can use to get bits to avoid colliding.
5a. Shift a bit into the hole representing the rook's current square
Code: Select all
                         !   
                     stuv-xyz
      000a0000000b0000000c0000000d0000000e00


5b. Vary the gap between some bits by shifting them outside the destination window of 14 bits (well, 14 in my case) and repeating the pattern shifted off the other end to recover them.
The hashes represent the position of the destination 14 bits.
Code: Select all
                  ##############           
                     stuvwxyz
         000a0000000b0000
                          000a0000000b0000
--------------------------------------------
                    bstuvwxyza


5c. Similarly, break up the string of consecutive bits.
Code: Select all
                  ##############     
             stuvwxyz
                           stuvwxyz
--------------------------------------------
                  xyz      stuvw


Summary
----------
I failed to adequately create 14 bit rook magics. Full magics for bishops were easy.
I had thought that the problem would simply require a little more computing time than for people 10 bits for inner squares, 11 for edges and 12 for corners.
If the problem is not intractable, it is at least difficult, and seems to not just require longer due to a larger number of bits, but due to the nature of the problem: getting bits 8 apart to avoid a string of 8 consecutive bits. There are some tricks that can be used to help avoid the collisions, but it appears they can not be used often enough for me to create full rook magics easily.

I am now going to look for another way to generate rook attack indexes.

Re: Low memory usage attack bitboard generation

PostPosted: 09 Mar 2012, 16:34
by crystalclear
Example hand calculation of a magic number.

I think my software for generating magic numbers spurted out the numbers for bishops fairly easily, but I think the odd one was taking longer than my patience lasted so I calculated one by hand.

I drew low order bits on the left so that I could shift bits by simply using the space bar, and arrange to have a copy of each bit once in the desired places, marked with hashes. The * represents the square the bishop occupied, number 45 if I remember correctly. The 1s represent the squares it could move to. Those bits represent the bits that need to be collated. Like a jigsaw puzzle, I played with the shifting of a few copies of the number until I had one of each bit in the destination.
Then I measure the length of the shifts (marked -------|-------|-------|-----) as a number eg 20000000 and added them together to get the magic number.
(Initial string is binary, final magic number shown in hexadecimal.)

Code: Select all
    100000000100000000100000000100010000101000000*000000101000010001
                                                         ###########
    a        b        c        d   e    f g             h i    j   k
    -------|--a        b        c        d   e    f g             h i    j   k
    -------|-------|-a        b        c        d   e    f g             h i    j   k
    -------|-------|-------|-----a        b        c        d   e    f g             h i    j   k
    -------|-------|-------|-------|-------a        b        c        d   e    f g             h i    j   k
    -------|-------|-------|-------|-------|-------|-a        b        c        d   e    f g             h i    j   k
    -------|-------|-------|-------|-------|-------|-------|-----a        b        c        d   e    f g             h i    j   k
                                                         ###########
                                                         figdcbjeahk



0000000000000001
0000000000000400
0000000000020000
0000000020000000
0000008000000000
0002000000000000
2000000000000000
................
2002008020020401


Re: Low memory usage attack bitboard generation

PostPosted: 09 Mar 2012, 18:23
by crystalclear
With rooks, it is generation of attack bit-board indexes that has given me a headache, and I am looking for an alternative to magic numbers to create a nice logical index.
The contents of the attack bit-boards has been easy in comparison. My index has 36*36 possibilities. That makes 1296 entries in a table.
I have 3 tables, one for movement up to a blocker, one for movement up to the blocker and including capture, and one for captures only.

In contrast, I don't see how to layout my bishop data in a compact fashion.

http://chessprogramming.wikispaces.com/ ... +Bitboards says ...
A second table with 102400 entries can hold all rook attack bitboards. This table must also be initialized.

So my rook data bit-boards look fine as my tables are significantly smaller.
The 36 in the indexes has been explained before, eg a rook has its rank motion limited to say rank 2 to rank 5. 2<5. There are 36 such pairs of numbers, 36 being the 8th triangular number, (1/2) n (n+1) and n is 8 , 8*9=72. 72/2 = 36.

But what about bishops?

http://chessprogramming.wikispaces.com/ ... +Bitboards says ...
A second table with 5248 entries can hold all bishop attack bitboards. This table must also be initialized.

In that article, they are indexing attack arrays based on a 'don't care' state for the edge of the board squares. My data would be larger since I am looking to have attack tables that actually tell me if edge squares are captures or not, without having to do the extra step of seeing if they are occupied.

My rook tables are so much smaller than theirs because I share data between all rooks that can move (for example) to the b file, to the g file, to the 2nd rank and to the 5th rank. A rook c4 and one on e3 would generate the same index if their motion were limited in the same way.
The attack data just contains a rectangle with corners at b2 and g5 (and b5 and g2). The real attack data is actually the data from the table ANDed with the possible rook motion, but that information is at hand - it was used to calculate the occupied squares to get the index.

Here is an overview of my rook attack table generation.
Generate 1296 entries for all three tables ....
Code: Select all
    for (type=0;type<types;type++)
    for (t1=0;t1<36;t1++)
    for (t2=0;t2<36;t2++)


All possible motion converted from numbers to ranks and files
Code: Select all
            left=triangular_remainder(t1);
            right=triangular_root(t1);
            top=triangular_root(t2);
            bottom=triangular_remainder(t2);


print the entries for the attack tables.
Code: Select all
            switch (type)
            {
                case inner:
                    print_BB_binary(bitbox(stuffed,left,right,bottom,top));
                break;
                case outer:
                    print_BB_binary(bitbox(stuffed,left-1,right+1,bottom-1,top+1));
                break;
                case attack:
                    print_BB_binary(bitbox(hollow,left-1,right+1,bottom-1,top+1));
                break;
            }


That generates data for my tables ...
Code: Select all
0b\
00000000\
00000111\
00000101\
00000101\
00000101\
00000111\
00000000\
00000000, // 650 (3 1 5 1) d2 f2


This example entry is the data for any rook that can capture on 1st and 3rd ranks and on c and g files.
The data will only ever be accessed by rooks that are on the 2nd rank and the d,e, and f files.
Bits like the bit for d5 are zero so we know d5 cannot be captured from d2 (with the board layout indexing this attack entry).
Some of the bits are inconsequential eg the bit for a8 but are set to zero.
There are other inconsequential bits eg the bit for c1 which is set, but a rook cannot go from d2 to c1 and I'm happy leaving it set.
The aesthetic quality of the rectangle will make it easy to verify the table if necessary.

========================================================================

Anyway, my rhetorical question is ..
is there a simple similarly dense way of compressing bishop attack data for different squares into a table?

Re: Low memory usage attack bitboard generation

PostPosted: 11 Mar 2012, 09:33
by Lasse Hansen
crystalclear wrote:
Code: Select all
Guess you mean ranks? Yes, this shift below is fine.


The bit layout in my chess program is such that a white pawn moves from square n to square n+1.
I believe that lots of people have a pawn moving from square n to square n+8.
That makes ranks in one program behave like files in another for the purpose of this sort of calculation.

I don't really see the logic for having a pawn move 8 squares at a time, but if that's what most people choose to do, so be it.

I think that explains your comment - if not, let me know.


One reason for having +- 8 squares for pawn moves is that the instructions bsf and bsr (bitscan forward/reverse) will help in move ordering in a 'real' chess program. For me, I use H8, G8, ... B1, A1 (lsb-msb) and this makes the first 32bits of the board 'black' and the rest 'white'. In move generation, I store moves that goes into the opponent area first. If the square bitmap is based on that a pawn shall go +-1 square (I also did that in my first version) one divides the board into A8-D1 + E8-H1 areas.

Regards, Lasse

Re: Low memory usage attack bitboard generation

PostPosted: 23 Mar 2012, 00:31
by crystalclear
I failed in my attempts to generate normal 14 bit rook magic numbers.

But rather than use less bits like most people, I have stubbornly soldiered on by looking at preprocessing the rook occupancy bits.
I think I've read - maybe on here - of people adding one constant before multiplying by the so called magic number.
I intend to relocate certain bits by adding the constant and then ANDing its complement.
Maybe the following trick is common practice, but I don't remember seeing it described and someone else might find it useful.

Example:-

Code: Select all
  000000a00000000b0000000c000000000d (the bits)
+ 0000000000000111000000000111111111 (the constant)
= 000000a00000b***0000000cd********* (added - * is don't know/ don't care)
& 1111111111111000111111111000000000 (Complement of the constant)
  000000a00000b0000000000cd000000000 (the relocated bits)


So in this example, bits b and d have been relocated.
There are limitations to what you can do with this: bits move to the left for example and don't cross over, etc.
But a few extra instructions should allow me to use bits in a location where I can handle them, which at least is better than having the bits in a location which is too difficult for me.