by H.G.Muller » 30 Mar 2006, 10:13
The main problem will be that a 6-men (pawnless) EGTB nominally require 64 times as much storage as a 5-men. Using the 8-fold symmetry reduces the number of positions from 64**N to 10*64**(N-1). For a 5-men that equates to 160MB, but one extra man would bring it to 10GB.
Simply compiling such that all integers and adresses are 64 bit (and defining the rank and file masks with a few extra digits) should give you a working code (I cannot test this, because I don't have a 64-bit system). But on a 4GB machine it would be forced to swap the virtual memory to disk, slowing it down. This would not be too bad for just scanning through the TB in sequential order, because sequential disk reads are pretty efficient. (If your OS does not have a clever pre-fetch for disk caching, you can force the pre-fetch by using a prefetch instruction to a page a few hundred pages before the one you are currently scanning through, to force a page miss for which you don't have to wait.) For the scanning alone you would read each disksector only once to hunt for mate-in-N positions (where N is the highest DTM so far).
What would kill you is if for any mate-in-N you need to process, you would need to read from disk for every move you do (be it forward or retrograde) from that position. This would drive up the number of disk accesses a factor 30-40. So it is only doable if most of the moves go to pages of the TB currently kept in physical memory.
I tried to anticipate this a bit in the indexing scheme. Most moves you try are with the black (= losing side) pieces. You have to try forward black moves to search for 'escapes' (to not-yet-lost positions). In the current building strategy, it marks parents of 'white-wins-in-N with black to move' (whicht it finds by sequential scan through the TB) by generating retrograde moves for white. These marked states are then the 'white-wins with white to move' positions. If such a mark is new, it generates retrograde moves for black from there, to get at the grand-parents of the original position, and labels them as potential mate-in-(N+1) with black to move. Screening these potential states then occurs in the next pass through the TB, because you have to wait for the marking of all white-wins-with-white-to-move positions before you can start looking for escapes.
The screening pass through the TB is unavoidable. So I made the black pieces contributing the low-order bits to the index, meaning that a black move will always generate a position that has an index that does not differ very much from the index of the origial position. With 2 black pieces, this difference is at most 4096, i.e. this position is stored in the same memory page! With 3 pieces it falls within 256KB (64 memory pages) from the original one. So if yur OS's virtual memory system replaces pages through a least-recent-used policy, and you have more than 256KB physical memory available for user processes, all black moves should be serviceable from memory, loaded from disk by the sequential access (or vice versa).
So the tricky thing is the white moves. They can lead to accesses very far away in the TB, which will not be automatically in memory as a side effect of the sequential scanning. If there are 2 or more black pieces (I guess 6-men TB's against a bare King are not of much interest...), every white move will go to a different memory page. So the parents from a single position will be scattered over 30-40 different memory pages. From each of these pages it then starts to make grand parents through black moves,
which stay in the same page if black moves its first two pieces, but accesses a new page for every other move that black has. Hopefully black's third and later pieces are not very mobile. (Put the most mobile black piece in second place, directly after the King!) With 3 black pieces, the third having ~10 moves, the grand-parents are scattered over 300-400 memory pages, (1.2-1.6MB), which should fit in physical memory without any problem. But they are all very far away in index space, and would all have to be loaded from disk, besides the sequentially accessed block containing the original position.
The idea, however, is that the sequential scan will next go through other TB entries that differ in positions of the 2 first black pieces, (which are scanned fastest, in the inner loop of the scan), and the grandparents of all 4096 positions in one memory page will fall in the same set of 300-400 pages you just loaded. So although loading the set was expensive, you will make good use of it after you have it. In a typically won end-game, not every position in this 4096 will be a mate-in-N that you want to process
(to look for mate-in-(N+1), though. On the average perhaps only 100 (if average DTM is 41). But this means that the 30-40 pages with the grandparents will be shared by the 100 sequentially accessed mate-in-N positions that we treat in this scan through the TB, so on the average only 0.3-0.4 page miss will occur. This is not so much more than the 1 page miss you have anyway in the sequential scan.
So the bottom line seems to be that it should work reasonably well without further adaptations, for 2 or 3 black pieces, and most-likely for 4 if you have a lot of physical memory. With your 4GB you should be able to hold ~40% of the 6-men TB in memory at any time, which means that you will have many 'accidental' page hits in addition to those that were dependably provoked by the design of the indexing scheme. It is a pity, though, that your memory is just a little bit too small to be able to go by without memory swapping alltogether. With 6-men there will be many end-games that have additional symmetry that could be exploited, and is not exploited now. If black or white have 2 equal pieces, that gives you a factor 2. If the two are Bishops, it could give you another factor 2 because of the like/unlike square color. Such factors do not only reduce the size, but also the time, even without the efficiency increase you might have from not having to swap. But such tricks would not help you anyway to fit a 7-men EGTB in DRAM (for which this generator would not be much less efficient than with 6-men, but will of course take a much longer time simply because they are bigger).