
TB caching

PostPosted: 28 Nov 2005, 15:03
by Daniel Shawul
Today I measured the performance of bitbases without loading them into RAM; I just read the necessary byte from disk at run time. The result I found is frustrating: almost a 4x slowdown in NPS.

I have not implemented a TB cache yet, which I am hoping would give some speedup. But I already use two other caches in the search, the main hash table and the evaluation cache. Since bitbases are used only in eval, the eval cache is especially important, and because of that I did not think I needed an independent cache for the TBs.

But now I am in a dilemma. Should I read a chunk of data (more than one byte) and cache that? Any advice appreciated.
daniel

Re: TB caching

PostPosted: 28 Nov 2005, 17:19
by Daniel Shawul
Whoops... I turned off the eval cache by mistake during TB generation.
Now the slowdown is acceptable, only 30%.

Re: TB caching

PostPosted: 07 Dec 2005, 22:44
by diepeveen
Daniel Shawul wrote:Whoops... I turned off the eval cache by mistake during TB generation.
Now the slowdown is acceptable, only 30%.


TB caching is pretty easy if you have your own EGTBs.

You really must read chunks of data from disk.

Disk access takes somewhere between 8 and 10 ms,

whereas I guess a node in your program is a factor of 1000 faster.

So the advantage of caching big chunks is that with a single read you get several probes for the same disk-access latency.
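
For illustration, a minimal sketch of such a chunk cache (hypothetical code, not Diep's; assuming a single TB file, 4 KB blocks, and a direct-mapped table):

    #include <stdio.h>
    #include <stdint.h>

    #define BLOCK_SIZE 4096                 /* bytes fetched per disk read */
    #define NUM_BLOCKS 1024                 /* 4 MB of cache in total      */

    typedef struct {
        int64_t block_no;                   /* cached file block, -1 = empty */
        uint8_t data[BLOCK_SIZE];
    } CacheEntry;

    static CacheEntry cache[NUM_BLOCKS];

    void cache_init(void)
    {
        for (int i = 0; i < NUM_BLOCKS; i++)
            cache[i].block_no = -1;
    }

    /* Return the TB byte at 'offset': one 8-10 ms disk read on a miss
     * then serves every later probe that falls inside the same block. */
    uint8_t tb_read_byte(FILE *f, int64_t offset)
    {
        int64_t block_no = offset / BLOCK_SIZE;
        CacheEntry *e = &cache[block_no % NUM_BLOCKS];  /* direct-mapped */
        if (e->block_no != block_no) {                  /* miss: fetch   */
            fseek(f, (long)(block_no * BLOCK_SIZE), SEEK_SET);
            fread(e->data, 1, BLOCK_SIZE, f);
            e->block_no = block_no;
        }
        return e->data[offset % BLOCK_SIZE];
    }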

Note that I am of course also storing in the transposition table whether a position is in the EGTB (I also store the eval of a position in the transposition table, which is helpful here with WDL tables).
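
For illustration, a hypothetical hash-entry layout with such fields (not Diep's actual one):

    #include <stdint.h>

    enum { EGTB_UNKNOWN, EGTB_WIN, EGTB_DRAW, EGTB_LOSS };

    typedef struct {
        uint64_t key;    /* Zobrist key of the position            */
        int16_t  score;  /* search score                           */
        int16_t  eval;   /* static eval, cached as described above */
        uint8_t  depth;
        uint8_t  bound;  /* exact / lower / upper                  */
        uint8_t  egtb;   /* WDL result; EGTB_UNKNOWN until probed  */
    } HashEntry;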

I suffer zero loss in NPS when probing a little in the EGTBs.

Only when actually mating is it of course a complete loss, but by then you already know the outcome of the game.

Vincent

Re: TB caching

PostPosted: 08 Dec 2005, 06:37
by Daniel Shawul
Vincent, I want to ask you something.
I think you store WDL in your TBs.
How do you ensure progress? Do you have separate TBs with DTM that you probe at the root?

I have noticed that if I probe the TBs only in qsearch (interior nodes left out), the problem is not severe. But with interior-node recognizers added, it seems that such a simple eval may not make progress. I also tried adding a bonus depending on the fifty-move counter, hoping that captures and pawn moves would lead to progress, but this did not work. For 5 pieces and above a simple eval may not work.
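
Roughly, the bonus I tried had this shape (a simplified sketch; WIN_SCORE_MIN is just an illustrative constant):

    /* Simplified sketch of the fifty-move bonus described above: shade a
     * known-won score by the fifty-move counter, so that lines with
     * captures and pawn pushes (which reset the counter) score better. */
    #define WIN_SCORE_MIN 20000     /* illustrative 'known TB win' bound */

    int apply_fifty_bonus(int score, int fifty_counter)
    {
        if (score >  WIN_SCORE_MIN) return score - fifty_counter;
        if (score < -WIN_SCORE_MIN) return score + fifty_counter;
        return score;
    }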
daniel

Re: TB caching

PostPosted: 08 Dec 2005, 09:56
by Alessandro Scotti
Daniel Shawul wrote:Today I measured the performance of bitbases without loading them into RAM.


Hi Daniel,
I see you are doing a lot of great work lately with bitbases! Do you have an estimate of how much Elo they're worth in your program?

Re: TB caching

PostPosted: 08 Dec 2005, 10:53
by Daniel Shawul
Hi Alessandro,
I haven't measured anything yet. I am doing it mostly for fun; I have sort of run out of things to do in other areas ;)
Hopefully I will get better Elo from 4-piece bitbases than from using TBs. For 5 pieces it won't be any better, as these are accessed from disk. In general, expecting 30 Elo is fair IMO.
daniel

Re: TB caching

PostPosted: 08 Dec 2005, 12:28
by H.G.Muller
5 pieces after symmetry leads to only 178M positions. In a true bitbase (only 1 bit per position, to indicate whether that position is won) this would occupy only 22 MB of storage, so why read it from disk?
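
To make that concrete, a hypothetical 1-bit lookup (the index is whatever your position encoding produces):

    #include <stdint.h>

    /* One bit per position: 178M positions pack into about 22 MB. */
    static uint8_t bitbase[178000000 / 8 + 1];

    /* 'index' comes from your position-encoding function. */
    int is_won(uint64_t index)
    {
        return (bitbase[index >> 3] >> (index & 7)) & 1;
    }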

The bitbase is good for determining if you make the proper conversion; once you are actually in the endgame, the only way to ensure progress is to have the DTM available. (Unless your engine is so clever that it could do just as well without the TB.)

But even with the DTM encoded as a single byte (which allows you to go up to 255) a 5-piece table would occupy only 178 MB. You could condense it further by 'coarse-graining' the DTM: if you store it in multiples of 4, a 4-bit DTM count would allow you to go up to mate in 64 (and reduce the 5-piece TB to 89 MB). The engine would then have to think at most 8 ply ahead to find the route to the next-shorter tabulated DTM.

Another way to condense the DTM scale is to determine the DTM of the current position, and give all positions that are further away than a certain number of moves the same code (and reread the TB from disk once you reach that DTM). For instance, with 2 bits per position, if the current DTM determined from the disk is 15, you could encode DTM 12-15 as b11, DTM 8-11 as b10, DTM 0-7 as b01, and all draws or DTM > 15 as b00. With an 8-ply search the engine can find a way to a b10 position, and from there to a b01 position, so that 8 moves later you will need to redefine the encoding as DTM 4-7 = b11, DTM 0-3 = b10, etc. So just one rereading of the TB every 8 moves; that should be doable...
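
A hypothetical encoding of that last 2-bit scheme, just to pin the idea down:

    /* Hypothetical 2-bit coarse-grained DTM code, centered on the DTM
     * of the current position (root_dtm, read once from disk).
     * dtm < 0 is used here as the 'draw' sentinel. */
    int encode_dtm(int dtm, int root_dtm)        /* e.g. root_dtm = 15 */
    {
        if (dtm < 0 || dtm > root_dtm) return 0; /* b00: draw or DTM > 15 */
        if (dtm > root_dtm - 4)        return 3; /* b11: DTM 12-15        */
        if (dtm > root_dtm - 8)        return 2; /* b10: DTM 8-11         */
        return 1;                                /* b01: DTM 0-7          */
    }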

Re: TB caching

PostPosted: 08 Dec 2005, 14:46
by diepeveen
Daniel Shawul wrote:Vincent, I want to ask you something.
I think you store WDL in your TBs.
How do you ensure progress? Do you have separate TBs with DTM that you probe at the root?
I have noticed that if I probe the TBs only in qsearch (interior nodes left out), the problem is not severe. But with interior-node recognizers added, it seems that such a simple eval may not make progress. I also tried adding a bonus depending on the fifty-move counter, hoping that captures and pawn moves would lead to progress, but this did not work. For 5 pieces and above a simple eval may not work.
daniel


Well, normally Diep would not probe at search depths of less than 3 ply depth-left.

I guess that explains most of it.

See the 'probe depth' settings in commercial programs; some do not probe in the last 7 ply or so.
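
As a minimal sketch, such a gate could look like this (hypothetical names, not Diep's code):

    /* Hypothetical probe gate: probe EGTBs only when enough depth is
     * left for the result to be worth a possible disk access. */
    #define PROBE_DEPTH 3

    int should_probe_egtb(int depth_left, int piece_count)
    {
        return depth_left >= PROBE_DEPTH && piece_count <= 5;
    }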

Progress is a matter of using your evaluation function to make progress.

There is the theoretical possibility, as in KNNKP, that it could fail.

I have never seen Diep get KNNKP on the board in *any* game over its entire lifetime.

Endgames like KRPKR, which are really hard without EGTBs, are easy with them.

The goal is to push the pawn forward, so you make progress like you normally do: just push a passed pawn.

In endgames like KBNK, Diep doesn't need any EGTB to mate at all. It already mates perfectly there without EGTBs.

Nowadays we get 40-ply search depths in those endgames.

Even when a maximin position (one with the longest mate) gets played out at perfect + 1, it still mates in 34 moves instead of 33.

Yet the reality is that in games you never start from maximin positions.

What matters is that with the 6-men, others don't have them in the search; with WDL it's very easy to have them.

More importantly, I doubt you will ever see any 7-men in DTM.

*That's* what matters: you can use EGTBs that others can only dream of ever using.

Now with the 6-men you can argue Nalimov was the first to generate them all, yet where did he release KPPKPP?

Do you know anyone other than Nalimov who has KPPKPP?
Or KRRRKP? KNPKPP? KBPKPP? KRPKPP?

A very important EGTB, that KPPKPP.

I'm busy generating that one now in WDL. In tournaments it's better not to get it on the board against Diep.

OK, the odds that this will happen are extremely small, I admit. But what if?

Re: TB caching

PostPosted: 08 Dec 2005, 21:29
by Sven Schüle
H.G.Muller wrote:5 pieces after symmetry leads to only 178M positions. In a true bitbase (only 1 bit per position, to indicate whether that position is won) this would occupy only 22 MB of storage, so why read it from disk?

Hi, "H.G.",
are you sure about your numbers?

For an arbitrary 5-men position without pawns I get these numbers:
2 (w/b to move) x 462 (valid K-K combinations after symmetry) x 64 x 64 x 64 = 231M positions = 29MB memory.

For an arbitrary 5-men position with pawns I get:
2 (w/b to move) x 24 (positions of first pawn after symmetry) x 64 x 64 x 64 x 64 = 768M = 96MB memory.
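
For reference, one hypothetical way the pawnless index above could be composed (kk being the 0..461 king-pair code):

    #include <stdint.h>

    /* Hypothetical composition of the pawnless index above: stm in 0..1,
     * kk in 0..461 (the king pair after symmetry), sq1..sq3 in 0..63.
     * The total count is 2 x 462 x 64^3 = 242,221,056 = 231 x 2^20. */
    uint32_t index_5men(int stm, int kk, int sq1, int sq2, int sq3)
    {
        return ((((uint32_t)stm * 462 + kk) * 64 + sq1) * 64 + sq2) * 64 + sq3;
    }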

How often per game do you think you can generate such large tables dynamically in memory?

Sven

Re: TB caching

PostPosted: 08 Dec 2005, 21:58
by Alvaro Begue
Well, you wouldn't read them from disk during the search. I think that's what he meant. And using around 120 MB of memory for this is a reasonable option, IMO.

What I don't quite get is how you can store a WDL score in a single bit. This might be possible for many endgames, but surely KPPKP has positions of all three types.

Re: TB caching

PostPosted: 09 Dec 2005, 08:39
by H.G.Muller
I usually store only the situations for one side to move, hence my smaller numbers. (This could already be considered compression by coarse-graining.) On the other hand, I am a bit more generous in encoding positions, using radix 65 (because each piece might be captured, and I want the TB to also contain all TBs for the subsets of the pieces to which we can convert). For the symmetrized king I counted 10 locations, and I did not exploit the mutually exclusive king zones.

Anyway, it does not really matter in practice if the memory used to store the TB is 10% bigger or smaller. What I had in mind was indeed not generating the TB from scratch, but reading it from disk and packing it into memory. This is what you have to do anyway if you want to use precomputed TBs. If you anticipate that only a small part of the TB has to be accessed, you could reserve the necessary DRAM space and fill it with some kind of 'demand-paging' scheme.
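
One hypothetical shape for such a demand-paging scheme (buffers assumed to be allocated and zeroed at startup):

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_BITS 15                    /* 32 KB pages */
    #define PAGE_SIZE ((int64_t)1 << PAGE_BITS)

    static uint8_t *tb_mem;      /* buffer for the whole TB, reserved up front */
    static uint8_t *page_loaded; /* one flag per page, zeroed at startup       */
    static FILE    *tb_file;     /* the TB on disk                             */

    /* A page is read from disk the first time any position inside it
     * is touched, and stays resident from then on. */
    uint8_t tb_byte(int64_t offset)
    {
        int64_t page = offset >> PAGE_BITS;
        if (!page_loaded[page]) {
            fseek(tb_file, (long)(page * PAGE_SIZE), SEEK_SET);
            fread(tb_mem + page * PAGE_SIZE, 1, PAGE_SIZE, tb_file);
            page_loaded[page] = 1;
        }
        return tb_mem[offset];
    }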

The things I describe are of course only necessary if the full TB does not fit in memory; for a 5-piece TB the 178 MB (or even 357 MB) required by a 1-byte DTM simply fits, and you would not bother saving memory just so that it can stay unused... So I guess the main point I want to make is this: when the TB does not fit in DRAM in any uncompressed format, and the accesses to it are so scattered that the frequently accessed disk sectors would flush each other out of the disk cache, forcing repeated reads during the search, it is better to salvage some information from the sectors you would otherwise have to discard, by compressing them.

If both won and lost positions are important, you indeed can't do it with a single bit, alas.