Endgame bitbases

Programming Topics (Computer Chess) and technical aspects such as test techniques, book building, program tuning, etc.

Moderator: Andres Valverde

Re: Endgame bitbases

Postby Anonymous » 05 Jul 2005, 19:03

diepeveen wrote:Who is going to rewrite Kadatch code to 64 bits?


Hi Vincent. I don't see a big problem here. The "real" Kadatch code is Huffman encoding/decoding on small blocks. No need to change anything here when you go to 64 bits. There are small parts that do file I/O that might need adjustments, depending on details of the compiler and library implementation in the 64-bit environment. If the environment uses 64-bit longs, only little adjustment should be needed. If not, one might need some fseek64 (or equivalent) instead of fseek. Does not look too complicated to me.

The calling code must of course now handle tables of 64-bit offsets instead of 32-bit offsets. But that is not really inside the Kadatch code, is it? Also, it does not look too complicated to me to change this. Actually it could make all of the Nalimov/Kadatch code significantly easier: no need anymore to split TBs into parts, and no need for all the code that has to handle that.
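For illustration, a minimal sketch of what I mean (the names are made up, not taken from the Kadatch/Nalimov sources): a per-file table of 64-bit block offsets, plus a portable seek that works beyond 4 GB.

Code: Select all
#include <stdio.h>
#include <sys/types.h>   /* off_t on POSIX systems */
#include <stdint.h>

/* Made-up sketch: table of 64-bit offsets of the compressed blocks. */
typedef struct {
    uint64_t *block_offset;   /* file offset of each compressed block */
    uint32_t  num_blocks;
} tb_index64;

/* Seek to a 64-bit offset, portably. */
static int tb_seek64(FILE *f, uint64_t offset)
{
#if defined(_MSC_VER)
    return _fseeki64(f, (long long)offset, SEEK_SET);
#else
    /* off_t is 64 bits on LP64 systems, or with -D_FILE_OFFSET_BITS=64 */
    return fseeko(f, (off_t)offset, SEEK_SET);
#endif
}

With 64-bit offsets in the index, the individual files are no longer limited by 32-bit offsets, which is exactly why the splitting of the larger TBs could go away.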

Regards,
Dieter
Anonymous
 

Re: Endgame bitbases

Postby diepeveen » 06 Jul 2005, 00:42

Dieter Bürßner wrote:
diepeveen wrote:Who is going to rewrite Kadatch code to 64 bits?


Hi Vincent. I don't see a big problem here. The "real" Kadatch code is Huffman encoding/decoding on small blocks. No need to change anything here when you go to 64 bits. There are small parts that do file I/O that might need adjustments, depending on details of the compiler and library implementation in the 64-bit environment. If the environment uses 64-bit longs, only little adjustment should be needed. If not, one might need some fseek64 (or equivalent) instead of fseek. Does not look too complicated to me.

The calling code must of course now handle tables of 64-bit offsets instead of 32-bit offsets. But that is not really inside the Kadatch code, is it? Also, it does not look too complicated to me to change this. Actually it could make all of the Nalimov/Kadatch code significantly easier: no need anymore to split TBs into parts, and no need for all the code that has to handle that.

Regards,
Dieter


I didn't ask how difficult it is. I just asked who is going to make the EFFORT to rewrite it?

Also note you need to create 2 files, not 1: one file with the Kadatch indexation code with 64-bit lookup tables, and one file like the current one with the compression code.

Of course you don't want to waste 1 GB of RAM on indexation tables when loading all the 6-men, nor do you want to lose 3 minutes of startup time just to the Kadatch code.

"WHO is going to make that effort?" - that's my question. Not whether it is *possible*, *hard*, or *difficult*.

If you ask me, over the past few years the only 3 real big problems of Nalimov have been:
1. the Kadatch code, which has been used more or less unmodified in his code
2. using DTM instead of WDL
3. using C++ instead of C


Vincent
diepeveen
 
Posts: 116
Joined: 28 Jun 2005, 01:09
Location: Netherlands

Re: Endgame bitbases

Postby Dann Corbit » 06 Jul 2005, 02:42

diepeveen wrote:
Dieter Bürßner wrote:
diepeveen wrote:Who is going to rewrite Kadatch code to 64 bits?


Hi Vincent. I don't see a big problem here. The "real" Kadatch code is Huffman encoding/decoding on small blocks. No need to change anything here when you go to 64 bits. There are small parts that do file I/O that might need adjustments, depending on details of the compiler and library implementation in the 64-bit environment. If the environment uses 64-bit longs, only little adjustment should be needed. If not, one might need some fseek64 (or equivalent) instead of fseek. Does not look too complicated to me.

The calling code must of course now handle tables of 64-bit offsets instead of 32-bit offsets. But that is not really inside the Kadatch code, is it? Also, it does not look too complicated to me to change this. Actually it could make all of the Nalimov/Kadatch code significantly easier: no need anymore to split TBs into parts, and no need for all the code that has to handle that.

Regards,
Dieter


I didn't ask how difficult it is. I just asked who is going to make the EFFORT to rewrite it?

Also note you need to create 2 files, not 1: one file with the Kadatch indexation code with 64-bit lookup tables, and one file like the current one with the compression code.

Of course you don't want to waste 1 GB of RAM on indexation tables when loading all the 6-men, nor do you want to lose 3 minutes of startup time just to the Kadatch code.

"WHO is going to make that effort?" - that's my question. Not whether it is *possible*, *hard*, or *difficult*.


Compression stuff is easy. The EGTB compression could easily be replaced by 7-zip libraries or bzip2 or whatever else someone wanted to use.
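For what it's worth, a minimal sketch of the kind of drop-in block codec I mean, using the bzip2 in-memory buffer API (error handling trimmed; whether the ratio on small blocks stays competitive is of course the open question):

Code: Select all
#include <bzlib.h>

/* Sketch: compress / decompress one raw tablebase block in memory.
   Both return the output length, or 0 on error. */
static unsigned int block_compress(char *raw, unsigned int raw_len,
                                   char *out, unsigned int out_cap)
{
    unsigned int out_len = out_cap;
    if (BZ2_bzBuffToBuffCompress(out, &out_len, raw, raw_len,
                                 9 /* blockSize100k */, 0, 0) != BZ_OK)
        return 0;
    return out_len;
}

static unsigned int block_decompress(char *comp, unsigned int comp_len,
                                     char *out, unsigned int out_cap)
{
    unsigned int out_len = out_cap;
    if (BZ2_bzBuffToBuffDecompress(out, &out_len, comp, comp_len,
                                   0 /* small */, 0) != BZ_OK)
        return 0;
    return out_len;
}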


If you ask me, over the past few years the only 3 real big problems of Nalimov have been:
1. the Kadatch code, which has been used more or less unmodified in his code
2. using DTM instead of WDL


That is what bitbases are for.

3. using C++ instead of C

Beowulf has a pure C translation in the code base (I forget who did it...) that can be linked using a C only compiler.


Vincent
Dann Corbit
 

Re: Endgame bitbases

Postby diepeveen » 06 Jul 2005, 02:53

Dann Corbit wrote:
diepeveen wrote:
Dieter Bürßner wrote:
diepeveen wrote:Who is going to rewrite Kadatch code to 64 bits?


Hi Vincent. I don't see a big problem here. The "real" Kadatch code is Huffman encoding/decoding on small blocks. No need to change anything here when you go to 64 bits. There are small parts that do file I/O that might need adjustments, depending on details of the compiler and library implementation in the 64-bit environment. If the environment uses 64-bit longs, only little adjustment should be needed. If not, one might need some fseek64 (or equivalent) instead of fseek. Does not look too complicated to me.

The calling code must of course now handle tables of 64-bit offsets instead of 32-bit offsets. But that is not really inside the Kadatch code, is it? Also, it does not look too complicated to me to change this. Actually it could make all of the Nalimov/Kadatch code significantly easier: no need anymore to split TBs into parts, and no need for all the code that has to handle that.

Regards,
Dieter


I didn't ask how difficult it is. I just asked who is going to make the EFFORT to rewrite it?

Also note you need to create 2 files, not 1: one file with the Kadatch indexation code with 64-bit lookup tables, and one file like the current one with the compression code.

Of course you don't want to waste 1 GB of RAM on indexation tables when loading all the 6-men, nor do you want to lose 3 minutes of startup time just to the Kadatch code.

"WHO is going to make that effort?" - that's my question. Not whether it is *possible*, *hard*, or *difficult*.


Compression stuff is easy. The EGTB compression could easily be replaced by 7-zip libraries or bzip2 or whatever else someone wanted to use.


If you ask me, over the past few years the only 3 real big problems of Nalimov have been:
1. the Kadatch code, which has been used more or less unmodified in his code
2. using DTM instead of WDL


That is what bitbases are for.

3. using C++ instead of C

Beowulf has a pure C translation in the code base (I forget who did it...) that can be linked using a C only compiler.


Vincent


It's easy to just say "compression is easy".

Please write me a good compressor program. You can earn a LOT of money with it.

Note it's just as hard as writing a world-class chess engine.

Also, you obviously do not know a thing about compression: 7-zip uses pretty advanced stuff, such as dictionary-based compression, that is not going to work for EGTB compression.

That means you basically need to decode the entire EGTB in order to just get 1 byte.

Real good compression is REAL hard to write.

In fact, despite searching for quite a while, I didn't find a compressor that may spend days compressing a single EGTB, as long as it decompresses a tiny fragment really quickly.

Kadatch is really superior to what you can come up with in your own laboratory. What he has been doing is *not* easy.

Note Kadatch also kicks the hell out of the default zip and bzip2 that Linux ships by default.

Note that 7-zip isn't ported to Linux yet. So if Nalimov used 7-zip, not only would everyone cry because you need to un-7-zip the entire EGTB, the Linux freaks would also badly need a port of it to Linux (work on that is being done, though).

P.S.: (de)compressing a single EGTB with bzip2 is going to eat half a day, whereas Kadatch does it within a few minutes.

Vincent
diepeveen
 
Posts: 116
Joined: 28 Jun 2005, 01:09
Location: Netherlands

Re: Endgame bitbases

Postby Dann Corbit » 06 Jul 2005, 03:05

diepeveen wrote:
Dann Corbit wrote:
diepeveen wrote:
Dieter Bürßner wrote:
diepeveen wrote:Who is going to rewrite Kadatch code to 64 bits?


Hi Vincent. I don't see a big problem here. The "real" Kadatch code is Huffman encoding/decoding on small blocks. No need to change anything here when you go to 64 bits. There are small parts that do file I/O that might need adjustments, depending on details of the compiler and library implementation in the 64-bit environment. If the environment uses 64-bit longs, only little adjustment should be needed. If not, one might need some fseek64 (or equivalent) instead of fseek. Does not look too complicated to me.

The calling code must of course now handle tables of 64-bit offsets instead of 32-bit offsets. But that is not really inside the Kadatch code, is it? Also, it does not look too complicated to me to change this. Actually it could make all of the Nalimov/Kadatch code significantly easier: no need anymore to split TBs into parts, and no need for all the code that has to handle that.

Regards,
Dieter


I didn't ask how difficult it is. I just asked who is going to make the EFFORT to rewrite it?

Also note you need to create 2 files, not 1: one file with the Kadatch indexation code with 64-bit lookup tables, and one file like the current one with the compression code.

Of course you don't want to waste 1 GB of RAM on indexation tables when loading all the 6-men, nor do you want to lose 3 minutes of startup time just to the Kadatch code.

"WHO is going to make that effort?" - that's my question. Not whether it is *possible*, *hard*, or *difficult*.


Compression stuff is easy. The EGTB compression could easily be replaced by 7-zip libraries or bzip2 or whatever else someone wanted to use.


If you ask me, over the past few years the only 3 real big problems of Nalimov have been:
1. the Kadatch code, which has been used more or less unmodified in his code
2. using DTM instead of WDL


That is what bitbases are for.

3. using C++ instead of C

Beowulf has a pure C translation in the code base (I forget who did it...) that can be linked using a C only compiler.


Vincent


It's easy to just say "compression is easy".

Please write me a good compression algorithm. You can earn a LOT of money with it.

Note it's just as hard as writing a world-class chess engine.

Also, you obviously do not know a thing about compression: 7-zip uses pretty advanced stuff, such as dictionary-based compression, that is not going to work for EGTB compression.

That means you basically need to decode the entire EGTB in order to just get 1 byte.

Just write it in any page size you like. You can have chunks of any multiple of the size of a record before compression.
Real good compression is REAL hard to write.

No argument there.
In fact, despite searching for quite a while, I didn't find a compressor that may spend days compressing a single EGTB, as long as it decompresses a tiny fragment really quickly.
Arithmetic compression is probably optimal then.
Why not compress the entire file using arithmetic compression of order 5, in chunks of 1024 records or so?
Then memory-map the entire compressed files.
Then decompress the chunks on the fly as you need them.
No compression scheme will get more than a few percent better than arithmetic compression. It takes a very long time to squeeze, but decompression is fairly fast. The idea of memory mapping is that if you have a few terabytes of memory, then you can access the data at memory speed.
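A rough sketch of what I mean (POSIX mmap; the chunk lookup and the codec behind it are hypothetical placeholders, not an existing API):

Code: Select all
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stddef.h>

/* Map the whole compressed file read-only; the OS pages it in on demand. */
static const unsigned char *map_whole_file(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);
    if (fd < 0 || fstat(fd, &st) < 0) {
        if (fd >= 0) close(fd);
        return NULL;
    }
    *len = (size_t)st.st_size;
    p = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping survives the close */
    return (p == MAP_FAILED) ? NULL : (const unsigned char *)p;
}

/* On probe: look up the chunk that holds the wanted record in the offset
   table, run the (hypothetical) decoder on just that chunk, read the value. */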

Kadatch is really superior to what you can come up with in your own laboratory. What he has been doing is *not* easy.

Note Kadatch also kicks the hell out of the default zip and bzip2 that Linux ships by default.

I am curious to know what percentage smaller 1 GB of raw file becomes when compressed.
Note that 7-zip isn't ported to Linux yet. So if Nalimov used 7-zip, not only would everyone cry because you need to un-7-zip the entire EGTB, the Linux freaks would also badly need a port of it to Linux (work on that is being done, though).


7-Zip is open source -- mostly cobbled together from many other open sources -- and almost all of the algorithms would port easily to Unix platforms in a few days. Certainly, the most important ones [perhaps top 10] (best compression is wanted here) could be ported for experiments without too much trouble.
Dann Corbit
 

Re: Endgame bitbases

Postby diepeveen » 06 Jul 2005, 03:46

Dann Corbit wrote:
diepeveen wrote:
Dann Corbit wrote:
diepeveen wrote:
Dieter Bürßner wrote:
diepeveen wrote:Who is going to rewrite Kadatch code to 64 bits?


Hi Vincent. I don't see a big problem here. The "real" Kadatch code is Huffman encoding/decoding on small blocks. No need to change anything here when you go to 64 bits. There are small parts that do file I/O that might need adjustments, depending on details of the compiler and library implementation in the 64-bit environment. If the environment uses 64-bit longs, only little adjustment should be needed. If not, one might need some fseek64 (or equivalent) instead of fseek. Does not look too complicated to me.

The calling code must of course now handle tables of 64-bit offsets instead of 32-bit offsets. But that is not really inside the Kadatch code, is it? Also, it does not look too complicated to me to change this. Actually it could make all of the Nalimov/Kadatch code significantly easier: no need anymore to split TBs into parts, and no need for all the code that has to handle that.

Regards,
Dieter


I didn't ask how difficult it is. I just asked who is going to make the EFFORT to rewrite it?

Also note you need to create 2 files, not 1: one file with the Kadatch indexation code with 64-bit lookup tables, and one file like the current one with the compression code.

Of course you don't want to waste 1 GB of RAM on indexation tables when loading all the 6-men, nor do you want to lose 3 minutes of startup time just to the Kadatch code.

"WHO is going to make that effort?" - that's my question. Not whether it is *possible*, *hard*, or *difficult*.


Compression stuff is easy. The EGTB compression could easily be replaced by 7-zip libraries or bzip2 or whatever else someone wanted to use.


If you ask me, over the past few years the only 3 real big problems of Nalimov have been:
1. the Kadatch code, which has been used more or less unmodified in his code
2. using DTM instead of WDL


That is what bitbases are for.

3. using C++ instead of C

Beowulf has a pure C translation in the code base (I forget who did it...) that can be linked using a C only compiler.


Vincent


It's easy to just say "compression is easy".

Please write me a good compression algorithm. You can earn a LOT of money with it.

Note it's just as hard as writing a world-class chess engine.

Also, you obviously do not know a thing about compression: 7-zip uses pretty advanced stuff, such as dictionary-based compression, that is not going to work for EGTB compression.

That means you basically need to decode the entire EGTB in order to just get 1 byte.

Just write it in any page size you like. You can have chunks of any multiple of the size of a record before compression.
Real good compression is REAL hard to write.

No argument there.
In fact, despite searching for quite a while, I didn't find a compressor that may spend days compressing a single EGTB, as long as it decompresses a tiny fragment really quickly.
Arithmetic compression is probably optimal then.
Why not compress the entire file using arithmetic compression of order 5, in chunks of 1024 records or so?
Then memory-map the entire compressed files.
Then decompress the chunks on the fly as you need them.
No compression scheme will get more than a few percent better than arithmetic compression. It takes a very long time to squeeze, but decompression is fairly fast. The idea of memory mapping is that if you have a few terabytes of memory, then you can access the data at memory speed.

Kadatch is really superior to what you can come up with in your own laboratory. What he has been doing is *not* easy.

Note Kadatch also kicks the hell out of the default zip and bzip2 that Linux ships by default.

I am curious to know what percentage smaller 1 GB of raw file becomes when compressed.
Note that 7-zip isn't ported to Linux yet. So if Nalimov used 7-zip, not only would everyone cry because you need to un-7-zip the entire EGTB, the Linux freaks would also badly need a port of it to Linux (work on that is being done, though).


7-Zip is open source -- mostly cobbled together from many other open sources -- and almost all of the algorithms would port easily to Unix platforms in a few days. Certainly, the most important ones [perhaps top 10] (best compression is wanted here) could be ported for experiments without too much trouble.


Again Dann,

Who stops you from serving the world for once in a useful way?
diepeveen
 
Posts: 116
Joined: 28 Jun 2005, 01:09
Location: Netherlands

Re: Endgame bitbases

Postby Dann Corbit » 06 Jul 2005, 03:48

Again Dann,

Who stops you from serving the world for once in a useful way?


Mostly myself, I imagine.
Dann Corbit
 

7-Zip has been ported to Linux

Postby Dann Corbit » 06 Jul 2005, 04:05

Dann Corbit
 

A compression experiment

Postby Dann Corbit » 06 Jul 2005, 04:53

Directory of E:\programme\winboard\Nalimov\save

The decompressed tablebase file:
07/05/2005 08:01 PM 286,777,440 krpkq.nbw

The same file compressed with Andrew's compressor:
06/05/2003 12:00 AM 129,050,046 krpkq.nbw.emd
45% of original size

The same file compressed with 7-zip:
07/05/2005 08:30 PM 95,326,434 krpkq.7z
33% of original size

The same file compressed with a simple arithmetic compressor:
07/05/2005 08:41 PM 156,127,948 krpkq.nbw.ari
54% of original size
Dann Corbit
 

Re: Endgame bitbases

Postby Dann Corbit » 06 Jul 2005, 18:13

Additional experiments:

Decompression wall time for entire file from compressed to decompressed:
Naive Arithmetic routine: 2:46
7-Zip: 2:08
Andrew K: 1:49

Because the content of the tablebase files does not change, it should be possible to preprocess them and then use fairly optimal static Huffman routines.
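A sketch of the preprocessing pass I have in mind (the first of two passes: one global byte histogram over the finished tablebase, from which a single static Huffman table would be built once and shared by every block; the standard table construction itself is omitted here):

Code: Select all
#include <stdio.h>
#include <stdint.h>

/* Pass 1: count byte frequencies over an entire uncompressed tablebase.
   A static (canonical) Huffman code built from these counts can then be
   reused for every block, so no per-block code tables need to be stored. */
static int count_byte_frequencies(const char *path, uint64_t freq[256])
{
    unsigned char buf[1 << 16];
    size_t n, i;
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    for (i = 0; i < 256; i++)
        freq[i] = 0;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        for (i = 0; i < n; i++)
            freq[buf[i]]++;
    fclose(f);
    return 0;
}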
Dann Corbit
 

Re: Endgame bitbases

Postby Dann Corbit » 06 Jul 2005, 19:09

This thing:
http://www.oberhumer.com/opensource/lzo ... .01.tar.gz
gets 59% compression and decompresses at 206 MB/sec.

Nothing else is even in the neighborhood.
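For reference, the library calls involved are tiny. A sketch with miniLZO, the single-file distribution of the same library (LZO1X-1, error handling trimmed; treat the details as an approximation rather than tested code):

Code: Select all
#include <stdlib.h>
#include "minilzo.h"

/* Sketch: LZO1X-1 round trip on one block.  The 'comp' buffer must be at
   least src_len + src_len/16 + 64 + 3 bytes (documented worst case). */
static int lzo_block_roundtrip(unsigned char *src, lzo_uint src_len,
                               unsigned char *comp, lzo_uint *comp_len,
                               unsigned char *back, lzo_uint *back_len)
{
    int rc;
    void *wrkmem = malloc(LZO1X_1_MEM_COMPRESS);
    if (lzo_init() != LZO_E_OK || wrkmem == NULL) {
        free(wrkmem);
        return -1;
    }
    rc = lzo1x_1_compress(src, src_len, comp, comp_len, wrkmem);
    if (rc == LZO_E_OK)
        rc = lzo1x_decompress(comp, *comp_len, back, back_len, NULL);
    free(wrkmem);
    return (rc == LZO_E_OK) ? 0 : -1;
}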
Dann Corbit
 

Re: Endgame bitbases

Postby Gian-Carlo Pascutto » 08 Jul 2005, 07:06

1) LZO is GPL, so it is unusable for 99% of the programs.

2) The Kadatch compressors can work with limited context. All the general-purpose compressors you quoted can only do so with a severe penalty in compression ratio.
Gian-Carlo Pascutto
 
Posts: 42
Joined: 04 Jul 2005, 13:24
Location: Belgium

Re: Endgame bitbases

Postby Robert Allgeuer » 09 Jul 2005, 10:34

Gian-Carlo Pascutto wrote:2) The Kadatch compressors can work with limited context. All the general-purpose compressors you quoted can only do so with a severe penalty in compression ratio.


What does "limited context" mean?

Thanks
Robert
Robert Allgeuer
 
Posts: 124
Joined: 28 Sep 2004, 19:09
Location: Konz / Germany

Re: Endgame bitbases

Postby Gian-Carlo Pascutto » 09 Jul 2005, 17:39

That you are able to decompress small parts of the file in isolation.
Gian-Carlo Pascutto
 
Posts: 42
Joined: 04 Jul 2005, 13:24
Location: Belgium

Re: Endgame bitbases

Postby Dann Corbit » 11 Jul 2005, 18:21

Kadatch just compresses/decompresses in 64K blocks. I am guessing that the other programs will not pay a big penalty for doing the same thing.

I think that space consumed will become a lot less important over time.
Here (for instance) is a collection of drive systems over 1TB:
http://www.lacie.com/products/family.htm?id=10007

Now, I do agree that what Eugene has done and what Andrew Kadatch has done are both incredibly clever solutions to the tablebase problem. But I think alternative solutions could be equally good if the same amount of effort went into them as went into the existing system.

It would be nice if a Berkeley-style licensed EGTB system arose. Eugene seems to have a lot of difficulty handling all of the requests; I personally know of people who have waited for months to get permission to use his tablebase file system.

The Ernst Heinz papers describe a development parallel to what Eugene did, with about the same compression.
Dann Corbit
 

Re: Endgame bitbases

Postby Anonymous » 11 Jul 2005, 19:14

Dann Corbit wrote:Kadatch just compresses/decompresses in 64K blocks. I am guessing that the other programs will not pay a big penalty for doing the same thing.

I think that space consumed will become a lot less important over time.
Here (for instance) is a collection of drive systems over 1TB:
http://www.lacie.com/products/family.htm?id=10007

Now, I do agree that what Eugene has done and what Andrew Kadatch has done are both incredibly clever solutions to the tablebase problem. But I think alternative solutions could be equally good if the same amount of effort went into them as went into the existing system.

It would be nice if a Berkeley-style licensed EGTB system arose. Eugene seems to have a lot of difficulty handling all of the requests; I personally know of people who have waited for months to get permission to use his tablebase file system.

The Ernst Heinz papers describe a development parallel to what Eugene did, with about the same compression.


Dann, in which paper did Heinz describe compression?

IIRC, Kadatch/Nalimov compress 8 kB blocks by default (it is adjustable in the source code to other powers of 2). I would be interested to know how other algorithms do with 8 kB blocks of TB or bitbase data. My guess is that the Kadatch algorithm is extremely competitive here, both in speed and compression ratio.
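Conceptually the probe path with such fixed-size blocks is just the following (a sketch with made-up names; a real prober like the Kadatch/Nalimov one of course keeps a cache of recently decompressed blocks instead of decoding on every probe):

Code: Select all
#include <stdint.h>

#define TB_BLOCK_SIZE 8192   /* uncompressed bytes per block, as above */

/* Made-up helper: looks up the block's offset in the offset table,
   reads the compressed block from disk and decompresses it into 'out'. */
extern int fetch_block(uint32_t block_no, unsigned char out[TB_BLOCK_SIZE]);

static int probe_value(uint64_t position_index)
{
    unsigned char block[TB_BLOCK_SIZE];
    uint32_t block_no = (uint32_t)(position_index / TB_BLOCK_SIZE);
    uint32_t within   = (uint32_t)(position_index % TB_BLOCK_SIZE);

    if (fetch_block(block_no, block) != 0)
        return -1;              /* I/O or decoding error */
    return block[within];       /* assuming one byte per position */
}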

Nalimov TBs are very dense in the index space used, meaning they have relatively few broken positions. But this number will rise with more men on the board. Then it might be worthwhile to think about the idea I mentioned, of having an algorithm that takes advantage of the fact that it won't matter to what result those broken positions expand.
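The simplest form of that idea, assuming some predicate is_broken() over the index space (a made-up name): before compression, overwrite every broken entry with the previous real value. That lengthens runs, and it costs nothing on the probing side, because those entries are never looked up.

Code: Select all
#include <stddef.h>

/* Made-up predicate: true for broken (illegal/unreachable) indices. */
extern int is_broken(size_t index);

/* Broken positions may expand to anything, so give them whichever value
   helps the compressor - here simply the value of the previous position. */
static void smooth_broken(unsigned char *tb, size_t n)
{
    size_t i;
    unsigned char prev = 0;
    for (i = 0; i < n; i++) {
        if (is_broken(i))
            tb[i] = prev;
        else
            prev = tb[i];
    }
}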

For sheikhs, lottery winners, ... :

http://www.emc.com/news/photo_library/n ... 300dpi.jpg
and http://www.emc.com/products/systems/sym ... MX3000.pdf

Up to 149 TB, 128 GB cache. Should be enough for all currently available TBs and the ones that will be generated in the next couple of years - independent of compression ratio.

Regards,
Dieter
Anonymous
 

Re: Endgame bitbases

Postby mridul » 11 Jul 2005, 20:19

For sheikhs, lottery winners, ... :


Hehe, good one, totally unexpected joke :)

- Mridul
mridul
 
Posts: 48
Joined: 09 Dec 2004, 11:34
Location: Bangalore , India

Re: Endgame bitbases

Postby Dann Corbit » 11 Jul 2005, 23:50

This is a paper on tablebase compression that looks interesting:
http://www.daimi.au.dk/~doktoren/master_thesis/

As for the compression by Heinz -- I misremembered. What was in the paper was the compaction of the internal representation of a position, rather than how to squish the compact notation even smaller on disk.

The FEG format is 20% smaller than Nalimov. Perhaps someday Mr. de Koning will describe it.
Dann Corbit
 

Re: Endgame bitbases

Postby Alessandro Scotti » 15 Jul 2005, 13:55

In the current Kiwi, bitbases are loaded from disk, decompressed and decoded. This last step improves compression but is pretty slow and will make the startup time much longer. However, it's really useless to decode all bitbases at startup, and since the encoding used does not require context, I now think it is better to have this logic in the probe code.
Then if one really wants to waste some time at startup, compressing the bitbases with PAQ 6 will make them a lot smaller than zip compression. Some examples:

Code: Select all
kbnk_wtm.bb = 3472 bytes
kbnk_btm.bb = 10781 bytes
kppk_wtm.bb = 7301 bytes (9400 zipped)
kppk_btm.bb = 23789 bytes (40659 zipped)


All of these are with the default settings, with decent speed. For comparison, kppk_btm.bb compresses to only 20919 bytes using 7th-order modeling, which is almost half of the zip size (but it's very slow and requires a lot of RAM).
If one includes only bitbases for the stronger side, the entire 4-pieces set would probably take very little space even with zip.
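For completeness, once a bitbase is decoded into memory the probe itself is trivial; a sketch, assuming one bit per position and an already computed position index:

Code: Select all
#include <stdint.h>

/* Probe a decoded in-memory bitbase: one bit per position,
   least significant bit first within each byte. */
static int bb_probe(const uint8_t *bb, uint64_t index)
{
    return (bb[index >> 3] >> (index & 7)) & 1;
}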
Alessandro Scotti
 
Posts: 306
Joined: 20 Nov 2004, 00:10
Location: Rome, Italy

Re: Endgame bitbases

Postby Fabien Letouzey » 16 Jul 2005, 08:51

Hi Dann,
Dann Corbit wrote:This is a paper on tablebase compression that looks interesting:
http://www.daimi.au.dk/~doktoren/master_thesis/

Indeed, but could you send a compressed version to me? It must fit on a floppy disk ...

Thanks,

Fabien.
Fabien Letouzey
 
Posts: 110
Joined: 03 Dec 2004, 10:17
Location: France
