
Compressing bitbases

PostPosted: 06 Mar 2007, 16:51
by Peter Fendrich
I started today to think about making more bitbases for Alaric (I only have kpk currently). One issue, of course, is to make them as small as possible. So far I have thought of:
    Mirror the position so it is always seen from one side's point of view when asymmetric (kpk means that kkp is not necessary, but for kpkp that doesn't help much)
    Mirror the position using half of the board when possible
    Using zip technology (will probably not shrink it much)
    Using the magic number method. I don't know how yet!
I suppose that different material patterns, such as kpk, kbpk and krpkr, offer different possibilities here.
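
A minimal sketch of the mirroring idea, assuming a kpk bitbase indexed by (side to move, white king, black king, pawn square); all names here are made up for illustration:

#include <cassert>

int mirrorFile(int sq) { return sq ^ 7; }   // flip a-file <-> h-file (square 0 = a1)

// Normalize so the pawn is always on files a-d, then index the position.
// stm: 0 = white to move, 1 = black to move.
int kpkIndex(int wk, int bk, int wp, int stm)
{
    if ((wp & 7) > 3) {                     // pawn on files e-h: mirror everything
        wk = mirrorFile(wk);
        bk = mirrorFile(bk);
        wp = mirrorFile(wp);
    }
    int pawn = ((wp >> 3) - 1) * 4 + (wp & 7);  // ranks 2-7, files a-d -> 0..23
    assert(pawn >= 0 && pawn < 24);
    return ((stm * 64 + wk) * 64 + bk) * 24 + pawn;
}

That alone brings kpk down to 2 * 64 * 64 * 24 = 196,608 entries, i.e. 24 KB at one bit per position, before any general-purpose compression.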

What can be done to further compress? Any ideas? Are there any documents about it?

Edit: I forgot to mention: why not make an open source project out of this?
/Peter

Re: Compressing bitbases

PostPosted: 06 Mar 2007, 17:40
by Ron Murawski
Hi Peter,

The best place to go for compression info is datacompression.info

Main site (with links to papers):
http://datacompression.info/

Here's a link to compression source code:
http://datacompression.info/SourceCode.shtml

Here's a Wikipedia entry that has some overview info:
http://en.wikipedia.org/wiki/Data_compression

Ron

Re: Compressing bitbases

PostPosted: 06 Mar 2007, 19:49
by Peter Fendrich
Thank you Ron,
as I said in the first message, I don't think that general compression techniques will reduce size much for bitbases. I might be wrong though...

Re: Compressing bitbases

PostPosted: 07 Mar 2007, 00:39
by Dann Corbit
The compression methods with the greatest compression will:
1. Use TONS of RAM (e.g. 500 MB to compress/decompress)
2. Take ages to decompress.

I guess you will want an LZ-based routine, which will not get maximum compression (a few percent less than the others) but will decompress like the wind, and it won't break the bank for RAM.

A good example is the 7z stuff, which is also available in a library/sdk:
http://www.7-zip.org/download.html

With profile-guided optimization it decompresses at 20 MB/sec on a 2.2 GHz AMD.
That will be hard to beat.
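
To give a feel for the calling side, here is a minimal sketch using zlib as a stand-in for an LZ codec (the 7-Zip/LZMA SDK linked above ships its own C interface; the function below is invented for illustration):

#include <zlib.h>
#include <vector>
#include <stdexcept>

// Decompress one bitbase blob whose uncompressed size is known in advance.
std::vector<unsigned char> decompressBlob(const unsigned char* src,
                                          unsigned long srcLen,
                                          unsigned long rawLen)
{
    std::vector<unsigned char> out(rawLen);
    uLongf destLen = rawLen;
    if (uncompress(&out[0], &destLen, src, srcLen) != Z_OK || destLen != rawLen)
        throw std::runtime_error("bitbase decompression failed");
    return out;
}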

Re: Compressing bitbases

PostPosted: 07 Mar 2007, 00:44
by Dann Corbit
PAQ is the 'tightest squeezer':
http://cs.fit.edu/~mmahoney/compression/

but it has lots of drawbacks.

Re: Compressing bitbases

PostPosted: 07 Mar 2007, 00:58
by Dann Corbit
Here is a benchmark I ran previously:

I have found that after profile-guided optimization, the LZMA SDK will decompress at 20 MB/sec on a 2.2 GHz AMD. It looks like a good candidate for a compressor/decompressor pair because it gets a good compression ratio and it decompresses extremely fast.

LZMA 4.43 Copyright (c) 1999-2006 Igor Pavlov 2006-06-04

Compressing Decompressing

1409 KB/s 1659 MIPS 20139 KB/s 2030 MIPS
1405 KB/s 1654 MIPS 20024 KB/s 2018 MIPS
1412 KB/s 1663 MIPS 20637 KB/s 2080 MIPS
1414 KB/s 1666 MIPS 20106 KB/s 2026 MIPS
1407 KB/s 1657 MIPS 20396 KB/s 2056 MIPS
1405 KB/s 1655 MIPS 20134 KB/s 2029 MIPS
1414 KB/s 1665 MIPS 20201 KB/s 2036 MIPS
1409 KB/s 1660 MIPS 20543 KB/s 2071 MIPS
1413 KB/s 1664 MIPS 20096 KB/s 2025 MIPS
1414 KB/s 1665 MIPS 20084 KB/s 2024 MIPS
---------------------------------------------------
1410 KB/s 1661 MIPS 20234 KB/s 2039 MIPS Average

Re: Compressing bitbases

PostPosted: 07 Mar 2007, 01:01
by Peter Fendrich
OK, I probably expressed myself unclearly...

What I am after is a design for the bitbases themselves, and tricks to make them as small as possible. I want to be able to keep as much of them as possible in RAM during search.

I suppose that compression techniques such as 7z will give data that is very slow to access at run-time. Right?

/Peter

Re: Compressing bitbases

PostPosted: 07 Mar 2007, 01:53
by Dann Corbit
Here is a comparison using PAQ, which took 97.5 seconds to compress 1/2 MB and required 286 MB of RAM.

C:\paq>paq8jdsse paq8hp8.exe
Creating archive paq8hp8.exe.paq8jd with 1 file(s)...
paq8hp8.exe 581632 -> 145171
581632 -> 145204
Time 97.51 sec, used 286174868 bytes of memory

Decompression was just as painful:
C:\paq>paq8jdsse -d paq8hp8.exe.paq8jd
Extracting 1 file(s) from paq8hp8.exe.paq8jd -5
Comparing ./paq8hp8.exe 581632 -> identical
Time 97.24 sec, used 286174901 bytes of memory

Re: Compressing bitbases

PostPosted: 07 Mar 2007, 02:13
by Dann Corbit
One way or another, software compression algorithms will bring some benefit to your idea.

For instance, if we compress them, then read + decompress will be faster than reading the uncompressed files.

Also, if we can decompress at 20 MB/sec, then a 1 MB file will take 1/20th of a second to decompress. If we store the file in pages, then we can decompress just parts of it.

I think it would be especially valuable for 6- and 7-man bitbase files.
There is no way to hold a complete set of 7-man bitbase files in RAM and not be sorry you did it, but we can decompress individual files or segments of files.
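
A sketch of what that paging could look like; the file layout (a table of compressed-page offsets up front) and all names here are invented, and decompressBlob() is the zlib stand-in sketched earlier in the thread:

#include <cstdio>
#include <vector>

typedef unsigned int uint32;

// From the earlier sketch: inflate one compressed blob of known raw size.
std::vector<unsigned char> decompressBlob(const unsigned char* src,
                                          unsigned long srcLen,
                                          unsigned long rawLen);

const unsigned RAW_PAGE = 64 * 1024;          // 64 KB of raw bitbase per page

// offsets[] holds the file position of each compressed page plus one final
// end marker, so page i occupies the byte range [offsets[i], offsets[i+1]).
std::vector<unsigned char> readPage(std::FILE* f,
                                    const std::vector<uint32>& offsets,
                                    unsigned page)
{
    uint32 begin = offsets[page], end = offsets[page + 1];
    std::vector<unsigned char> packed(end - begin);
    std::fseek(f, long(begin), SEEK_SET);
    std::fread(&packed[0], 1, packed.size(), f);
    return decompressBlob(&packed[0], packed.size(), RAW_PAGE);
}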

Re: Compressing bitbases

PostPosted: 07 Mar 2007, 16:38
by Peter Fendrich
Thank you Dann for your very useful information. I am a complete illiterate in this area.
As you say, 6- and 7-men will have to stay outside RAM, at least in this decade...
Read + decompress sounds neat, especially if it can be divided into parts.

What about holding compressed data in memory and accessing it on the fly? Do you know if it is possible to decompress while accessing it?

Re: Compressing bitbases

PostPosted: 07 Mar 2007, 21:09
by Dann Corbit
Peter Fendrich wrote:Thank you Dann for your very useful information. I am a complete illiterate in this area.
As you say, 6- and 7-men will have to stay outside RAM, at least in this decade...
Read + decompress sounds neat, especially if it can be divided into parts.

What about holding compressed data in memory and accessing it on the fly? Do you know if it is possible to decompress while accessing it?


If the bitbases are small enough, I would recommend keeping the whole thing in memory without decompressing it. If you decompress in pages, it could certainly be done on the fly.

For this sort of approach, I would recommend an LRU cache that decompresses pages upon read. We might (for instance) set up a 256 MB bitbase cache. Since the pages are read-only we never have a dirty page to worry about, so it could be done with no disk flushing at all and would run really fast. Because of the reduced disk traffic, it might even be faster than just reading the information from disk, especially if we can't hold the whole set in RAM. The added complication may not be worth the bother for 5-man files, but when we get to 6- and 7-man files I think it will be a big benefit. Even if we could hold the whole thing in RAM, would we want to? The same memory could be used for transposition tables and other memory-hungry items.
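
A minimal sketch of such a cache (all names invented), with pages identified by a (file id, page number) pair; loadAndDecompressPage() stands for whatever paged loader is used, e.g. the readPage() sketch above:

#include <cstddef>
#include <list>
#include <map>
#include <utility>
#include <vector>

// Hypothetical paged loader, e.g. readPage() from the earlier sketch.
std::vector<unsigned char> loadAndDecompressPage(int file, unsigned page);

class BitbaseCache {
    typedef std::pair<int, unsigned> PageId;   // (file id, page number)
    typedef std::list<PageId> LruList;

    struct Entry {
        std::vector<unsigned char> data;       // decompressed page
        LruList::iterator lruPos;              // position in the LRU list
    };

    std::map<PageId, Entry> pages_;
    LruList lru_;                              // front = most recently used
    std::size_t capacity_;                     // maximum number of pages held

public:
    explicit BitbaseCache(std::size_t maxPages) : capacity_(maxPages) {}

    const std::vector<unsigned char>& get(int file, unsigned page)
    {
        PageId id(file, page);
        std::map<PageId, Entry>::iterator it = pages_.find(id);
        if (it != pages_.end()) {              // hit: move to front of LRU list
            lru_.splice(lru_.begin(), lru_, it->second.lruPos);
            return it->second.data;
        }
        if (pages_.size() >= capacity_) {      // full: evict least recently used;
            pages_.erase(lru_.back());         // pages are clean, nothing to flush
            lru_.pop_back();
        }
        lru_.push_front(id);
        Entry& e = pages_[id];                 // default-construct, then fill
        e.data = loadAndDecompressPage(file, page);
        e.lruPos = lru_.begin();
        return e.data;
    }
};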

Re: Compressing bitbases

PostPosted: 12 Mar 2007, 23:24
by Peter Fendrich
OK,
I will start producing a couple of bitbases later on, once I have fixed some severe drawbacks in Alaric, probably in about two weeks.
Then we can run some tests to get an understanding of how best to arrange bitbases in files and memory.

I intend to make it open source and anyone interested can participate.

/Peter

Re: Compressing bitbases

PostPosted: 14 Mar 2007, 21:15
by Fritz Grau
What about partial bitbases? Chess programs contain a fast search, so it would suffice to know the results for a sufficiently dense subset of the positions. For example, the subset could be chosen as the positions satisfying "HashKey AND 255 = 0", while we could use "(HashKey SHR 8) MOD BitBaseSize" as an indexing function.

Advantages:
- easy indexing
- sparse access to 6-men and 7-men bitbases does not cost much memory or disk space

Disadvantages:
- different engines use different hash keys
- this kind of partial bitbase will probably be incompressible
- no perfect information
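
A minimal sketch of that probe, assuming a 64-bit Zobrist-style hash key and a bitbase stored as a plain bit array (the result encoding and all names here are invented):

#include <vector>

typedef unsigned long long uint64;

enum ProbeResult { BB_MISS, BB_DRAW_OR_LOSS, BB_WIN };

// Only 1 position in 256 is stored; everything else falls through to search.
ProbeResult probeSparse(const std::vector<unsigned char>& bits, uint64 hashKey)
{
    if ((hashKey & 255) != 0)
        return BB_MISS;                                // not in the stored subset
    uint64 index = (hashKey >> 8) % (uint64(bits.size()) * 8);
    bool win = (bits[index >> 3] >> (index & 7)) & 1;
    return win ? BB_WIN : BB_DRAW_OR_LOSS;
}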

Re: Compressing bitbases

PostPosted: 14 Mar 2007, 21:51
by Peter Fendrich
I would prefer a practical approach. Not all bitbases are needed; engines know how to handle KQK, for instance.
Your suggestion, if I understand it right, is to more or less randomly exclude most of the positions. Well, it doesn't really appeal to me, but it should be rather straightforward to adjust the code to do that.
Why are they incompressible?
/Peter

Re: Compressing bitbases

PostPosted: 15 Mar 2007, 13:42
by Tony van Roon-Werten
Fritz Grau wrote:What about partial bitbases? Chess programs contain a fast search, so it would suffice to know the results for a sufficiently dense subset of the positions. For example, the subset could be chosen as the positions satisfying "HashKey AND 255 = 0", while we could use "(HashKey SHR 8) MOD BitBaseSize" as an indexing function.

Advantages:
- easy indexing
- sparse access to 6-men and 7-men bitbases does not cost much memory or disk space

Disadvantages:
- different engines use different hash keys
- this kind of partial bitbase will probably be incompressible
- no perfect information


Unfortunately it won't work.

Say the current position is not recognized (a chance of 255 out of 256) and I have 40 moves.

If this position is a win, then there is roughly a 40/256 chance (<16%) that I will get a hit and find a position where you lose (and we stop the search); in the remaining 84%+ we continue.

If this position is a loss, all my moves have to lead to a win for you. The chance of finding all of these in the bitbase is (1/256)^40 (i.e. vanishingly small).

In other words, the search will continue as if you didn't have bitbases.
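
Spelling out those figures (a quick check, assuming each position is stored independently with probability 1/256):

\[
P(\text{at least one of 40 moves recognized}) = 1 - \left(\tfrac{255}{256}\right)^{40} \approx 0.145
\]
\[
P(\text{all 40 moves recognized}) = \left(\tfrac{1}{256}\right)^{40} \approx 10^{-96}
\]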

My experience is that with lossy compression, you need a recognition percentage of at least 75% to make the tree end.

Tony

Re: Compressing bitbases

PostPosted: 15 Mar 2007, 22:27
by Fritz Grau
Peter Fendrich wrote:Why are they incompressible?

I was imprecise; what I should have said is: they should be incompressible if the results are evenly distributed, since the random choice of hash keys scatters the results. Otherwise the best possible ratio should be achieved by arithmetic coding, and that ratio would depend only on the probabilities of the outcomes.
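
To make that concrete: with a fraction p of wins among the stored positions, the arithmetic-coding limit is the binary entropy

\[
H(p) = -p \log_2 p - (1-p)\log_2(1-p) \quad \text{bits per position.}
\]

For example, p = 0.9 still allows roughly 2:1 (H ≈ 0.47), while p = 0.5 gives H = 1 and no compression at all, which is the evenly distributed case above.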

Tony van Roon-Werten wrote:My experience is that with lossy compression, you need a recognition percentage of at least 75% to make the tree end.


My engine uses hash information for move ordering, and once the hash table gives a positive mate score, no further search is done. This tends to propagate mate scores to predecessor positions in the hash table.

However, unlike yourself, I have not made any experiments with lossy compression yet. So I am ready to believe that this idea won't work in practice.

Re: Compressing bitbases

PostPosted: 16 Mar 2007, 08:38
by Tony van Roon-Werten
Fritz Grau wrote:
...
My engine uses hash information for move ordering, and once the hash table gives a positive mate score, no further search is done. This tends to propagate mate scores to predecessor positions in the hash table.


Yes, but would it still happen if you replaced the positive mate score with a normal evaluation 255 out of 256 times?

Fritz Grau wrote:
However, unlike yourself, I have not made any experiments with lossy compression yet. So I am ready to believe that this idea won't work in practice.


Well, it's only this version of your idea that won't work. Try making a new version :) It's how progress is made.

[Hmm, philosophical moment]
OTOH, making new versions rather than admitting the basic idea is flawed is called being stubborn :?

I guess there's a thin line....

Tony