Daniel Shawul wrote: Wow! That is one big collection.
OK, I will tell you roughly what I need.
From your database, you can generate start positions and classify them
based on their ECO codes A, B, C, D, E. Then many samples of the same size, say 150 positions,
are taken with some assumed distribution with respect to ECO codes. I would guess normal or something
slightly skewed to the left. Each sample should use that same distribution so that
the same result is expected whichever one I pick. Later on, when I add up the results from the different
samples, I expect the result to quickly converge to a normal distribution according to the CLT.
So in short, if Sample 1 has 20 A's, 30 B's, and 25 C's, then Sample 2 should have the same counts.
You can use longer book lines to increase variation, but I wouldn't expect that to be a problem
for you with that many games.
Thanks
Daniel
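The sampling scheme described above amounts to stratified sampling with a fixed quota per ECO letter. A minimal sketch follows, assuming the positions have already been grouped by ECO letter. The function name `stratified_samples`, the `positions_by_eco` mapping, and the quota values are hypothetical; only the counts for A, B, and C and the sample size of 150 come from the post, and the D and E counts are made up to reach 150.

```python
import random

def stratified_samples(positions_by_eco, quota, n_samples, seed=0):
    """Draw n_samples samples, each holding exactly quota[letter]
    positions from every ECO letter, so all samples share one distribution."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    samples = []
    for _ in range(n_samples):
        sample = []
        for letter, count in quota.items():
            # rng.sample draws without replacement inside one sample and
            # raises ValueError if the pool has fewer than `count` positions.
            sample.extend(rng.sample(positions_by_eco[letter], count))
        samples.append(sample)
    return samples

# Hypothetical quota: A, B, C as in the post; D and E invented to total 150.
quota = {"A": 20, "B": 30, "C": 25, "D": 40, "E": 35}
```

Each sample is drawn independently, so the same position can appear in more than one sample. If the samples must be disjoint, the pools would need to be shuffled once and sliced instead.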
Here is the problem:
I have tens of thousands of fully analyzed and statistically classified EPD strings.
However, I do not know the ECO codes for the strings, and I do not have any utility that will classify them for me.
I can import them into a SCID database, but it classifies every position as A00a, which is not correct.
PGN-EXTRACT by Barnes also does not know how to deal with them.
For instance, here are 1344 carefully analyzed EPD records:
http://cap.connx.com/epd/analyzed.epd.bz2
They are in a format like the following:
rnbqkbnr/pp1p1ppp/4p3/2p5/2P5/2N2N2/PP1PPPPP/R1BQKB1R b KQkq - acd 22; bm Nf6; ce -8; pm Nf6 {649} Nc6 {192} b6 {159} a6 {116} d5 {89}; pv Nf6 e3 Nc6 Be2 d5 cxd5 exd5 d4 cxd4 exd4 Bd6 O-O O-O Qb3 Na5 Qc2 Nc6 Be3 Nb4 Qb3 Bf5 Ne5 Bc2 Qa3 Nd3;
Here is a breakdown of the components of this position:
The base position:
rnbqkbnr/pp1p1ppp/4p3/2p5/2P5/2N2N2/PP1PPPPP/R1BQKB1R b KQkq -
The depth in plies analyzed:
acd 22
The move chosen by computer analysis:
bm Nf6
The computer evaluation in centipawns for this position:
ce -8
The statistical counts for actual moves played by strong players for this position, with frequencies in curly braces:
pm Nf6 {649} Nc6 {192} b6 {159} a6 {116} d5 {89}
In this case, Nf6 was played 649 times, Nc6 was played 192 times, b6 was played 159 times, a6 was played 116 times, and d5 was played 89 times. The computer's best move coincides with the most frequently played move in this instance.
This is the computer-generated pv (principal variation) for this position:
pv Nf6 e3 Nc6 Be2 d5 cxd5 exd5 d4 cxd4 exd4 Bd6 O-O O-O Qb3 Na5 Qc2 Nc6 Be3 Nb4 Qb3 Bf5 Ne5 Bc2 Qa3 Nd3
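The breakdown above can be reproduced mechanically. Here is a rough sketch of a parser for records in this format, using only the Python standard library. The function names `parse_epd` and `parse_pm` are made up for illustration, and the sketch assumes that no operand contains a semicolon (true for the record shown, but not guaranteed for EPD in general).

```python
import re

def parse_epd(line):
    """Split one EPD record into the base position and a dict of opcodes."""
    # The first four whitespace-separated fields are the position:
    # piece placement, side to move, castling rights, en passant square.
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    # Assumption: operations are separated by ';' and no operand contains one.
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if chunk:
            opcode, _, operand = chunk.partition(" ")
            ops[opcode] = operand.strip()
    return position, ops

def parse_pm(operand):
    """Turn 'Nf6 {649} Nc6 {192}' into [('Nf6', 649), ('Nc6', 192)]."""
    return [(move, int(n)) for move, n in re.findall(r"(\S+)\s*\{(\d+)\}", operand)]

record = ("rnbqkbnr/pp1p1ppp/4p3/2p5/2P5/2N2N2/PP1PPPPP/R1BQKB1R b KQkq - "
          "acd 22; bm Nf6; ce -8; "
          "pm Nf6 {649} Nc6 {192} b6 {159} a6 {116} d5 {89}; "
          "pv Nf6 e3 Nc6 Be2 d5 cxd5 exd5 d4 cxd4 exd4 Bd6 O-O O-O Qb3 Na5 "
          "Qc2 Nc6 Be3 Nb4 Qb3 Bf5 Ne5 Bc2 Qa3 Nd3;")
position, ops = parse_epd(record)
played = parse_pm(ops["pm"])
```

With the record above, `ops["bm"]` is `"Nf6"` and `played[0]` is `("Nf6", 649)`, so checking whether the engine's choice matches the most frequently played move is a single comparison.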
We could manually enter the positions one at a time into SCID and find out what ECO codes they correspond to by doing a "SET Startup position" followed by examination of the ECO code for the game retrieved, but I am far too lazy for that.