ELOStat algorithm ?

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

Re: ELOStat algorithm ?

Postby Dann Corbit » 05 Jan 2005, 01:20

Maybe this is helpful (keyword search turned it up):
http://billharlan.com/pub/code/conjugate_gradients/

This package has eivenvalues/eigenvectors:
http://ltilib.sourceforge.net/doc/homepage/index.shtml

and so does this one:
http://vxl.sourceforge.net/

Here is an interesting paper on eigendecomposition:
http://www.dsp.ufl.edu/~twong/Preprints/fastrls.pdf
Dann Corbit
 

Re: ELOStat algorithm ?

Postby Anonymous » 05 Jan 2005, 15:52

R?mi, your work looks very intersesting. BTW. years back, when I worked on fitting problems, I used Matrix inversion to calculate covariance Matrix. But I forgot most of the details. I remember that some fitting problems were very demanding (in a numerical sense) and easy routines (like the rgaussi Dann mentioned, which was actually written by me - more or less as a toy in a contest like situation) would not work well.

For Eigenvalue problems, I suggest:

Wilkinson, J.H., Reinsch, C.: Linear Algebra, Handbook for Automatic Computation II, Springer, Berlin-Heidelberg-New York 1971

I hope, I remember this correctly. I just googled vor Wilkinson. It will be available in Universtity libraries. It does sound old, but it is very good. The example routines are in Algol. It is very easy to read, even if you never have seen Algol (which I didn't at the time, I used the book). The rountines in NR looked like translations of the mentioned book (with one automatic translation phase from Fortran to C). I personally found the Wilkinson book much more helpful. I translated their Algol code to C, and it worked very well. I had written the code for complex problems, forgot if they had some symmetry. I'd give you the code (if I find it again after my HD crash), but I fear it would not really fit your problem. Also, finding *all* Eigenvalues/Eigenvectors of a huge matrix might be very difficult. Finding only Eigenvalues, or even only a few of the biggest ones will be much easier.

While googling for Wilkinson, I also found:

Wilkinson, J.H.: The Algebraic Eigenvalue Problem, London, Oxford University Press 1965

So, while you are at the Library, you might to have a look.

Regards,
Dieter
Anonymous
 

Re: ELOStat algorithm ?

Postby Rémi Coulom » 05 Jan 2005, 17:16

Dieter B?r?ner wrote:R?mi, your work looks very intersesting. BTW. years back, when I worked on fitting problems, I used Matrix inversion to calculate covariance Matrix. But I forgot most of the details. I remember that some fitting problems were very demanding (in a numerical sense) and easy routines (like the rgaussi Dann mentioned, which was actually written by me - more or less as a toy in a contest like situation) would not work well.
Thanks Dieter. You are right: matrix inversion is needed, not diagonalization. In fact, it is explained in Hunter's paper:
David Hunter wrote:Obtaining standard error estimates for parameters in a Bradley-Terry model is easy in principle; the inverse of the Hessian matrix of the log-likelihood (or, alternatively, the inverse of the Fisher information matrix) evaluated at the MLE gives an asymptotic approximation to the covariance matrix of the MLE.
He discusses this more in the end of the paper, if you are interested.

R?mi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Dann Corbit » 05 Jan 2005, 19:46

I am curious about the problem that needs the eigenvalues/eigenvectors.

Are you fitting a curve (It seems vaguely to me like that is what you are solving from what I have read in your post and Dieter's follow-ups)? Why not use the Marquardt-Levenberg algorithm? The implementation in GnuPlot has been changed to public domain, BTW.

Maybe I just don't understand the issues at hand.
Dann Corbit
 

Re: ELOStat algorithm ?

Postby Rémi Coulom » 05 Jan 2005, 20:14

Dann Corbit wrote:I am curious about the problem that needs the eigenvalues/eigenvectors.

They are not needed. I had guessed wrong that they would be necessary.

Are you fitting a curve (It seems vaguely to me like that is what you are solving from what I have read in your post and Dieter's follow-ups)? Why not use the Marquardt-Levenberg algorithm? The implementation in GnuPlot has been changed to public domain, BTW.

I am indeed fitting the Bradley-Terry model. The maximization of the likelihood could be done with any gradient-ascent algorithm, such as conjugate-gradient and Levenberg-Marquardt, as you suggest. But there is a much better approach called "Minorization-Maximization" that works well and that I have already implemented. The problem is now to estimate confidence intervals. That's why I need the inverse of the Hessian of the log-likelihood.

Maybe I just don't understand the issues at hand.

If you'd like to understand, the whole theory is very well explained in Hunter's paper (the link is at the bottom of my web page). The paper may look a little intimidating at first, but its most important parts are understandable by anyone who knows what is a probability and a logarithm, I think (and knows how to solve a second-order polynomial). In order to really understand the article, it is necessary to make a few calculations on paper, but they are not difficult.

R?mi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Dann Corbit » 06 Jan 2005, 02:03

R?mi Coulom wrote:
Dann Corbit wrote:I am curious about the problem that needs the eigenvalues/eigenvectors.

They are not needed. I had guessed wrong that they would be necessary.

Are you fitting a curve (It seems vaguely to me like that is what you are solving from what I have read in your post and Dieter's follow-ups)? Why not use the Marquardt-Levenberg algorithm? The implementation in GnuPlot has been changed to public domain, BTW.

I am indeed fitting the Bradley-Terry model. The maximization of the likelihood could be done with any gradient-ascent algorithm, such as conjugate-gradient and Levenberg-Marquardt, as you suggest. But there is a much better approach called "Minorization-Maximization" that works well and that I have already implemented. The problem is now to estimate confidence intervals. That's why I need the inverse of the Hessian of the log-likelihood.

Maybe I just don't understand the issues at hand.

If you'd like to understand, the whole theory is very well explained in Hunter's paper (the link is at the bottom of my web page). The paper may look a little intimidating at first, but its most important parts are understandable by anyone who knows what is a probability and a logarithm, I think (and knows how to solve a second-order polynomial). In order to really understand the article, it is necessary to make a few calculations on paper, but they are not difficult.

R?mi


Actually, the confidence and prediction intervals are not difficult to produce if you can produce an inverse to the function you are fitting.

For example, if I am fitting to a linear function, you just use the standard sum of squares, differences of squares, etc. found in all the elementary text books. For instance, like this:
http://ewr.cee.vt.edu/environmental/tea ... erval.html

If I am fitting an exponential function, then I can logarithmically compress all the y values. This produces a linear function. Then, I can run my formulae against the compressed data, and then perform an exponential expansion of the resulting intervals. (This technique is mathematically, valid -- I had long discussions with a statistics PhD named Judith Zeh to convince myself of it).

Similarly, for sqrt(x) you use x^2 and vice-versa.

In general, for any function you use its inverse.

There is (of course) a problem if the function is not one-to-one and onto (and hence has no inverse over the entire range).

If the function you are fitting is invertible, I have a Maple function generator that will automatically produce an inverse, if we have one or more derivatives (or if Maple can compute derivatives).
Dann Corbit
 

Re: ELOStat algorithm ?

Postby Dann Corbit » 06 Jan 2005, 02:12

Here is a logarithmic curve fit with prediction and confidence intervals that I wrote 20 years ago (so forgive the mess -- I've learned how to program since then):

$nofloatcalls
$PAGESIZE:50
$STORAGE:2
$DEBUG
SUBROUTINE GLLOG(NUMSET,NPTS)
C
C FIND LEAST SQUARES LINE FIT
C
IMPLICIT REAL*8 (A-Z)
INTEGER COUNT,CNT
REAL*4 CHISQR,TOL,NEXTX,CONST,LASTX,Y
REAL*8 VECTOR(101,2),CHITAB(6)
C REAL*8 VECTOR(1001,2),CHITAB(6)
INTEGER*2 GGPCT,VPLINE,VSLTYP,VSLCOL,XM,YM,H,LINLEN(6,5),L
INTEGER*4 ERR,NMIN2
integer*2 linout(200),IND1,IND2,IND3,IND4,IND5,CLR
INTEGER*2 I,POINTS,NPTS(6),J,NUMSET,ITYPE3,K,VECOUT(200,6,5),X0,Y0
$INCLUDE:'ENVD.BLK'
$INCLUDE:'ENVRS.BLK'
$INCLUDE:'LNEEDS.BLK'
COMMON /XG04/ VECTOR
COMMON /XG05/ VECOUT
COMMON /XG63/ I,POINTS,SSX,SSXY,SUMX,SUMXSQ,SUMXY,SUMY,SUMYSQ,
. XBAR,YBAR
IF (RXMIN .EQ. 0) RXMIN=1.0E-25
X0 = GGPCT(BLX,0) + IXMIN
Y0 = GGPCT(BLY,1) + IYMIN
XM = GGPCT(BUX,0) + IXMIN
YM = GGPCT(BUY,1) + IYMIN
CALL STRFIL(VECTOR,1,1616,0)
DO 15 I=1,NUMSET
DO 20 J=1,NPTS(I)
IF (NPTS(I).LE.2) THEN
WRITE(*,*)'INSUFFICIENT DATA FOR REGRESSION'
R E T U R N
ENDIF
VECTOR(1,1)=DBLE(FLOAT(NPTS(I)))
IF (RXMIN .LE. 0) THEN
WRITE(*,*)' THIS DATA SET HAS NON-POSITIVE X VALUES.'
WRITE(*,*)' THIS CURVE FITTING ALGORITHM REQUIORS X > 0.'
R E T U R N
ENDIF
VECTOR(J+1,1)=DLOG(DBLE(RHBUFF(J,I)))
VECTOR(J+1,2)=DBLE(RVBUFF(J,I))
20 CONTINUE
POINTS=IDNINT(VECTOR(1,1))
IF (POINTS.EQ.1) GO TO 15
SUMX=0.0D00
SUMY=0.0D00
SUMXY=0.0D00
SUMXSQ=0.0D00
SUMYSQ=0.0D00
DO 10 K=2,POINTS+1
SUMX=SUMX+VECTOR(K,1)
SUMY=SUMY+VECTOR(K,2)
SUMXY=SUMXY+VECTOR(K,1)*VECTOR(K,2)
SUMXSQ=SUMXSQ+VECTOR(K,1)*VECTOR(K,1)
SUMYSQ=SUMYSQ+VECTOR(K,2)*VECTOR(K,2)
10 CONTINUE
FINV = 1./VECTOR(1,1)
SSX=SUMXSQ-SUMX*SUMX*FINV
SSXY=SUMXY-SUMX*SUMY*FINV
SSY=SUMYSQ-SUMY*SUMY*FINV
YBAR=SUMY*FINV
XBAR=SUMX*FINV
IF (SSX.NE.0.0) THEN
SLOPE=SSXY/SSX
NTRCPT=YBAR-SLOPE*XBAR
SSE=SSY-SLOPE*SSXY
NM2=VECTOR(1,1)-2
IF (NM2.GE.1)THEN
SSQ=SSE/NM2
S=DSQRT(SSQ)
ENDIF
IF (SSY.NE.0) THEN
R=SSXY/DSQRT(SSX*SSY)
RSQ=R*R
ENDIF
ELSE
SLOPE=9.9E25
NTRCPT=XBAR
ENDIF
NMIN2 = NPTS(I) - 2
IF (NPTS(I).LE.2) R E T U R N
TOL = .05
CALL MDSTI(TOL,NMIN2,CHISQR,ERR)
CHITAB(I) = DBLE(CHISQR)
C
C MINIMUM VALUE IS EXP(0) [LOG(EXP(0)) = 0]
C MAXIMUM VALUE IS LOG(80) [LOG(EXP(80)) = 80]
C CONST = STEPSIZE FOR 100 ITERATIONS TO GO FROM
C
NXXTX = 1.0E-30
CONST = (BUX-BLX)*.01
IND1 = 1
IND2 = 1
IND3 = 1
IND4 = 1
IND5 = 1
PART = CHITAB(I)*S
DO 200 J = 1, 199, 2
NEXTX = LOG(NXXTX)
C
C MAKE CONFIDENCE +/- LINES
C
X0SQ = (NEXTX-XBAR)*(NEXTX-XBAR)
TERM = PART*DSQRT(1./FLOAT(NPTS(I)) + X0SQ/SSX)
Y = SLOPE*NEXTX + NTRCPT
C
C INCREMENT LOG LINE
C
H = GGPCT(EXP(NEXTX),0)+X0
IF ((Y.GT.0).AND.(Y.LT.(BUY-BLY))) THEN
K = GGPCT(Y,1)+Y0
ELSE
K = -1
ENDIF
IF((K.GT.Y0).AND.(K.LT.YM))THEN
VECOUT(IND5,I,5) = H
VECOUT(IND5+1,I,5) = K
IND5 = IND5 + 2
ENDIF
IF (.NOT.CONFDN) GO TO 1000
C
C CONFIDENCE UPPER LIMIT
C
IF (((Y+TERM).GT.0).AND.((Y+TERM).LT.(BUY-BLY))) THEN
K = GGPCT(SNGL(Y + TERM),1)+Y0
ELSE
K = -1
ENDIF
IF((K.GT.Y0).AND.(K.LT.YM))THEN
VECOUT(IND1,I,1) = H
VECOUT(IND1+1,I,1) = K
IND1 = IND1 + 2
ENDIF
C
C CONFIDENCE LOWER LIMIT
C
IF (((Y-TERM).GT.0).AND.((Y-TERM).LT.(BUY-BLY))) THEN
L = GGPCT(SNGL(Y - TERM),1)+Y0
ELSE
L = -1
ENDIF
IF ((L.GT.Y0).AND.(L.LT.YM))THEN
VECOUT(IND2,I,2) = H
VECOUT(IND2+1,I,2) = L
IND2 = IND2 + 2
ENDIF
1000 CONTINUE
IF (.NOT.PREDTN) GO TO 2000
C
C MAKE PREDICTION +/- LINES
C
TERM = PART*DSQRT(1./FLOAT(NPTS(I)) + X0SQ/SSX + 1.)
C
C PREDICTION HIGH GUESS
C
IF (((Y+TERM).GT.0).AND.((Y+TERM).LT.(BUY-BLY))) THEN
K = GGPCT(SNGL(Y + TERM),1)+Y0
ELSE
K = -1
ENDIF
IF((K.GT.Y0).AND.(K.LT.YM))THEN
VECOUT(IND3,I,3) = H
VECOUT(IND3+1,I,3) = K
IND3 = IND3 + 2
ENDIF
C
C PREDICTION LOW GUESS
C
IF (((Y-TERM).GT.0).AND.((Y-TERM).LT.(BUY-BLY))) THEN
L = GGPCT(SNGL(Y - TERM),1)+Y0
ELSE
L = -1
ENDIF
IF ((L.GT.Y0).AND.(L.LT.YM))THEN
VECOUT(IND4,I,4) = H
VECOUT(IND4+1,I,4) = L
IND4 = IND4 + 2
ENDIF
2000 CONTINUE
C
C INCREMENT X
C
NXXTX = NXXTX + CONST
200 CONTINUE
LINLEN(I,1) = (IND1 + 1)*.5
LINLEN(I,2) = (IND2 + 1)*.5
LINLEN(I,3) = (IND3 + 1)*.5
LINLEN(I,4) = (IND4 + 1)*.5
LINLEN(I,5) = (IND5 + 1)*.5
15 CONTINUE
DO 300 I = 1,NUMSET
DO 400 J = 1,5
CLR = MOD(I,WORKOT(40))+1
ERR = VSLCOL(DEVHDL,CLR)
IF (J.EQ.5) THEN
ERR = VSLTYP(DEVHDL,1)
ERR = VPLINE(DEVHDL,LINLEN(I,J)-1,VECOUT(1,I,J))
ELSEIF (J.LT.3) THEN
IF (CONFDN) THEN
ERR = VSLTYP(DEVHDL,3)
ERR = VPLINE(DEVHDL,LINLEN(I,J)-1,VECOUT(1,I,J))
ENDIF
ELSE
IF (PREDTN) THEN
ERR = VSLTYP(DEVHDL,2)
ERR = VPLINE(DEVHDL,LINLEN(I,J)-1,VECOUT(1,I,J))
ENDIF
ENDIF
400 CONTINUE
300 CONTINUE
R E T U R N
END
Dann Corbit
 

Re: ELOStat algorithm ?

Postby Rémi Coulom » 06 Jan 2005, 20:27

Dann Corbit wrote:Actually, the confidence and prediction intervals are not difficult to produce if you can produce an inverse to the function you are fitting.

What I am fitting is not a scalar function of a scalar variable. It is rather a probability distribution of the result between players. I do not think it can be transformed into a linear-regression problem.

Also, I am using Bayesian inference, so I do not measure uncertainty from the variance of data. Uncertainty is in the prior.

R?mi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Anonymous » 06 Jan 2005, 21:53

Dann Corbit wrote:Here is a logarithmic curve fit with prediction and confidence intervals that I wrote 20 years ago (so forgive the mess -- I've learned how to program since then):
[...]
C FIND LEAST SQUARES LINE FIT
[...]


Interesting, Dann. About the same time I took a course "Computer in der Chemie" at University. One of my first programs was a least square line fit. It was in Fortran IV on a PDP 11 (which was not running Unix, I forgot the name of the OS). BTW. Is your program Standard Fortran? I forgot almost everything about Fortran until now. Does it really have a $INCLUDE, $nofloatcalls (even not capitals), $PAGESIZE, $STORAGE, $DEBUG? Also note, that the formatting of lines (which is - as you know - of special importance in older Fortran) does not display correctly when pasting. "code" and "/code" in square brackets instead of quotes will do it.

Regards,
Dieter
Anonymous
 

Re: ELOStat algorithm ?

Postby Anonymous » 06 Jan 2005, 22:01

R?mi Coulom wrote:You are right: matrix inversion is needed, not diagonalization.


Most problems, that seem to need matrix inversion don't really need a full inversion. Often triangulation (??) is enough (for example to solve x*A=b mathemeticans like to write the solution with the inverse of A, but one won't need the full inverse really). In fitting problems my experience had shown, that the matrix was often (when many parameters) is close to linear dependent, giving the typical Gauss Jordan method a hard time, even when using full pivoting. I remember that the more sophisticated routines used somehow SVD decomposition or QR factorization.

Regards,
Dieter
Anonymous
 

Re: ELOStat algorithm ?

Postby Rémi Coulom » 06 Jan 2005, 23:30

Dieter B?r?ner wrote:
R?mi Coulom wrote:You are right: matrix inversion is needed, not diagonalization.


Most problems, that seem to need matrix inversion don't really need a full inversion. Often triangulation (??) is enough (for example to solve x*A=b mathemeticans like to write the solution with the inverse of A, but one won't need the full inverse really). In fitting problems my experience had shown, that the matrix was often (when many parameters) is close to linear dependent, giving the typical Gauss Jordan method a hard time, even when using full pivoting. I remember that the more sophisticated routines used somehow SVD decomposition or QR factorization.

Regards,
Dieter


I guess what you call triangulation is LU decomposition (write the matrix as the product of an Upper-triangular matrix and a Lower-triangular matrix). I know this technique rather well, already. I use it in my swimmer simulator (http://remi.coulom.free.fr/Thesis/, if you had not noticed already).

If I want the matrix of all the likelihoods that A is tronger than B, then I need the full inverse. If I only need the confidence intervals, they are given by the diagonal of the covariance matrix. These should be easy to obtain from the LU decomposition.

I do not fear that there will be numerical problems in the case of the Bradley-Terry model. The confidence intervals for ratings are of the same order of magnitude, in general. It would take a real huge number of games to create a very strong dependency between two players. So I do not expect severe ill-conditioning, even if two players have played 1000 games against each other. In the most favorable cases (ie, round-robin) I expect the Hessian of the log-likelihood to be close to diagonal. Experiments will tell...

R?mi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Another exciting idea

Postby Rémi Coulom » 06 Jan 2005, 23:58

Another exciting idea is multi-dimensional Bradley-Terry:

http://www.agro-montpellier.fr/sfds/CD/ ... usson1.pdf

I could not find an English version available online, although it should appear soon. The paper has an English abstract, anyways, and the formulas are international (and Dann can speak French very well).

I am extremely curious of what could be the result of applying this to chess. Maybe it can tell computers apart from humans by the simple observation of game results ! It may also separate positional players from tacticians.

R?mi
Last edited by Rémi Coulom on 07 Jan 2005, 01:50, edited 1 time in total.
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Dann Corbit » 07 Jan 2005, 01:37

Dieter B?r?ner wrote:
Dann Corbit wrote:Here is a logarithmic curve fit with prediction and confidence intervals that I wrote 20 years ago (so forgive the mess -- I've learned how to program since then):
[...]
C FIND LEAST SQUARES LINE FIT
[...]


Interesting, Dann. About the same time I took a course "Computer in der Chemie" at University. One of my first programs was a least square line fit. It was in Fortran IV on a PDP 11 (which was not running Unix, I forgot the name of the OS). BTW. Is your program Standard Fortran? I forgot almost everything about Fortran until now. Does it really have a $INCLUDE, $nofloatcalls (even not capitals), $PAGESIZE, $STORAGE, $DEBUG? Also note, that the formatting of lines (which is - as you know - of special importance in older Fortran) does not display correctly when pasting. "code" and "/code" in square brackets instead of quotes will do it.

Regards,
Dieter


The Fortran flavor was Ryan-McFarland and also Microsoft.
I wrote a graphics system called G-Wiz graphics that ran on DOS computers (before Windows came along) that used the GSS*CGI for the graphical interface. So the integer conversions that you see in the code are mappings to the graphical workspace. Probably, that code could be made a lot simpler.

My Fortran is quite rusty now. I originally learned Fortran in 1977 and did some Fortran programming up to 1987. Around that time frame, I transitioned to C (and later to C++), which I liked a lot better. I think that Fortran 95 is every bit as good as C++ from looking at the spec, but I don't have any experience in the modern Fortran dialects. It's kind of a shame, because there is lots of cool numerical code I could play with if I understood it better.

You are right that I should have bracketed the code, but I just forgot to do it. Fortran 77 is simple enough that if you should see it in its original format I have no doubt that you would immediately grasp the full nature of the algorithm (which is completely simple anyway).
Dann Corbit
 

Re: Another exciting idea

Postby Dann Corbit » 07 Jan 2005, 01:42

R?mi Coulom wrote:Another exciting idea is multi-dimensional Bradley-Terry:

http://www.agro-montpellier.fr/sfds/CD/ ... usson1.pdf

I could not find an English version available online, although it should appear soon. The paper has an English abstract, anyways, and the formulas are international (and Dann can speak French very well).
{snip}
R?mi


You give me way too much credit here. I did have a full year of French in High School, and a full year of French in College. But I do not use it very often. I find that I can read French fairly well, but if I had to write it, I think it would be deplorable, and if I had to speak it it would be villanous and even listening would be a problem (with plenty of "Je ne sais pas...", "Je ne connais pas...", "Lentement, sil' vous plais..." and the like interjected by me sprinkled in)
Dann Corbit
 

Re: ELOStat algorithm ?

Postby Reinhard Scharnagl » 07 Jan 2005, 02:46

To the posters to this thread:

are you convinced for the thread title still to correspond with the thread content here?

Reinhard.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: ELOStat algorithm ?

Postby Dann Corbit » 07 Jan 2005, 04:08

Reinhard Scharnagl wrote:To the posters to this thread:

are you convinced for the thread title still to correspond with the thread content here?

Reinhard.


The French language thing is clearly an aside.

The Fortran is an example of a curve fit model, which I suggested as a possible alternative for calculation of the error margins.

The thought was to use it for Elo calculation, but R.C. pointed out it won't work.
Dann Corbit
 

new version

Postby Rémi Coulom » 09 Jan 2005, 19:14

Hi,

I have made a new version of bayeselo.exe:
http://remi.coulom.free.fr/Bayesian-Elo/

I have not yet implemented the inverse-of-the-Hessian stuff, but made a few improvements:
  • The numerical problems that Tim Foden reported are solved
  • This version can compute the maximum-likelihood value for eloAdvantage (advantage for playing white) and eloDraw, as well as the confidence intervals for these values. The values I used in the previous version were wrong, because of a bug that is now fixed (so now, eloAdvantage = 32.8, eloDraw = 97.3).
  • The web page has a new plot of WBEC data, now with 95% confidence intervals. It shows that the model fits the data quite well.
  • Connectedness of results is automatically checked.
  • Various other cosmetic changes, including progress indicators when processing large datasets.

R?mi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Dann Corbit » 10 Jan 2005, 21:57

On a file with 3.3 million PGN games (stripped of all comments and stripped of all PGN tags beyond the required 7 tags):
C:\eco>eco -7 -s -c -C jb.pgn -oclean.pgn
Games: 3312660

Only 27588 games are actually processed:
27588 games parsed

Then most are removed during the elo calcuation:
ResultSet>elo
23907 player(s) removed
ResultSet-EloRating>

Is the source code available?
Dann Corbit
 

Re: ELOStat algorithm ?

Postby Rémi Coulom » 10 Jan 2005, 22:27

Dann Corbit wrote:On a file with 3.3 million PGN games (stripped of all comments and stripped of all PGN tags beyond the required 7 tags):
C:\eco>eco -7 -s -c -C jb.pgn -oclean.pgn
Games: 3312660

Only 27588 games are actually processed:
27588 games parsed

Then most are removed during the elo calcuation:
ResultSet>elo
23907 player(s) removed
ResultSet-EloRating>

Is the source code available?

I have not yet prepared a distribution of the source code, and have no time to do it right now. I will do it once I get something more stable. If you really want to look at the PGN-parsing code, it is in the source code of my C++ chess library, available from my home page (I use CPGNFile::ReadSTR, declared in chesslib/src/chesslib/pgnfile.h).

You can also indicate where I can download a PGN file that exhibits this problem, and I will fix the code myself.

All those players were removed, because the maximum-likelihood estimation only works on a set of players that are connected by games. By default, it tries to connect to the first player. If you'd like to find the ratings of another connected component of your result, you can indicate the number of the player you want included as a parameter to the "elo" command. The list of players, with their numbers, can be obtained with the "players" command. You can use "players >players.txt" if the list is too long and you want it in a file.

In some of the PGN files that I tried, a lot of problems where caused because many programs are present with different spellings inside the same PGN, and are treated as different players.

Do you know if there is an utility to ease the process of merging different spellings and fix such PGNs ? I have been thinking about how it could be done. Maybe I will write a tool. If it does not exist, it would be very convenient for many people, I guess.

R?mi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Dann Corbit » 11 Jan 2005, 01:02

R?mi Coulom wrote:[snip]
Do you know if there is an utility to ease the process of merging different spellings and fix such PGNs ? I have been thinking about how it could be done. Maybe I will write a tool. If it does not exist, it would be very convenient for many people, I guess.

R?mi


SCID has a feature called "cleaner" that does lots of cleaning of the PGN file, including attempted spelling corrections.
This file has been through the cleaner.
There are probably still thousands of badly spelled names, since there are millions of games in the file and likely 1/1000 is not correct.

http://scid.sourceforge.net/
Dann Corbit
 

PreviousNext

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 8 guests