Winboard Forum

by **Heinz van Kempen** » 23 Sep 2005, 11:39

Hi all,

With the increasing number of engine versions and CM (Chessmaster) settings in the CEGT 40/40 rating list, the question now is how to present the list so that it gives a better overview while still allowing everything to be compared.

I have started to publish two lists: one with only the best versions, another with all versions (ignore the colours; they will be changed to make the list more readable).

http://www.husvankempen.de/nunn/rangliste.html

http://www.husvankempen.de/nunn/ranglisteall.html

So we need proposals on what you prefer, for example:

Best versions list - only with engines with more than 100 games?

All versions list - reduce the number of versions and CM settings shown to three or five?

Proposals are welcome.
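Both proposals amount to simple filters over the rating list. The following is a minimal sketch, assuming each list entry carries an engine family, a version or setting label, an Elo value and a game count. The `Entry` fields and the `SAMPLE` rows are hypothetical illustrations, not CEGT data.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    family: str  # engine family (hypothetical field)
    label: str   # version or CM setting name (hypothetical field)
    elo: int
    games: int


def best_versions(entries, min_games=100):
    """Keep the highest-rated entry per family among entries with more than min_games games."""
    best = {}
    for e in entries:
        if e.games <= min_games:
            continue
        if e.family not in best or e.elo > best[e.family].elo:
            best[e.family] = e
    return sorted(best.values(), key=lambda e: e.elo, reverse=True)


def capped_versions(entries, max_per_family=3):
    """Keep at most max_per_family top-rated entries per family."""
    by_family = defaultdict(list)
    for e in entries:
        by_family[e.family].append(e)
    kept = []
    for group in by_family.values():
        kept.extend(sorted(group, key=lambda e: e.elo, reverse=True)[:max_per_family])
    return sorted(kept, key=lambda e: e.elo, reverse=True)


# Made-up rows, only to exercise the two filters.
SAMPLE = [
    Entry("Engine A", "v2", 2700, 500),
    Entry("Engine A", "v1", 2650, 900),
    Entry("Engine A", "v3 beta", 2720, 60),
    Entry("Engine B", "setting 1", 2600, 300),
    Entry("Engine B", "setting 2", 2590, 300),
    Entry("Engine B", "setting 3", 2580, 300),
    Entry("Engine B", "setting 4", 2570, 300),
]
```

With these sample rows, `best_versions(SAMPLE)` drops the 60-game beta and keeps one entry per family, while `capped_versions(SAMPLE)` drops only the fourth setting of Engine B.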

Best Regards
Heinz

by **Pallav Nawani** » 24 Sep 2005, 05:51

Heinz van Kempen wrote: Best versions list - only with engines with more than 100 games?

This may not be necessary; I like to see all the engines that have been tested!

Heinz van Kempen wrote: All versions list - reduce the number of versions and CM settings shown to three or five?

Yes, certainly! Having different versions is okay - but having dozens of settings is a bit boring, at least to me :mrgreen:

by **Heinz van Kempen** » 24 Sep 2005, 09:37

Hi Pallav,

Personally (like many others) I am a fan of these settings, and I think they contribute a lot to the colourful world of engines.

However, having more than three CM settings in the list would probably make it confusing to read. Michael in the email group proposed a separate list for the settings fans.

An additional proposal is to list only versions with more than 300 games (so that the results are reasonably reliable) and to add a text file where new versions are listed with "non fixed" ratings (just to please curious people who are already asking today for the first Fritz 9 results).

Best Regards
Heinz

by **Uri Blass** » 24 Sep 2005, 09:43

Heinz van Kempen wrote: Hi Pallav,

Personally (like many others) I am a fan of these settings, and I think they contribute a lot to the colourful world of engines.

However, having more than three CM settings in the list would probably make it confusing to read. Michael in the email group proposed a separate list for the settings fans.

An additional proposal is to list only versions with more than 300 games (so that the results are reasonably reliable) and to add a text file where new versions are listed with "non fixed" ratings (just to please curious people who are already asking today for the first Fritz 9 results).

Best Regards
Heinz

I have no problem with testing different settings, but I think it should be done on an equal basis for every author who asks, and not only for Chessmaster.

by **Heinz van Kempen** » 24 Sep 2005, 09:47

Hi Uri,

This would certainly exceed our CPU resources for 40/40, but when testing two different Kiwi settings in Blitz over the last week, I thought that it might be possible to add some sort of beta testing to the Blitz rating list as soon as all important engines are in.

Best Regards
Heinz

by **Uri Blass** » 24 Sep 2005, 10:16

Hi Heinz,

Note that if being higher in the list is the reason for testing Chessmaster personalities, then it is more logical to test Fruit personalities. And if you have no time to test many personalities at long time control, then it makes more sense to test them at blitz and continue at long time control only with the personalities that show a bigger improvement.

Note that I found a long-standing bug in Movei in my implementation of an idea from Fruit. It is possible to get rid of it with a new personality that changes fruit_relative from 1 to 0.

I do not know for sure whether the bug causes Movei to play weaker. I would be interested in testing a personality of Movei without the bug, but only if it does not mean that I will need to wait until Movei has played 500 or 1000 games before a newer version can be tested.

Uri

by **Dr.Wael Deeb** » 24 Sep 2005, 15:33

Hi,
I don't think testing at blitz time control and drawing conclusions about the eventual performance at long time control is a good idea.

Cheers,
Dr.Wael Deeb

by **Uri Blass** » 24 Sep 2005, 16:05

I do not think that you are right.

I do not know about a single case when a program got worse in 4 minutes/40 moves and got better in 40 minutes/40 moves.

If you show me a single case when it happened in CEGT I may change my opinion.

It is possible that some personality gets a bigger improvement at long time control, but I think that we need to choose which programs to test at long time control, and it makes sense to test only programs that show a significant improvement in blitz, because we do not have infinite time.

I would prefer that CEGT test Movei personalities only in blitz for now, and at long time control only if I ask for it.

As a first personality, I suggest creating a file named movei_changes310.ini and putting the following in it:
fruit_relative 0

I am interested to see whether it can perform better than movei00_8_310.
If it cannot perform better at 40/4, then there is no point in testing it at 40/40. Even if it can perform better, I may prefer to have a newer version tested later, so I prefer no testing at 40/40 for rating now.
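The screening rule described here (test at 40/4 first, promote to 40/40 only on a clear gain) can be written as a one-line decision. This is only a sketch: reading "perform better" as "better by more than both error margins combined" is an assumption added for illustration, and the function name is hypothetical.

```python
def worth_long_test(base_elo, base_margin, cand_elo, cand_margin):
    """Return True if the candidate's blitz rating beats the baseline's
    by more than the two error margins combined.

    This conservative threshold is an assumption, not a CEGT rule.
    """
    return (cand_elo - cand_margin) > (base_elo + base_margin)
```

For example, a candidate 30 Elo ahead with margins of 20 on both sides would not be promoted, while the same gain with margins of 10 would.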

Uri

by **Heinz van Kempen** » 25 Sep 2005, 00:42

Hi Wael and Uri,

First of all, there are examples, not only in CEGT, of good blitzers that do worse at longer time controls (compare Hiarcs 9 relative to Junior 9) and also of bad blitzers (you will see this as soon as we have Gandalf 6.0 in the Blitz rating list; Gandalf is known to be considerably weaker in Blitz).

If there were time controls like the 1 minute bullets popular on servers you would see that no engine comes close to Fritz 8 here (although I do not have data for Fritz 9 and Fruit WCCC'05 here and they might even top this). And if there were tournaments with 1 hour per move, surely Junior would be a star.

For testing Movei personalities and others I have to ask for patience. Blitz has no priority in CEGT. Indeed, only one fast machine is running it day and night, and Alessandro additionally plays tournaments for CEGT Blitz whenever he has the time to do so. Maybe he can help you if you urgently need a comparison. For the moment he is playing with the new Kiwi in an Italian tournament. For authors helping in CEGT (like some did in AEGT) there will of course be beta testing; this is just a question of receiving support and offering help, and it goes without saying.

The next step in Blitz will be to include more important engines. Fritz 9 and Ktulu are currently running. Chess Tiger and Ruffian are still not in, nor are Slow Chess Blitz WV2, Naum 1.82 and...and...and...

So more beta testing can start in times with few new releases or when we find one or two more guys especially interested in running CEGT Blitz.

Best Regards
Heinz

by **Uri Blass** » 25 Sep 2005, 01:48

Hi Heinz,
1) Note that I was not talking about comparing different programs.

I said:
"I do not know about a single case when a program got worse in 4 minutes/40 moves and got better in 40 minutes/40 moves."

It means that the relevant example is not Junior9 and Hiarcs9 but Fritz8 and Fritz9.

2) I think that 1 minute bullet is not important, because a program may spend a short time, say 0.1 seconds, initializing some arrays. That does not matter at 4 minutes for 40 moves but does matter at 1 minute per game.

3) I am not sure that Junior would be a star at 1 hour per move, and I suspect that people may draw wrong conclusions here.

If a program does better in 40 minutes/40 moves relative to 4 minutes/40 moves then it does not mean that it will continue to do better at longer time control.

Uri

by **Uri Blass** » 25 Sep 2005, 02:05

I can add that I looked at the blitz list and the long time control list, and the difference between Hiarcs and Junior is too small to draw a definite conclusion.

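40/40

| Rank | Engine | Elo | + | - | Games | Score | Av.Op. | Draws |
|---|---|---|---|---|---|---|---|---|
| 6 | Junior 9 | 2690 | 12 | 12 | 2271 | 58.2 % | 2632 | 31.9 % |
| 7 | Hiarcs 9 | 2673 | 11 | 11 | 2521 | 55.2 % | 2637 | 35.5 % |

40/4

| Rank | Engine | Elo | + | - | Games | Score | Av.Op. | Draws |
|---|---|---|---|---|---|---|---|---|
| 6 | Hiarcs 9 | 2711 | 24 | 24 | 605 | 65.3 % | 2601 | 28.8 % |
| 7 | Junior 9 | 2687 | 24 | 24 | 657 | 63.2 % | 2594 | 24.4 % |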
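As a sanity check on how to read these rows, the score percentage can be compared with the expectation from the common logistic Elo formula. This is a sketch under two assumptions: that the third number in each row is the Elo rating and the number after the score is the average opponent rating, and that the logistic curve is close enough to whatever ELOstat uses internally. Only approximate agreement should be expected.

```python
def expected_score(elo, avg_opponent_elo):
    """Logistic Elo expectation as a fraction between 0 and 1."""
    return 1.0 / (1.0 + 10 ** ((avg_opponent_elo - elo) / 400.0))


# (time control, engine, Elo, average opponent Elo, listed score in %)
# Values copied from the four rows quoted above.
ROWS = [
    ("40/40", "Junior 9", 2690, 2632, 58.2),
    ("40/40", "Hiarcs 9", 2673, 2637, 55.2),
    ("40/4", "Hiarcs 9", 2711, 2601, 65.3),
    ("40/4", "Junior 9", 2687, 2594, 63.2),
]

for tc, name, elo, opp, listed in ROWS:
    predicted = 100.0 * expected_score(elo, opp)
    print(f"{tc:6} {name:9} listed {listed:.1f} %  logistic {predicted:.1f} %")
```

The listed and predicted percentages agree to within about half a percentage point for all four rows, which supports this reading of the columns.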

Uri

by **Heinz van Kempen** » 25 Sep 2005, 06:41

Hi Uri,

Okay, understood now. So take a look at Fruit 2.1 at 40/40 and 40/4. Its performance at 40/40 is relatively much better, but I admit that we need at least 2000 games for Blitz as well. For Fruit WCCC'05, on the other hand, performance seems to be similar at these two time controls. Fruit-Toga again seems to be specialized for Blitz, but we still need someone in 40/40 to play it against weaker opposition to bring the average opposition ELO down.

As for Junior, in earlier tournaments of mine (still with Nunn positions) it was often the engine with the biggest difference between 4+2 and 30 minutes, which I calculated in an Excel sheet (yes, I know these time schemes are too different). But whenever I look at tournaments with long time controls, for example those from Kurt and Sedat, Junior always performs very well, and it also won the first knockouts run by Graham at medium time control. On the other hand, I never saw it win a Blitz tournament against the best.

When analyzing tactical complications in correspondence chess, Junior was often the engine showing a different move from Shredder, Hiarcs or Fritz after hours, and on going deeper into the position it often turned out that Junior was correct. But maybe I am a bit influenced by the fact that it is one of my favourite engines because of its playing style (the pawn storms against the enemy king are just daring, and sometimes this backfires by exposing its own king when the opponent is able to counterattack by breaking through in the center).

Best Regards
Heinz

by **Robert Allgeuer** » 25 Sep 2005, 09:32

There are a few exceptions, but almost always these differences between longer time controls and Blitz are <30 ELO (see also my past post based on W. Eigenmann's game database), which means you need error bars of less than +/- 15 ELO in both rating lists in order to draw a well-founded conclusion. This is practically never the case, so we have to be cautious.
For example, the conclusion that Toga is a Blitz expert is premature with the data we have so far.
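A rough way to apply this caution is to check whether two ratings are separated by more than their combined error margins. The sketch below treats the listed margins as symmetric intervals and uses interval overlap as the test, which is a simplification rather than a proper significance test. The Junior 9 and Hiarcs 9 figures are the ones quoted earlier in the thread; the function name is hypothetical.

```python
def intervals_overlap(elo_a, margin_a, elo_b, margin_b):
    """True if [elo - margin, elo + margin] of the two entries overlap."""
    return abs(elo_a - elo_b) <= margin_a + margin_b


# 40/40: Junior 9 2690 +/-12 vs Hiarcs 9 2673 +/-11
print(intervals_overlap(2690, 12, 2673, 11))
# 40/4: Hiarcs 9 2711 +/-24 vs Junior 9 2687 +/-24
print(intervals_overlap(2711, 24, 2687, 24))
# A 30 Elo gap with margins of exactly 15 still touches,
# so the margins have to be below 15 to separate the two.
print(intervals_overlap(2730, 15, 2700, 15))
```

In both lists the Junior 9 and Hiarcs 9 intervals overlap, so the swap in their order between 40/40 and 40/4 does not by itself show a time-control effect.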

Robert

by **Uri Blass** » 25 Sep 2005, 11:16

One note about Fruit.

It is possible that the smaller improvement Fruit WCCC shows at 40/40 is the result of diminishing returns.

I think that if you make a program twice as fast without any improvement in the algorithm, then the main benefit is at blitz.

It may be interesting to have 1/3FruitWCCC in the list, where the idea is to give Fruit WCCC 1/3 of the time. (Note that it is possible to play games with such unequal time controls under the Fritz GUI, but I am not sure whether it is possible with the Shredder GUI.)

I guess that the difference in rating between 1/3 Fruit WCCC and Fruit WCCC is going to be higher at 40/4.

Uri

by **Heinz van Kempen** » 25 Sep 2005, 13:14

Hi Uri and Robert,

Uri, you might be correct with your assumption, but we can't test it due to a lack of testers (machines) and other priorities.

Robert, CEGT 40/4 is still "under construction" with low priority. Achieving error bars of +/-15 will take a few months.
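How many games a target error bar requires can be estimated by assuming the margin shrinks roughly with the square root of the number of games. That scaling is an assumption of this sketch (it ignores draw rates and opponent spread), and `games_needed` is a hypothetical helper. The starting point, +/-24 after 605 games, is the Hiarcs 9 blitz figure quoted earlier in the thread.

```python
import math


def games_needed(games_now, margin_now, target_margin):
    """Estimate the total games for target_margin, assuming margin ~ 1/sqrt(games)."""
    return math.ceil(games_now * (margin_now / target_margin) ** 2)


# From +/-24 at 605 games down to +/-15
print(games_needed(605, 24, 15))
# From +/-24 at 605 games down to +/-12
print(games_needed(605, 24, 12))
```

Under this assumption roughly 1550 games per engine would be needed for +/-15. The same scaling predicts about 2400 games for +/-12, which is in line with the 2271 and 2521 games behind the +/-12 and +/-11 margins in the 40/40 list.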

Currently Fritz 9 and Ktulu 7.0a are in progress. Chess Tiger, Ruffian, Slow Chess Blitz WV2 etc. will follow. Toga is currently ahead of Shredder again, as in your YABRL.

http://www.husvankempen.de/nunn/eloblitz.html

Best Regards
Heinz

by **Robert Allgeuer** » 25 Sep 2005, 15:04

Heinz,
Is there a games database for CEGT available for download (the one you use for calculating the ELOstat ratings)? As a matter of interest, I would like to run bayeselo over it and see how the ratings may or may not move.

Thanks
Robert

by **Heinz van Kempen** » 25 Sep 2005, 15:38

Hi Robert,

For 40/40 you would have to download the following files from the download page, plus a new overall download, September C, which will be available by the end of the month with the new games currently being played.

CEGT 40/40 September A 2600 games (869 KB)
CEGT 40/40 September B 2619 games (881 KB)
CEGT more than 1400 Fruit WCCC'05 games with comments(2298 KB)
Download 2500 CEGT games August (855 KB)
Download 1-2000 CEGT games July A (658 KB)
CEGT 1992 games June/July (688 KB)
CEGT freestyle Download games 1-2560 (857 KB)
CEGT 4 Download games 1-4109 (WinRAR) (1057 KB)
CEGT 3 Download games 1-3094 (WinRAR) (817 KB)
CEGT 2 Download games 1-5202 (1669 KB)
CEGT 1 Download games 1-4080 (1957 KB)
ATL support tournament Heinz 2 games 1-130 (149 KB)
ATL support tournament Heinz games 1-380 (WinRAR) (366 KB)
ATL support tournament Charles games 1-380 (WinRAR) (301 KB)

Then, regrettably, you would have to edit a lot of engine names, as the testers tend to write them differently, and maybe remove some duplicate games.
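The name clean-up and duplicate removal could be scripted along these lines. This is a minimal sketch using only the Python standard library (Python 3.7 or later for the zero-width split): the alias table is hypothetical, games are assumed to start with an `[Event` tag, lines are assumed to end with plain newlines, and two games count as duplicates when players and movetext are identical.

```python
import re

# Hypothetical alias table: spelling used by a tester -> unified name.
ALIASES = {
    "Fruit 2.1 UCI": "Fruit 2.1",
    "fruit_21": "Fruit 2.1",
}

TAG_RE = re.compile(r'^\[(White|Black) "(.*)"\]$', re.MULTILINE)


def split_games(pgn_text):
    """Split PGN text into games, assuming each game starts with an [Event tag."""
    parts = re.split(r"(?m)^(?=\[Event )", pgn_text)
    return [p.strip() for p in parts if p.strip()]


def unify_names(game):
    """Rewrite the White and Black tags using the alias table."""
    def repl(match):
        name = ALIASES.get(match.group(2), match.group(2))
        return '[%s "%s"]' % (match.group(1), name)
    return TAG_RE.sub(repl, game)


def drop_duplicates(games):
    """Keep the first copy of each game with identical players and movetext."""
    seen = set()
    unique = []
    for game in games:
        players = tuple(TAG_RE.findall(game))
        # Movetext: every non-empty line that is not a tag line.
        lines = [ln for ln in game.splitlines() if ln and not ln.startswith("[")]
        key = (players, " ".join(" ".join(lines).split()))
        if key not in seen:
            seen.add(key)
            unique.append(game)
    return unique
```

Names must be unified before duplicates are removed, since the same game submitted under two spellings only matches once the tags agree.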

For Blitz there will also be a complete file by the end of the month including new games with commercials.

Best Regards
Heinz

by **Robert Allgeuer** » 25 Sep 2005, 16:23

Thanks,
Which files do you use to calculate the ratings? The engines must have uniform names in them.
Is there any chance that you could make available a cegt.zip containing one huge single PGN file of all games?

Robert

by **Heinz van Kempen** » 25 Sep 2005, 16:36

Hi Robert,

Of course, in my database for rating calculation all engines have unified names. I will strip the comments from these 37 000 games and send them to you. I will probably have to split them into smaller packages, because the file size for sending here is restricted. So give me two days, as I have some other work to do.
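Stripping comments and splitting the games into packages is straightforward to script. The sketch below removes only brace comments (`{...}`), which is an assumption about how the engine output is stored, and the package size of 10 000 games is a hypothetical value chosen for illustration.

```python
import re


def strip_comments(pgn_text):
    """Remove {...} comments from PGN text and collapse the leftover spaces."""
    without = re.sub(r"\{[^}]*\}", "", pgn_text)
    return re.sub(r"[ \t]{2,}", " ", without).strip()


def split_packages(games, per_package):
    """Split a list of games into consecutive packages of at most per_package games."""
    return [games[i:i + per_package] for i in range(0, len(games), per_package)]


print(strip_comments("1. e4 {book} e5 {+0.10/12 5s} 2. Nf3"))
print([len(p) for p in split_packages(list(range(37000)), 10000)])
```

With 37 000 games and packages of 10 000, this yields three full packages and one of 7 000.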

Adding yet another huge file and having everything two or three times on my website is not such a good idea, I think.

Best Regards
Heinz

by **Robert Allgeuer** » 25 Sep 2005, 21:14

Thanks

Robert

CEGT Ratinglists
