AEGT concept--proposals for discussion

Archive of the old Parsimony forum. Some messages couldn't be restored.

Post by Heinz van Kempen » 04 Aug 2004, 04:50

Posted by: Heinz van Kempen at 04 August 2004 05:50:49:

Hi all :-),
As things are running fine so far, and it seems that not many testers are interrupting their double round robins and retreating at this stage, I will explain some ideas here for how this could proceed. Of course I will give a new voting sheet to the remaining testers so we can discuss some of the points below.
First step (in progress):
________________________
For each group at least 6 double round robins will be played, giving at first a minimum of 6 x 22 games = 132 games per engine.
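The game count can be checked with a short sketch. It assumes twelve engines per class, a figure taken from the twelve-engine classes mentioned later in this thread; the function name is only illustrative.

```python
def games_per_engine(engines_in_class: int, double_round_robins: int) -> int:
    """Games one engine plays: two games (one per colour) against every
    other engine in the class, repeated for each double round robin."""
    per_double_round_robin = 2 * (engines_in_class - 1)
    return per_double_round_robin * double_round_robins


# Assumption: 12 engines per class and 6 double round robins.
print(games_per_engine(12, 6))  # 6 x 22 = 132
```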
Second step:
____________
(first option, according to what testers want)
_____________
After this, or starting at the same time if some testers have already finished, we need the gauntlets to provide connections between the classes for the rating calculation.
To give an example: Pro Deo plays a single gauntlet of 6 games against all participants from King Class and all from Queen Class, giving 132 games to Pro Deo and 6 more games to each of the others. The same will be done with SOS 4, Little Goliath Revival, Ktulu 4.2 (?), Patriot 0.172l (?) and Gandalf 4.32f (?).
To explain the question marks: for these engines there are commercial versions on the market. On the other hand, the last free releases are still available for download (except Gandalf, which was removed) and we do not have to pay for them. My proposal would be to use those "hybrid" engines only for the gauntlets and not for the tournaments. For the same reason SmarThink will probably not be able to play tournaments in a possible future AEGT 2 and would then only play the gauntlets.
Here are the engines that should, for example (according to strength), play the gauntlets between Queen and Rook Class: Anaconda, LambChop, Abrok, Betsy, Nejmet. I would exclude Ikarus because it was never available for download and came in a commercial bundle (Young Talents). (Authors of private engines can also apply for the gauntlets to qualify for a future AEGT.)
And finally those between Rook Class and Bishop Class: Patzer, Leila, Averno, Chispa, Queen, Tinker, Fafis (only if version 1.0.9 becomes publicly available).
There are more in this gap, like Sjeng (?), IceSpell (? here the question is whether more than one engine per author is allowed; on the other hand IceSpell is by Volker alone and Spike by Ralf and Volker; the same question will come up when Tord releases his second engine), Beowulf, Chiron, Gnu, KnightX, Scidlet, Nullmover, Eeyore, Dorky, The Butcher, Tytan, Sunsetter, Greko, NagaSkaki, AICE, WJChess, Delphil and many more. Due to the high number of engines, I propose for the weaker levels to leave out those that have not been updated for a long while (Doctor, Phalanx, Chezzz, Resp, Exchess, Lord King, Madeleine, InMiChess and more).
Second option:
_____________
If not many testers "resign" after step one and/or we can find new testers, then in addition to the gauntlets we could run a Knight and/or Pawn Class with the engines above and/or others.
Possible AEGT 2 and changes:
After running the gauntlets we will be able to calculate various rating lists: one overall and several depending on GUI and openings.
My proposal would be that we do not hold another vote for favourite engines, but simply put the twelve strongest according to rating (except the hybrids) in King Class (updates allowed), the next twelve in Queen Class, and so on. For a possible AEGT 3 there could then be promotion and relegation (for example 2, 3 or 4 engines from each class).
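The rating-based seeding proposed here could be sketched roughly as follows. The engine names and ratings in the usage lines are invented placeholders, not real results, and the function and variable names are hypothetical; only the class size of twelve, the class names and the exclusion of hybrids come from the proposal.

```python
from typing import Dict, List, Set

CLASS_NAMES = ["King", "Queen", "Rook", "Bishop", "Knight", "Pawn"]


def assign_classes(ratings: Dict[str, float],
                   hybrids: Set[str],
                   class_size: int = 12) -> Dict[str, List[str]]:
    """Sort non-hybrid engines by rating and cut the list into classes."""
    eligible = sorted(
        (name for name in ratings if name not in hybrids),
        key=lambda name: ratings[name],
        reverse=True,
    )
    classes: Dict[str, List[str]] = {}
    for index, class_name in enumerate(CLASS_NAMES):
        members = eligible[index * class_size:(index + 1) * class_size]
        if members:
            classes[class_name] = members
    # Engines ranked below the last class are simply not seeded here.
    return classes


# Placeholder data only: 30 made-up engines with made-up ratings.
ratings = {f"Engine{i:02d}": 2600.0 - 10 * i for i in range(30)}
hybrids = {"Engine03"}  # hypothetical gauntlet-only engine
classes = assign_classes(ratings, hybrids)
print({name: len(members) for name, members in classes.items()})
```

Promotion and relegation for a later event would then only need to swap the bottom and top few entries of neighbouring lists.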
Okay, I can also tell how many testers are needed to run a second AEGT with a certain number of classes:
1-3 testers (computers) --- only King Class possible
4-6 --- additionally Queen Class
7-9 --- plus Rook Class
10-12 --- adding Bishop Class
13-15 --- adding Knight Class
16-18 --- adding Pawn Class
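The list above amounts to one additional class for every three testers. A minimal sketch of that rule, with an illustrative function name and the class order taken from the list:

```python
import math

CLASS_NAMES = ["King", "Queen", "Rook", "Bishop", "Knight", "Pawn"]


def classes_for_testers(testers: int, testers_per_class: int = 3) -> list:
    """Return the classes that can be run with the given number of testers."""
    if testers < 1:
        return []
    count = min(len(CLASS_NAMES), math.ceil(testers / testers_per_class))
    return CLASS_NAMES[:count]


print(classes_for_testers(8))  # ['King', 'Queen', 'Rook']
```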
What should we do if a new strong engine appears, as Pro Deo has now? Although no gauntlets would be needed for AEGT 2 (because a lot of versions will remain unchanged), I would run a gauntlet for the new one. Depending on its rating, it will cause the demotion of one more engine from the class it qualifies for, which carries over to the classes below.
Okay, these are only proposals and we can discuss them here with everyone and/or in our private email group. For future tournaments I am theoretically also inclined to use Arena instead of the Fritz GUI, but my friend Ralf would drop out then, and maybe others would not join our group if we are that restrictive. One thing a lot of authors have written to me: "I want my engine to perform well under any GUI and be good overall." So it seems to be less of a problem for the programmers than for other testers, who are very convinced of using the one and only "best" GUI and will not tolerate others. I hope we can all try not to think that everything we do ourselves, from time controls to settings, rules and GUIs, necessarily has to be the best. For myself, I like all the tournaments here and do not want to miss a single one. And the main thing for this forum surely is that Winboard engines are concerned, even when they run in the Chess Partner GUI.
Best Regards
Heinz
P.S.: I am only at home from time to time today and will not be able to react to questions at once, which should not discourage others from starting a discussion, if it is not too hot today for such boring stuff :-).
Heinz van Kempen
 

Re: AEGT concept--proposals for discussion

Post by Dan Honeycutt » 04 Aug 2004, 05:26

Posted by: Dan Honeycutt at 04 August 2004 06:26:09:
In reply to: AEGT concept--proposals for discussion, posted by: Heinz van Kempen at 04 August 2004 05:50:49:

Hi Heinz:
The tournament is great. So much happening, results coming from every direction, it's sometimes a little hard to tell who stands where but that just makes it more interesting. I see close battles in every class except perhaps rook - pepito (my favorite engine) looks awfully strong there.
I'm not a tester so I don't get a vote but I'd like to see promotions/placement handled by play rather than rating calculation - top x from one class against bottom x from class above or something of that sort.
Hats off to you for organizing this tournament and to all testers.
Dan H.
Dan Honeycutt
 

Re: AEGT concept--proposals for discussion

Post by Uri Blass » 04 Aug 2004, 06:13

Posted by: Uri Blass at 04 August 2004 07:13:44:
In reply to: AEGT concept--proposals for discussion, posted by: Heinz van Kempen at 04 August 2004 05:50:49:
And finally those between Rook Class and Bishop Class: Patzer, Leila, Averno, Chispa, Queen, Tinker, Fafis (only if version 1.0.9 will be publically available).
Fafis should be at least Queen Class.
Fafis scored better than Quark and Jonny in the tournament at the following link (40 moves/10 minutes with ponder on), and Quark is in King Class, so it may even be better to decide between Queen and King Class. There should certainly be no doubt that it does not belong in Bishop Class, so testing it against Bishop Class engines is a waste of time.
http://f27.parsimony.net/forum67213/messages/2205.htm
I can also add that I do not understand how you decide on the choices for classes.
Ufim is simply not strong enough for the level of Queen Class, and DanChess is at a higher level than Bishop Class.
You can see this from the results of other tournaments that are not fast blitz, where DanChess scored better than my weak Movei, and certainly better than Ufim, in another tournament of Frank quinsky.
http://f27.parsimony.net/forum67213/messages/2107.htm
I also remember that DanChess scored better than Movei in a tournament that Dann Corbit posted in CCC, and I do not know of a single tournament where Movei did better, so I guess that DanChess is stronger than the Movei that you have unless I see some evidence that this is not the case.
Uri
Uri Blass
 

Re: AEGT concept--proposals for discussion

Post by Dann Corbit » 04 Aug 2004, 06:42

Posted by: Dann Corbit at 04 August 2004 07:42:45:
In reply to: Re: AEGT concept--proposals for discussion, posted by: Uri Blass at 04 August 2004 07:13:44:

[snip]
DanChess is in higher level than bishop class.
You could see it based on results of other tournament that are not fast blitz when Danchess scored better than my weak Movei and certainly better than Ufim in another tournament of Frank quinsky.
http://f27.parsimony.net/forum67213/messages/2107.htm
I also remember that DanChess scored better than Movei in a tournament that Dan Corbit posted in CCC and I do not know about a single tournament when Movei did better so I guess that DanChess is stronger than movei that you have unless I see some evidence that it is not the case.
I think that a single tournament against a limited number of opponents is not enough data to make that judgement. DanChess may or may not be stronger than Movei.



my ftp site {remove http:// unless you like error messages}
Dann Corbit
 

Re: AEGT concept--proposals for discussion

Post by Volker Boehm » 04 Aug 2004, 06:59

Posted by: Volker Boehm at 04 August 2004 07:59:26:
In reply to: AEGT concept--proposals for discussion, posted by: Heinz van Kempen at 04 August 2004 05:50:49:

Hi,
Some words on IceSpell:
IceSpell is based on a large template framework built to quickly develop two-person zero-sum games with their own GUI. The template has so far been used for reversi, connect-four, mule, connect-five and chess. Based on this framework it took 10 hours of development to get a first chess game with GUI that plays (most) games correctly and could not be beaten by myself.
Before implementing chess I planned to implement a two-person game based on dice.
(I wanted to find out how negamax can be adapted for games that involve chance. The idea was to compute all moves that can be made for every possible dice number and multiply the result by the probability of that number (thus 1/6 for one die): PositionValue = Sum[Position(DiceValue)*1/6].
Another idea is to implement a game with more than two players (for example 4) and to have 4 position values P1, P2, P3, P4. The value currently to be maximized is then a function f(Player, P1, P2, P3, P4), with Player = 1..4.
)
The IceSpell framework is much too general to be fast enough for chess. I found myself spending lots of time changing the framework to suit the speed needs of chess, ending up with a no-longer-well-designed framework that will be hard to readapt to the other games.
Thus I planned to write an engine from scratch, not using any framework, and I thought that it would be much more fun to do it together. Thus Ralf and I planned the new engine.
As for IceSpell: I don't like the engine much because it has a framework that is "misused" for speed reasons and is hard to improve.
Thus for me Spike is a kind of successor to IceSpell, even if I am only one of two developers of Spike. I will not put any more work into IceSpell. Feel free to use IceSpell in your tournament. If you want, I will send you IceSpell. But I am not interested in its game results (except to compare against Spike, which currently gets about 65% of the points vs. IceSpell in blitz games). So I suggest that you use engines whose results are interesting for their authors, and not IceSpell.
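For readers who think in rating points, the 65% score can be converted with the standard logistic Elo expectation formula. This is only a rough conversion: it assumes the logistic model applies and ignores the sample size, which the post does not state.

```python
import math


def elo_difference(score_fraction: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


print(round(elo_difference(0.65)))  # roughly 108 points
```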
Greetings Volker
Volker Boehm
 

Re: AEGT concept--proposals for discussion

Post by Heinz van Kempen » 04 Aug 2004, 08:43

Posted by: Heinz van Kempen at 04 August 2004 09:43:32:
In reply to: Re: AEGT concept--proposals for discussion, posted by: Volker Boehm at 04 August 2004 07:59:26:
Hi Volker,
very interesting stuff. I already had the impression that you are now fully concentrating on Spike with Ralf, and I think that this is best. The progress you have made over the last weeks and months is really astonishing.
Best Regards
Heinz
Heinz van Kempen
 

Re: AEGT concept--proposals for discussion

Post by Heinz van Kempen » 04 Aug 2004, 09:12

Posted by: Heinz van Kempen at 04 August 2004 10:12:58:
In reply to: Re: AEGT concept--proposals for discussion, posted by: Uri Blass at 04 August 2004 07:13:44:

Hello Uri,
Fafis up to version 1.0.8 was a relatively weak engine, far below the level of Bishop Class. The result you cite is only one tournament with not so many games. But okay, if it wins all its games against strong Bishop Class engines like DanChess or Spike, it will get a sky-high rating anyway and qualify even for King Class, who knows.
Our voting and criteria I explained in an earlier post:
http://f11.parsimony.net/forum16635/messages/68896.htm
The main point is that we did not vote mainly on strength, but also for other reasons. One criterion was fast improvement this year, and so Ufim came in (maybe it will still improve; it started with 2.5 out of 4 in Ralf's double round robin). In the case of Ufim it would maybe have been better to have it in Rook Class. One day before the votes were cast there was the UEL invitational tournament (I like them, although pairing Aristarch against Firefly and weaker engines might easily lead to exaggerated ratings at the start), and Ufim was at first the second or third best engine in the UCI rating list. As usually happens, it has now dropped, but Patrick's ratings are also trustworthy once he has played a lot of these fine groups with about equally strong engines. Based on my rating list I would have given more points to ElChinito, a very fine choice for King Class, but I was also not sure how much Ufim 5.01 had improved again over 5.00 and whether it is better or weaker with long time controls.
Another point was performance in our own tournaments (not in others), and the results you give here were not known at that point, although a big improvement for DanChess might have been foreseen. On the other hand, DanChess has opponents in Bishop Class that have also improved and give a lot of competition, like Spike, Snitch, the incredibly improved Bruja, Cerebro etc. In my tournaments I had only DanChess 1.0.4d, which was considerably weaker. I also guess, based on my ratings, that Pepito is a bit too strong for Rook Class.
Anyway, after the first AEGT and the rating calculation every engine should be where it belongs, if the other testers like my proposal to use strength, based on the calculated rating list, as the only criterion for AEGT 2.
Best Regards
Heinz
Heinz van Kempen
 

Re: AEGT concept--proposals for discussion

Post by Heinz van Kempen » 04 Aug 2004, 09:31

Posted by: Heinz van Kempen at 04 August 2004 10:31:12:
In reply to: Re: AEGT concept--proposals for discussion, posted by: Dan Honeycutt at 04 August 2004 06:26:09:
Hello Dan,
thanks :-). At least by including Bruja I think we made a good guess (see the postings from Olivier and me in Spanish). It is another engine that made a big leap from one version to another.
I try to coordinate results, and to have an equal number of games for better comparison I proposed the double round robins. But some testers are on holiday and therefore let their machines run gauntlets, or include only six engines at the start. For me this is okay; with a bit of imagination you will guess that, for example, 11.5 out of 20 might be much better than 13 points out of 28. And everything will be easy to compare again as soon as Igor has finished his double round robin with all engines in King Class and we have all the results from Brian. As for Igor, I have to say that he wanted to give results only after more games were completed, so it was my decision to distort the overall cross table again. It is no problem anyway to remove Igor's results until he has completed all games.
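The comparison of unequal game counts comes down to score percentages. A tiny sketch using the two example scores from the paragraph above (the helper name is illustrative):

```python
def score_percentage(points: float, games: int) -> float:
    """Share of the available points that were actually scored."""
    return 100.0 * points / games


print(f"{score_percentage(11.5, 20):.1f}%")  # 57.5%
print(f"{score_percentage(13, 28):.1f}%")    # 46.4%
```

Percentages alone still ignore opponent strength, which is why the gauntlets and the rating calculation are needed for a fair comparison.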
Your suggestion to handle promotion/placement by play can be followed for AEGT 3, based on AEGT 2 results (if we have those in a few months). For AEGT 2 it is not possible, because we have to decide fairly where the gauntlet engines go. SOS 4 of course also clearly belongs at the top of King Class; it is not in there because of the limited number of participants (12), also given by votes, and because a lot of us preferred playing style to strength (AnMon and Gothmog, although the latter is really gorgeous now in terms of strength). Quark is in King Class because some testers estimated that it is an engine that gets much better with more time. We based this on earlier tournaments from Leo.
Well, with a lot more spells Bruja could even win the Bishop tournament.
Best Regards
Heinz
Heinz van Kempen
 

