Jonny 2.64 gauntlet after 500 games

Archive of the old Parsimony forum. Some messages couldn't be restored. Limitations: Search for authors does not work, Parsimony specific formats do not work, threaded view does not work properly. Posting is disabled.

Jonny 2.64 gauntlet after 500 games

Postby Heinz van Kempen » 17 Jul 2004, 18:35

Posted by: Heinz van Kempen at 17 July 2004 19:35:22

Hi :-),


For those interested in this newcomer, here are the results for the first ten Nunn positions, i.e. 500 of 1000 games (1000 games will be the minimum in my future blitz tests of all strong amateurs, in order to get more reliable statistics):
Conditions:
Athlon 3000+ exclusively
64 MB Hash, 96 MB for Gothmog and Crafty
5 men EGTB
first 10 Nunn2 positions so far
Time control: 4 min. + 2 sec.

Jonny 2.64   - Thinker 4.6c                 7.0 - 13.0
Jonny 2.64   - GLC 3.0.3.4                  10.5 - 9.5
Jonny 2.64   - Movei 00_8_247 s             7.0 - 13.0
Jonny 2.64   - The Baron 1.3.1b6            9.5 - 10.5
Jonny 2.64   - Pepito v1.59                 8.5 - 11.5
Jonny 2.64   - Amyan 1.593b                 9.5 - 10.5
Jonny 2.64   - Anaconda 1.6.2               8.5 - 11.5
Jonny 2.64   - Ikarus V0.18                 9.0 - 11.0
Jonny 2.64   - Gothmog 1.0 beta 7           9.0 - 11.0
Jonny 2.64   - Delfi 4.5                    8.5 - 11.5
Jonny 2.64   - Yace 0.99.87                 7.5 - 12.5
Jonny 2.64   - Fruit 1.5                    9.5 - 10.5
Jonny 2.64   - Aristarch 4.50               6.0 - 14.0
Jonny 2.64   - Ruffian 1.0.5                2.0 - 18.0
Jonny 2.64   - SmarThink 0.18a r165         6.0 - 14.0
Jonny 2.64   - WildCat 4.0                  5.5 - 14.5
Jonny 2.64   - Deep Sjeng 1.6               7.0 - 13.0
Jonny 2.64   - Patriot 1.2.3                5.5 - 14.5
Jonny 2.64   - Ktulu 5.1                    5.5 - 14.5
Jonny 2.64   - Gandalf 4.32h                5.5 - 14.5
Jonny 2.64   - Crafty 19.14                 5.5 - 14.5
Jonny 2.64   - Little Goliath 2000 v3.9     9.5 - 10.5
Jonny 2.64   - SOS 4 for Arena              4.5 - 15.5
Jonny 2.64   - Pharaon 2.62                 9.5 - 10.5
Jonny 2.64   - LambChop 10.99               14.0 - 6.0
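A quick way to cross-check the table is to total Jonny's points and convert the overall score into an implied rating difference with the standard logistic Elo formula, D = -400 * log10(1/p - 1). The Python sketch below does that. Treating all 25 matches as one match against an "average" opponent is a simplifying assumption of the sketch, not how any particular rating tool builds a list.

```python
import math

# (opponent, Jonny's points, opponent's points), copied from the table above.
RESULTS = [
    ("Thinker 4.6c", 7.0, 13.0),
    ("GLC 3.0.3.4", 10.5, 9.5),
    ("Movei 00_8_247 s", 7.0, 13.0),
    ("The Baron 1.3.1b6", 9.5, 10.5),
    ("Pepito v1.59", 8.5, 11.5),
    ("Amyan 1.593b", 9.5, 10.5),
    ("Anaconda 1.6.2", 8.5, 11.5),
    ("Ikarus V0.18", 9.0, 11.0),
    ("Gothmog 1.0 beta 7", 9.0, 11.0),
    ("Delfi 4.5", 8.5, 11.5),
    ("Yace 0.99.87", 7.5, 12.5),
    ("Fruit 1.5", 9.5, 10.5),
    ("Aristarch 4.50", 6.0, 14.0),
    ("Ruffian 1.0.5", 2.0, 18.0),
    ("SmarThink 0.18a r165", 6.0, 14.0),
    ("WildCat 4.0", 5.5, 14.5),
    ("Deep Sjeng 1.6", 7.0, 13.0),
    ("Patriot 1.2.3", 5.5, 14.5),
    ("Ktulu 5.1", 5.5, 14.5),
    ("Gandalf 4.32h", 5.5, 14.5),
    ("Crafty 19.14", 5.5, 14.5),
    ("Little Goliath 2000 v3.9", 9.5, 10.5),
    ("SOS 4 for Arena", 4.5, 15.5),
    ("Pharaon 2.62", 9.5, 10.5),
    ("LambChop 10.99", 14.0, 6.0),
]

# Sanity check: 10 positions x 2 colours = 20 games per match.
assert all(j + o == 20.0 for _, j, o in RESULTS)


def elo_difference(p):
    """Rating difference implied by score fraction p (logistic Elo model)."""
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


# Simplification: the whole gauntlet is treated as one match
# against an "average" opponent.
points = sum(jonny for _, jonny, _ in RESULTS)
games = sum(jonny + opp for _, jonny, opp in RESULTS)
p = points / games

print(f"{len(RESULTS)} matches: {points:.1f} / {games:.0f} = {p:.1%}")
print(f"Implied difference vs. average opponent: {elo_difference(p):+.0f} Elo")
```

With the table above this comes to 190 points from 500 games (38%), which the logistic formula maps to roughly -85 Elo against the average opponent.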


These results may look worse than they are, because I picked almost all of the strongest opponents. Note that Delfi, WildCat, Gothmog and some others have improved by 30-70 rating points over their previous versions. I will try to calculate the new list before midnight. Position number 5 (Pirc) was poison for Jonny, with both the white and the black pieces. Jonny's result will probably improve after 1000 games.
The whole gauntlet will be available for download once all 1000 games are completed.
Fruit 1.5 played here with its original settings, as I need more games with those settings against stronger opponents for a better comparison with the Fruit 1.5 t(ralala) or JR settings.
Crafty 19.14 is a Bryan Hofmann compile, which has proven to be the fastest one on my Athlons.
Regarding the few commercial versions, please note that they are not allowed to play in the amateur grand test. It is, however, possible to vote for their last free releases, Ktulu 4.2 and Patriot 0.172 light. No decision has been made yet concerning that tournament. Everything will be decided tomorrow by common vote. One option I will put to the vote is that the author (or, in the case of Crafty, a tester who knows this version well) may decide the settings, ranging from hash and pawn hash to learning.
http://www.husvankempen.de/nunn/
Best Regards
Heinz
Heinz van Kempen
 

Re: Jonny 2.64 gauntlet after 500 games

Postby Dieter Eberle » 18 Jul 2004, 12:18

Posted by: Dieter Eberle at 18 July 2004 13:18
In reply to: Jonny 2.64 gauntlet after 500 games, posted by Heinz van Kempen at 17 July 2004 19:35:22
Hello Heinz,
I think your time control is too fast for Jonny. With slower time control specifications Jonny would achieve much better results.
Regards Dieter
Dieter Eberle
 

Re: Jonny 2.64 gauntlet after 500 games

Postby Heinz van Kempen » 19 Jul 2004, 08:57

Posted by: Heinz van Kempen at 19 July 2004 09:57:59
In reply to: Re: Jonny 2.64 gauntlet after 500 games, posted by Dieter Eberle at 18 July 2004 13:18
> I think your time control is too fast for Jonny. With slower time control specifications Jonny would achieve much better results.
Hello Dieter,
You are correct. In all the games with much more time that I played with Jonny in Active Chess Leagues and knockout tournaments, Jonny performed much better. There are some other engines that I am convinced are better with more time, for example Comet, Quark, Green Light Chess, Deep Sjeng, The King and Junior. Maybe we can prove that for some of the many Winboard engines participating in our new common tournament at time control 40/40, all simulating a 2000 MHz computer.
Best Regards
Heinz
Heinz van Kempen
 

Re: Jonny 2.64 gauntlet after 500 games

Postby Uri Blass » 19 Jul 2004, 11:32

Posted by: Uri Blass at 19 July 2004 12:32:51
In reply to: Re: Jonny 2.64 gauntlet after 500 games, posted by Heinz van Kempen at 19 July 2004 09:57:59
> Maybe we can prove that for some of the many Winboard engines participating in our new common tournament at time control 40/40, all simulating a 2000 MHz computer.
1) The problem is that 40/40 is a different kind of time control from Fischer time control.
If we want a fair comparison, then we should compare 3 minutes/40 moves with 40 minutes/40 moves.
If possible, all the engines that are tested at 40/40 should also be tested at the following time controls:
a) 3 minutes/40 moves
b) 4+2 (the time control you already use)
c) 6 minutes per game
Note that games at these time controls take less than 30% of the time of games at 40/40, so I hope that testers who test at 40/40 will also agree to test at these time controls.
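The "less than 30%" figure can be checked with a rough estimate. The Python sketch below is only a back-of-the-envelope model under two assumptions that are not from the post: a game lasts 60 moves per side, and an engine spreads its time evenly, so a repeating control such as 40 moves in 40 minutes costs base/moves per move, while a Fischer or sudden-death control costs the base time plus the increment for every move played. The `TimeControl` class is a hypothetical helper, not part of any tournament software.

```python
from dataclasses import dataclass


@dataclass
class TimeControl:
    name: str
    base_min: float            # base time in minutes
    moves_per_period: int = 0  # 0 means sudden death (optionally with increment)
    increment_s: float = 0.0   # Fischer bonus per move, in seconds

    def minutes_per_side(self, moves: int) -> float:
        """Rough thinking time per side, assuming time is spread evenly."""
        if self.moves_per_period:
            # Repeating control: base_min for every moves_per_period moves.
            return moves * self.base_min / self.moves_per_period
        # Sudden death plus the Fischer increment for every move played.
        return self.base_min + moves * self.increment_s / 60.0


MOVES = 60  # assumed game length per side (hypothetical, not from the thread)

long_tc = TimeControl("40/40", 40, moves_per_period=40)
short_tcs = [
    TimeControl("3 min/40 moves", 3, moves_per_period=40),
    TimeControl("4+2", 4, increment_s=2),
    TimeControl("6 min/game", 6),
]

long_time = long_tc.minutes_per_side(MOVES)
for tc in short_tcs:
    t = tc.minutes_per_side(MOVES)
    print(f"{tc.name:15s} {t:5.1f} min/side ({t / long_time:.1%} of 40/40)")

total = sum(tc.minutes_per_side(MOVES) for tc in short_tcs)
print(f"All three short controls: {total / long_time:.1%} of one 40/40 game")
```

Under these assumptions the three short controls need 4.5, 6 and 6 minutes per side, 16.5 minutes in total against 60 for 40/40, which is 27.5%. Longer games, or engines that budget their 40/40 time differently, would shift the ratio.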
2) It seems that some testers are more interested in tournaments than in having fewer errors in the test results, and their goal in the unified effort is not to reduce statistical error.
I think that we can get a better answer to the question of which engine has the potential to be better if you add a good book and good learning than to the question of which engine is better, and I prefer tests for which we can get more exact answers.
I think that Nunn-type tournaments without learning from previous games are the best for the first question (I have no objection to more positions if somebody suspects that some engine is specially tuned for the Nunn positions).
I do not claim that the test represents the full ability of the engines, and it is possible to make that clear.
If people want to test the full ability of the engines and translate it into a rating, there is a problem: engine A may be better than engine B after 500 games and worse than B after 1000 games, not because of statistical error but because B learns better, and it is not clear when we should stop (we always play a finite number of games).
I am in favour of tournaments like Leo's tournament, the RWBC or the infinite loop, but they all have different conditions, and I cannot consider them part of an effort to produce ratings with relatively small statistical errors. If you want ratings with small statistical errors, the only way is not to test the full capabilities of the engines.
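The statistical-error argument can be made concrete. The Python sketch below assumes independent games and uses the conservative bound p(1 - p) for the per-game variance (draws only lower it), then maps a 95% interval on the score onto Elo with the logistic formula. Plugging in Jonny's 38% from the gauntlet table is only an illustration; the helper names are hypothetical and no error bars are stated in the thread.

```python
import math


def elo_difference(p: float) -> float:
    """Logistic Elo model: rating difference implied by score fraction p."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_interval(p: float, n_games: int, z: float = 1.96):
    """Approximate 95% Elo interval for a score fraction p over n_games.

    Uses sqrt(p * (1 - p) / n) as the standard error of the mean score,
    which is an upper bound because draws reduce the per-game variance.
    """
    se = math.sqrt(p * (1.0 - p) / n_games)
    # Clamp so the logarithm stays defined near 0% or 100%.
    lo = max(p - z * se, 1e-9)
    hi = min(p + z * se, 1.0 - 1e-9)
    return elo_difference(lo), elo_difference(hi)


for n in (500, 1000):
    lo, hi = elo_interval(0.38, n)
    print(f"{n:5d} games: between {lo:+.0f} and {hi:+.0f} Elo"
          f" (width {hi - lo:.0f})")
```

Doubling the number of games narrows the interval by roughly a factor of 1/sqrt(2), which is the practical argument for 1000 rather than 500 games per gauntlet.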
Uri
Uri Blass
 

Re: Jonny 2.64 gauntlet after 500 games

Postby Heinz van Kempen » 19 Jul 2004, 12:19

Posted by: Heinz van Kempen at 19 July 2004 13:19:58
In reply to: Re: Jonny 2.64 gauntlet after 500 games, posted by Uri Blass at 19 July 2004 12:32:51

Hello Uri,
> 1) The problem is that 40/40 is a different kind of time control from Fischer time control.
> If we want a fair comparison, then we should compare 3 minutes/40 moves with 40 minutes/40 moves.
> If possible, all the engines that are tested at 40/40 should also be tested at the following time controls:
> a) 3 minutes/40 moves
> b) 4+2 (the time control you already use)
> c) 6 minutes per game
> Note that games at these time controls take less than 30% of the time of games at 40/40, so I hope that testers who test at 40/40 will also agree to test at these time controls.
>
> If people want to test the full ability of the engines and translate it into a rating, there is a problem: engine A may be better than engine B after 500 games and worse than B after 1000 games, not because of statistical error but because B learns better, and it is not clear when we should stop (we always play a finite number of games).
> I am in favour of tournaments like Leo's tournament, the RWBC or the infinite loop, but they all have different conditions, and I cannot consider them part of an effort to produce ratings with relatively small statistical errors. If you want ratings with small statistical errors, the only way is not to test the full capabilities of the engines.
There are other testers, for example Olivier, who are accustomed to other time controls without a Fischer bonus. This is a common project, and what I usually do does not count. We decide everything here by majority vote.


From our exchanges and the private mails that go to all testers, I have the impression that all testers are interested in both and take this project seriously, but also want to have a bit of fun, which is okay.

> I do not claim that the test represents the full ability of the engines, and it is possible to make that clear.
As I always say, all tournaments have advantages and disadvantages. Some are better for rating calculation, others for quality of play because of more time, and others for suspense and fun. In my opinion it is not possible to run the perfect tournament that fulfils all the aims testers and engine authors might have.
Nunn tournaments have the advantage of testing a variety of common openings independently of how good or bad the engines' own books are, and they are quite good for statistics. However, they are also limited, for example in the number of positions and by the possibility that programmers might tune against those positions. And combining Nunn positions with learning, which more and more engines will have in the future, is questionable.
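The scheduling behind such a Nunn gauntlet is simple: every opponent plays every starting position once with each colour, so both sides get the same openings regardless of their own books. The Python sketch below uses hypothetical placeholder names (the position labels stand in for the actual Nunn starting positions) to show how 25 opponents and the first 10 positions give the 500 games reported at the start of the thread, and 20 positions give 1000.

```python
from itertools import product


def nunn_gauntlet(engine, opponents, positions):
    """Yield (white, black, position) pairings for a Nunn-style gauntlet.

    Each opponent meets `engine` once per position with each colour,
    so opening choice does not depend on either side's book.
    """
    for opponent, position in product(opponents, positions):
        yield engine, opponent, position  # engine has White
        yield opponent, engine, position  # colours reversed


# Placeholder names, not the real engines or Nunn positions.
opponents = [f"opponent_{i:02d}" for i in range(1, 26)]
first_ten = [f"nunn_pos_{i:02d}" for i in range(1, 11)]
all_twenty = [f"nunn_pos_{i:02d}" for i in range(1, 21)]

games_so_far = list(nunn_gauntlet("Jonny 2.64", opponents, first_ten))
games_full = list(nunn_gauntlet("Jonny 2.64", opponents, all_twenty))
print(len(games_so_far), len(games_full))  # prints: 500 1000
```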
Personally I like Leo's tournaments most, but I like them all anyway and do not want to copy things that are already fine. I think this common tournament is something unique and worth a try, but we should not expect it to answer all questions.
Let's wait and see. The most important thing now is the voting. I only coordinate things (having been somehow forced into that :-)), and in such a case only what the majority wants counts.
Best Regards
Heinz
Heinz van Kempen
 

Re: Jonny 2.64 gauntlet after 500 games

Postby Uri Blass » 19 Jul 2004, 13:02

Posted by: Uri Blass at 19 July 2004 14:02:42
In reply to: Re: Jonny 2.64 gauntlet after 500 games, posted by Heinz van Kempen at 19 July 2004 13:19:58
> Nunn tournaments have the advantage of testing a variety of common openings independently of how good or bad the engines' own books are, and they are quite good for statistics. However, they are also limited, for example in the number of positions and by the possibility that programmers might tune against those positions. And combining Nunn positions with learning, which more and more engines will have in the future, is questionable.

I do not think that programmers usually have time to tune for the Nunn positions, because people do not consider them the most important thing; they are only a tool for programmers to find out whether their engine has improved.
I also use Nunn positions for testing, but different ones from the positions that you use (I use a Nunn test with 50 positions, which are different from the 40 positions that you use).
I do not care if some programmers tune for the Nunn positions; the more interesting question is how ratings compare at different time controls (something that is a problem when learning is involved).
An engine may learn wrong information from blitz and then perform worse at long time controls, not because it is weaker at long time controls.
An engine may also be better at long time controls because of a better book.
I am more interested in cases where engines perform better at long time controls because of better evaluation or better search, and we cannot detect these cases if we use books or learning.
If we use learning or books, the information is less interesting, at least for me.
Testing books and learning should be done separately: the same engine can be tested with book A and with book B, or with learning A and with learning B, against the same opponents, and the results compared.
In that case I want only the learning or only the book to change, not the engine itself.
With such tests we may find that for some engine, book A with learning A is better than book B with learning B at short time controls, while the opposite holds at long time controls. (I believe that a big book should be better for blitz: at short time controls it is important to save time, and the engine cannot find better moves than the book moves by itself, whereas at long time controls the engine may find better moves than the book moves by itself.)
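A separate book or learning test of this kind amounts to comparing two score fractions for the same engine against the same opponents. A minimal Python sketch, assuming independent games and the same conservative per-game variance bound p(1 - p), is a two-sample z-statistic on the score difference. The function name and the example numbers are hypothetical, not results from the thread; a positive z favours configuration A, and |z| above about 2 suggests the difference is unlikely to be noise alone.

```python
import math


def compare_configs(points_a, games_a, points_b, games_b):
    """Two-sample z-statistic for the difference in score fraction.

    points_* are total points (wins + 0.5 * draws); games_* are game counts.
    Uses p * (1 - p) / n per sample as a conservative variance estimate.
    """
    pa, pb = points_a / games_a, points_b / games_b
    se = math.sqrt(pa * (1 - pa) / games_a + pb * (1 - pb) / games_b)
    if se == 0:
        raise ValueError("both samples are all wins or all losses")
    return pa - pb, (pa - pb) / se


# Hypothetical numbers: same engine, same opponents, book A vs. book B.
diff, z = compare_configs(points_a=270, games_a=500, points_b=240, games_b=500)
print(f"score difference {diff:+.1%}, z = {z:.2f}")
```

Running the same comparison separately at a short and at a long time control is one way to look for the suspected crossover, where one book does better at blitz and the other at long time controls.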
Uri
Uri Blass
 

Re: Jonny 2.64 gauntlet after 500 games

Postby Uri Blass » 19 Jul 2004, 13:04

Posted by: Uri Blass at 19 July 2004 14:04:36
In reply to: Re: Jonny 2.64 gauntlet after 500 games, posted by Uri Blass at 19 July 2004 14:02:42
> I also use Nunn positions for testing, but different ones from the positions that you use (I use a Nunn test with 50 positions, which are different from the 40 positions that you use).

Correction:
It is 25 and 20 instead of 50 and 40; I confused the number of games with the number of positions.
Uri
Uri Blass
 

Re: Jonny 2.64 gauntlet after 500 games

Postby Heinz van Kempen » 19 Jul 2004, 13:33

Posted by: Heinz van Kempen at 19 July 2004 14:33:35
In reply to: Re: Jonny 2.64 gauntlet after 500 games, posted by Uri Blass at 19 July 2004 14:04:36
> Correction:
> It is 25 and 20 instead of 50 and 40; I confused the number of games with the number of positions.
Hello Uri,
Interesting. Is it possible to download those additional positions somewhere? I have also thought about using Noomen positions to get more variety.
The most interesting thing is your thoughts about learning. It really might be that an engine learns bad things in blitz games that make it weaker afterwards in games with more time.
I now have some work to do uploading a few Nunn tournaments that are already finished, and coordinating group B as well.
I will come back to your points later, but I already have to disappoint you: judging from the votes, I suppose that our tournament will not meet your personal need for useful information. For example, we already have a vast majority in favour of book and position learning.
Best Regards
Heinz
Heinz van Kempen
 

