calibrating human tactics skills

Archive of the old Parsimony forum. Some messages couldn't be restored. Limitations: search for authors does not work, Parsimony-specific formats do not work, and threaded view does not work properly. Posting is disabled.

calibrating human tactics skills

Postby Andrei Petcherski » 04 Jun 2004, 05:21


What is the numerical (Elo) evidence that the top engines play at 3000 Elo tactically? If one makes a test suite of purely tactical problems and demonstrates that a typical GM solves only 50% of the problems,
while an engine solves >90%, then it would be a valid claim.
Has this kind of experiment ever been documented? In fact it would cost only
$1000-2000 to have a few GMs solve some combos for a few hours. I wish I had
this kind of money sitting around. Is anybody here interested in financing the experiment?
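The engine half of such an experiment is routinely automated. A minimal sketch, assuming the python-chess library, a UCI engine binary and an EPD suite with "bm" (best move) records; the paths "./engine" and "suite.epd" and the 5-second limit are placeholders, not anything from this thread:

[code]
# Sketch: measure what percentage of an EPD tactics suite a UCI engine solves.
# Hypothetical paths and time limit; requires the python-chess package.
import chess
import chess.engine

def solve_percentage(epd_path: str, engine_path: str, seconds: float = 5.0) -> float:
    solved = total = 0
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        with open(epd_path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                board = chess.Board()
                ops = board.set_epd(line)        # parses the FEN part plus EPD opcodes
                best_moves = ops.get("bm", [])   # list of chess.Move solutions, if given
                if not best_moves:
                    continue
                total += 1
                result = engine.play(board, chess.engine.Limit(time=seconds))
                if result.move in best_moves:
                    solved += 1
    return 100.0 * solved / total if total else 0.0

if __name__ == "__main__":
    print(f"solved {solve_percentage('suite.epd', './engine'):.1f}% of the suite")
[/code]

The costly part of the proposal is the human half: having GMs sit the same suite under the same clock.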
Andrei Petcherski
 

Re: calibrating human tactics skills

Postby Tord Romstad » 04 Jun 2004, 11:58

> What is the numerical (Elo) evidence that the top engines play at 3000 Elo tactically?

None whatsoever. Elo ratings are defined in terms of results in games against
rated opponents. It is not possible to separate tactical skills from other
skills. It makes no sense to claim that a player (human or computer) "plays
at 3000 Elo tactically".

> If one makes a test suite of purely tactical problems and demonstrates that a typical GM solves only 50% of the problems, while an engine solves >90%, then it would be a valid claim.

No, it still wouldn't. You cannot translate the percentage of correct solutions
in tactical problems to Elo ratings.

> Has this kind of experiment ever been documented? In fact it would cost only $1000-2000 to have a few GMs solve some combos for a few hours. I wish I had this kind of money sitting around. Is anybody here interested in financing the experiment?

I don't think this kind of experiment would be very interesting. The results
would depend to a great extent on the nature of the positions chosen. The
tactical skills of GMs and computers are not easily comparable. Computers
are good at shallow, bushy tactics, while GMs excel at calculating extremely
deep combinations.
If you let top GMs and computers compete at solving WAC positions at 5
seconds/position, the computers would win easily. If you let them compete
at solving Nolot positions at 15 minutes/position, I would definitely bet
on the GMs.
Tord
Tord Romstad
 

Re: calibrating human tactics skills

Postby Uri Blass » 04 Jun 2004, 13:05

I really see a big problem in defining the tactical level of a player, and the first thing is to compose the correct test for this purpose.
I think that the correct test is whether a player can see how to win material, and not whether he can find a sacrifice that may be played for purely positional reasons.
The test should be to take 100 positions and ask humans and computers not to find the right move, but to find the move that leads to the biggest material gain when you use the 9/5/3/3/1 piece values.
I suggest that you take 50 positions in which there is no material gain that a program can see even after hours of search, and 50 positions in which there is a material gain, but the gain is not trivial and programs that are modified to calculate it usually cannot find it in less than a minute.
For that purpose you can take some random positions from comp-comp games that GMs do not know (the Nolot positions are of course a bad idea, because I expect GMs to know them).
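A minimal sketch of how such a material-gain criterion could be scored automatically, assuming the python-chess library and a very shallow material-only search; the fixed depth and the absence of alpha-beta and quiescence are simplifications, a real harness would need both:

[code]
# Sketch of the proposed scoring rule: find the move that forces the biggest
# material gain, with piece values 9/5/3/3/1. Plain fixed-depth negamax over
# material only; the depth is an arbitrary simplification.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board: chess.Board) -> int:
    """Material balance from the point of view of the side to move."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score

def forced_material(board: chess.Board, depth: int) -> int:
    """Best material balance the side to move can force within `depth` plies."""
    if depth == 0 or board.is_game_over():
        return material(board)          # mates are not scored; material only
    best = -1000
    for move in board.legal_moves:
        board.push(move)
        best = max(best, -forced_material(board, depth - 1))
        board.pop()
    return best

def move_with_biggest_gain(board: chess.Board, depth: int = 3):
    """Return (move, gain), with gain measured against the current balance."""
    base = material(board)
    best_move, best_gain = None, -1000
    for move in board.legal_moves:
        board.push(move)
        gain = -forced_material(board, depth - 1) - base
        board.pop()
        if gain > best_gain:
            best_move, best_gain = move, gain
    return best_move, best_gain
[/code]

Whether a GM finds the same gaining move within the time limit would then be the human side of the comparison.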
Uri
Uri Blass
 

Re: calibrating human tactics skills

Postby Roger Brown » 04 Jun 2004, 16:34

> The test should be to take 100 positions and ask humans and computers not to find the right move, but to find the move that leads to the biggest material gain when you use the 9/5/3/3/1 piece values.

I see a problem with your approach, Uri. GMs have several ways of deciding what a good tactic is. The mere gain of material is an important, but not the sole, determinant of which to choose.
Irving Chernev once suggested that the brilliancies ought to be left to Keres, Alekhine and their ilk; lesser mortals should play the tactic that wins eventually, if not brilliantly.
Dan Heisman, the Novice Nook author, says that where possible you should take the queen - play the obvious tactic - and force an eventual resignation, particularly where other tactical resolutions carry some uncertainty.
There have been some masters - Alekhine, to name one - who played the brilliant move even where a less stellar move would do.
Then too there are those tactical decisions taken not so much to win as to confuse and stir things up in the mind of the opponent. Muddying the waters is a good idea when the opponent is human and starts weighing other considerations: why would he/she play that move, there must be something in it...
Computers make wonderful poker players - no expressions to read.
Finally, GMs display different preferences depending on the phase of their lives (Capablanca was a brilliant tactical player in his youth but eventually matured into the chess machine grinding down opponents in the endgame).
The point of the game is to win, and measuring the tactical ability of GMs is a project fraught with formidable obstacles. Material count sounds terrific for programs; I cannot see how it translates to human players, particularly at the higher levels.
Tal, Alekhine and Shirov have all engaged in extreme tactical solutions to complex practical situations on the chessboard. What is less well known is that, according to Tal, there are combinations which are played purely on speculation (I think he called them combinations born of a hard life...). In other words it is a sacrifice, a combination with unclear consequences. Sometimes the goddess of sacrifices laughs.
Later.
Roger Brown
 

Re: calibrating human tactics skills

Postby Andrei Petcherski » 05 Jun 2004, 02:25


Tord,
I believe it is still possible to rate computer strength in various categories. Let's represent an engine's rating as R = alpha*R(combo) + beta*R(bushy) + gamma*R(positional), where R(combo) is the strength in forced combinations, R(bushy) covers what you call bushy tactics (all other tactics except forced combinations), and R(positional) is the positional strength.
Because we don't know the absolute numbers for the different Rs, we work them out from the ratings of human players. For this we assume that a human rating can be represented the same way as above, and we arbitrarily set each of the component Rs for an average GM equal to his Elo rating, so a 2600 GM is assumed to be equally strong in combo tactics, bushy tactics, and positional play.
Next we carefully select three test suites, each testing only one of the three parts comprising R. Run these tests through a bunch of GMs and then through computers; one should be able to see which engines are better at positional play, combo tactics, etc. More parameters could be added. I think this could give a good quantitative description of the playing styles of the engines, if nothing else.
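A toy illustration of the arithmetic behind that decomposition, with invented component ratings and Elo numbers purely to show how the weights alpha, beta and gamma could be fitted once the three test suites have produced component ratings; this is not a procedure specified in the thread, just one way the numbers could be combined:

[code]
# Toy illustration (invented numbers) of fitting the weights in
# R = alpha*R(combo) + beta*R(bushy) + gamma*R(positional)
# from players whose component ratings and overall Elo are both known.
import numpy as np

# One row per player: [R(combo), R(bushy), R(positional)] - all hypothetical.
components = np.array([
    [2600.0, 2600.0, 2600.0],   # the "average GM" anchor: every component = his Elo
    [2750.0, 2650.0, 2800.0],
    [2500.0, 2700.0, 2550.0],
    [2850.0, 2800.0, 2700.0],
])
overall_elo = np.array([2600.0, 2745.0, 2590.0, 2775.0])   # also hypothetical

# Least-squares solution of components @ [alpha, beta, gamma] ~ overall_elo
weights, *_ = np.linalg.lstsq(components, overall_elo, rcond=None)
alpha, beta, gamma = weights
print(f"alpha={alpha:.3f}, beta={beta:.3f}, gamma={gamma:.3f}")

# Once the weights are fixed, an engine's three test-suite ratings combine the same way.
engine_components = np.array([3100.0, 3000.0, 2650.0])     # hypothetical engine scores
print(f"combined engine rating ~ {engine_components @ weights:.0f}")
[/code]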
Andrei Petcherski
 

