Post by Gian-Carlo Pascutto » 18 Jul 2005, 08:59

Hi all,

what good, cleaned tactical testsets are available?

I published ECM-GCP quite a while ago, and I'm basically looking for something similar, or the same, but a bit more cleaned out. Surely in the meantime some more cooks must have been found there, or someone else must have come up with something similar.

Any hints?
Re: Tactical testset

Post by Anonymous » 18 Jul 2005, 15:14

Gian-Carlo Pascutto wrote:
Hi all,

what good, cleaned tactical testsets are available?

I published ECM-GCP quite a while ago, and I'm basically looking for something similar, or the same, but a bit more cleaned out. Surely in the meantime some more cooks must have been found there, or someone else must have come up with something similar.

Any hints?

Here are 50 positions from the Rebel tactics contest of a few years ago. I don't know if it is clean, but knowing Ed, I assume it is.

1r4k1/p4ppp/3P3q/4P2P/3b1P2/1r6/pPQ3B1/K1B2R2 b - - bm h6e6; id "Rebel Contest 1";
1r1qr2k/1p4pp/3p4/1R1B1p2/1pPp4/4P1Pb/1P3P1P/Q3R1K1 w - - bm b5b4; id "Rebel Contest 2";
r6k/1p4pp/3p4/1R1B1p2/1pPP4/6Pb/1P2qP1P/2Q3K1 w - - bm d5g2; id "Rebel Contest 3";
2rqk2r/pb1nbp1p/4p1p1/1B1n4/Np1N4/7Q/PP3PPP/R1B1R1K1 w k - bm e1e6; id "Rebel Contest 4";
r1bq1rk1/3nbppp/p2pp3/6PQ/1p1BP2P/2NB4/PPP2P2/2KR3R w - - bm d4g7; id "Rebel Contest 5";
r1bq1rk1/pp4bp/2np4/2p1p1p1/P1N1P3/1P1P1NP1/1BP1QPKP/1R3R2 b - - bm c8h3; id "Rebel Contest 6";
8/8/3k1p2/p2BnP2/4PN2/1P2K1p1/8/5b2 b - - bm e5d3; id "Rebel Contest 7";
8/3R4/p1k1B1p1/1pr5/P4B2/4K3/1Pr5/8 w - - bm d7c7; id "Rebel Contest 8";
r2q1rk1/pp2p1bp/2n1Ppp1/2pn4/3pNP2/6P1/PPPPQ2P/RNB2RK1 b - - bm d4d3; id "Rebel Contest 9";
3r4/2r5/p3nkp1/1p3p2/1P1pbP2/P2B3R/2PRN1P1/6K1 b - - bm c7c3; id "Rebel Contest 10";
2krr3/pppb1ppp/3b4/3q4/3P3n/2P2N1P/PP2B1P1/R1BQ1RK1 b - - bm h4g2; id "Rebel Contest 11";
5r1k/1P4pp/3P1p2/4p3/1P5P/3q2P1/Q2b2K1/B3R3 w - - bm a2f7; id "Rebel Contest 12";
4r2k/p2qr1pp/1pp2p2/2p1nP1N/4R3/1P1P2RP/1PP2QP1/7K w - - bm g3g7; id "Rebel Contest 13";
r1b1n2r/1p2q3/1Qp1npk1/4p1p1/P1B1P3/2P1BNP1/1P3PK1/R3R3 b - - bm e6f4; id "Rebel Contest 14";
2r2rk1/pb1q1ppp/1p3n2/5N2/4pP2/P3P3/1B2QRPP/R5K1 w - - bm f5g7; id "Rebel Contest 15";
rq4k1/pp1nrppp/4bn2/6R1/3QP3/P4PN1/4B1PP/2B2RK1 w - - bm g5g7; id "Rebel Contest 16";
2rr2k1/1b3ppp/pb2p3/1p2P3/1P2BPnq/P1N3P1/1B2Q2P/R4R1K b - - bm c8c3; id "Rebel Contest 17";
2r2rk1/p3bb1p/2n1Q1p1/q2pP3/3P1P2/p1NB1NR1/1P4P1/1K1R4 w - - bm d3g6; id "Rebel Contest 18";
4r1k1/1p3pp1/1Npr3p/3p3q/3P2n1/4P1PP/PP3PK1/1RR1Q3 b - - bm g4f2; id "Rebel Contest 19";
r3rn1k/1b3Qp1/2p2b2/qp1pN3/3P4/2N1P2P/PP3P2/2KR2R1 w - - bm g1g7; id "Rebel Contest 20";
2r1r1k1/pp1bb2p/3pppp1/q6n/4P2Q/2N1BP1P/PPP1BP2/2KR2R1 w - - bm e2b5; id "Rebel Contest 21";
6r1/2rp1kpp/2qQp3/p3Pp1P/1pP2P2/1P2KP2/P5R1/6R1 w - - bm g2g7; id "Rebel Contest 22";
rn2r1k1/pp2bp1p/3np1pP/2q5/5N2/1BP2Q2/PP1B1PP1/2KR3R w - - bm f4e6; id "Rebel Contest 23";
2r2rk1/1p3pp1/p2qpn1p/4N3/3P4/2PQ4/PP4PP/4RRK1 w - - bm f1f6; id "Rebel Contest 24";
r7/8/4p2k/3p1bR1/7r/Np6/1P6/K5R1 w - - bm g5h5; id "Rebel Contest 25";
b2r3r/k4p1p/p2q1np1/NppP4/3p1Q2/P4PPB/1PP4P/1K1RR3 w - - bm d1d4; id "Rebel Contest 26";
2rqrnk1/pp3pb1/6p1/4pbP1/3pN2R/3B1N2/PPP2P2/2K2Q1R w - - bm h4h8; id "Rebel Contest 27";
r3b1nr/ppqn1k1p/4p1p1/1P1pPpP1/1B1N1P1P/R7/3Q4/R3KB2 w Q - bm a3c3; id "Rebel Contest 28";
2n2r1k/3b3p/1p2p3/pP1p4/P1rPNq1P/8/2Q2PR1/B3R1K1 w - - bm c2c4; id "Rebel Contest 29";
8/7p/p2p4/P7/1p2k1P1/1P2p2r/2P1R1KP/8 b - - bm h3h2; id "Rebel Contest 30";
8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - - bm b3b2; id "Rebel Contest 31";
rr4k1/3npp2/3p1npQ/q1pP3p/4P2P/P1N2N2/RP3PP1/6KR b - - bm b8b2; id "Rebel Contest 32";
1r6/5kp1/RqQb1p1p/1p1PpP2/1Pp1B3/2P4P/6P1/5K2 b - - bm b6e3; id "Rebel Contest 33";
3r4/8/2P2p2/1K3k2/8/6p1/8/2R5 b - - bm f5g4; id "Rebel Contest 34";
7K/6p1/8/7p/8/8/R7/6k1 w - - bm h8h7; id "Rebel Contest 35";
4Q3/1kp5/3r4/3PK3/8/8/8/8 w - - bm e8e6; id "Rebel Contest 36";
8/7p/1r5R/2k5/8/6PK/6P1/8 w - - bm h6b6; id "Rebel Contest 37";
2rq2k1/4bppp/p1rp4/1p1NpP2/4P3/2PQ4/PP4PP/3R1R1K w - - bm d1a1; id "Rebel Contest 38";
r4rk1/pp1n1p1p/1nqP2p1/2b1P1B1/4NQ2/1B3P2/PP2K2P/2R5 w - - bm c1c5; id "Rebel Contest 39";
r1b2rk1/1p1nbppp/pq1p4/3B4/P2NP3/2N1p3/1PP3PP/R2Q1R1K w - - bm f1f7; id "Rebel Contest 40";
8/6pk/1r5p/1q1p1p1Q/3PbP2/b1p3PP/2P2B1K/1B2R3 b - - bm b5b1; id "Rebel Contest 41";
r2qr2k/pbp3pp/1p2Bb2/2p5/2P2P2/3R2P1/PP2Q1NP/5RK1 b - - bm d8d3; id "Rebel Contest 42";
2r2bk1/4qp2/3n2p1/2R1p1Np/2p1N3/r6P/1Q3PP1/3R2K1 w - - bm c5c8; id "Rebel Contest 43";
1nr1b1k1/r4p2/1p1q2p1/pPppN3/3P4/2RBP3/P1Q2PP1/2R3K1 w - - bm d4c5; id "Rebel Contest 44";
3B4/1p4k1/6B1/P2npp1P/6b1/4P3/6K1/8 w - - bm g2g3; id "Rebel Contest 45";
rn3rk1/1b3ppp/pq2p3/1p1pP3/3RNP2/4Q3/PPP1B1PP/1K5R w - - bm e4f6; id "Rebel Contest 46";
4r3/Rp2rppk/2pQ3p/1qNp4/3P4/4P1P1/5P1P/Rb4K1 b - - bm e7e3; id "Rebel Contest 47";
r2q1rk1/pb3ppp/2n5/1pbNP3/2p1Q3/P4N2/BP3PPP/R4RK1 w - - bm d5f6; id "Rebel Contest 48";
6k1/8/p3r2p/1p1pPpp1/1n1pP3/1Pq4P/PR4PB/3QN1K1 b - - bm d5e4; id "Rebel Contest 49";
r1bq1rk1/1p2bppp/p3p3/n3P3/4N3/1P1P1N2/PB4PP/R3QR1K w - - bm e4f6; id "Rebel Contest 50";
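The records above follow the EPD layout: four position fields (board, side to move, castling rights, en passant square) followed by semicolon-terminated operations such as bm and id. A minimal Python sketch of reading one record, assuming no operand contains a semicolon; parse_epd is a hypothetical helper name, and the bm move is kept exactly as written (this list uses coordinate notation such as e5d3):

```python
def parse_epd(line):
    """Split one EPD record into its position fields and an opcode dict.

    Assumption: operands never contain ';', so a plain split is enough.
    """
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("an EPD record needs four position fields")
    board, side, castling, ep = fields[:4]
    rest = fields[4] if len(fields) > 4 else ""

    ops = {}
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # The first word is the opcode; the remainder is its operand.
        opcode, _, operand = chunk.partition(" ")
        ops[opcode] = operand.strip().strip('"')

    return {"board": board, "side": side, "castling": castling,
            "ep": ep, "ops": ops}


record = parse_epd(
    '8/8/3k1p2/p2BnP2/4PN2/1P2K1p1/8/5b2 b - - bm e5d3; id "Rebel Contest 7";'
)
print(record["ops"]["id"], record["ops"]["bm"])
```

A real test harness would also have to convert between coordinate and SAN notation before comparing an engine's move with bm.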


Re: Tactical testset

Post by Joachim Rang » 18 Jul 2005, 17:34

David Dahlem wrote:
Here are 50 positions from the Rebel tactics contest of a few years ago. I don't know if it is clean, but knowing Ed, I assume it is.



No, it is not. First, the positions vary widely in difficulty, and second, some of the solutions are simply wrong. For example, in No. 4 the strongest move is not Rxe6 but Bh6. I'm sure there are some other incorrect ones.

Most surprisingly, I didn't find any good tactical test suites either. I have started to pick correct and comparable positions (of similar difficulty) from different test suites, but this is a very time-consuming process.

regards Joachim
Re: Tactical testset

Post by Ron Murawski » 18 Jul 2005, 18:46

Jim Monaghan has posted the IQ testsuite and the O'Kelly testsuite. IQ has been extensively debugged.
IQ5 testsuite
O'Kelly testsuite

You also can find some testsuites on Jon Dart's site.

Re: Tactical testset

Post by Joachim Rang » 18 Jul 2005, 20:23

Ron Murawski wrote:
Jim Monaghan has posted the IQ testsuite and the O'Kelly testsuite. IQ has been extensively debugged.
IQ5 testsuite
O'Kelly testsuite

You also can find some testsuites on Jon Dart's site.


They are both very easy, but maybe mostly correct. I would like to have some positions that take 10-30 seconds to solve on modern hardware.

regards Joachim
Re: Tactical testset

Post by Volker Pittlik » 18 Jul 2005, 20:47

Joachim Rang wrote:...

they are both very easy, but maybe mostly correct. I would like to have some positions for 10-30 seconds on modern hardware.

regards Joachim

Maybe Arasan.5 is OK for you:

Re: Tactical testset

Post by Dann Corbit » 18 Jul 2005, 21:07

Gian-Carlo Pascutto wrote:
Hi all,

what good, cleaned tactical testsets are available?

I published ECM-GCP quite a while ago, and I'm basically looking for something similar, or the same, but a bit more cleaned out. Surely in the meantime some more cooks must have been found there, or someone else must have come up with something similar.

Any hints?

Every test set has a lot of bugs. The most thoroughly debugged is WAC, but it is too easy for most engines.
ECM-GCP is probably second best.

Every test set I examine will uncover flaws.

I just found a bunch in MES, and also some in Jenoban.

Here is how to find out if a test set is reliable:
Take at least 5 top engines and let them analyze every position you intend to use for an hour with big hash on a fast machine.
Then, look at every position's result for every engine.
Even if 4/5 agree, look at the disagreement. It may be a better solution or one as good or nearly as good.
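The review step described above can be sketched in Python. This is only an illustration under stated assumptions: the engines have already been run, their chosen moves are collected in plain dictionaries, and the function name, engine names, and position ids are all made up for the example.

```python
from collections import Counter


def positions_to_review(expected, engine_moves):
    """Return every position where at least one engine disagrees with bm.

    expected:     {position_id: best move listed in the test suite}
    engine_moves: {position_id: {engine_name: move chosen by that engine}}
    Both structures are hypothetical; fill them from your own analysis logs.
    """
    flagged = []
    for pos_id, bm in expected.items():
        votes = Counter(engine_moves.get(pos_id, {}).values())
        # Keep every dissenting move, even when only one engine out of five
        # chose it: it may be a better or an equally good solution.
        dissent = {move: n for move, n in votes.items() if move != bm}
        if dissent:
            flagged.append((pos_id, bm, dissent))
    return flagged


# Made-up data for illustration only, not real engine output.
expected = {"pos-1": "a1a2", "pos-2": "b1b2"}
engine_moves = {
    "pos-1": {"engine_a": "a1a2", "engine_b": "a1a2", "engine_c": "c1c2"},
    "pos-2": {"engine_a": "b1b2", "engine_b": "b1b2", "engine_c": "b1b2"},
}
for pos_id, bm, dissent in positions_to_review(expected, engine_moves):
    print(pos_id, "suite says", bm, "but engines also chose", dissent)
```

The sketch deliberately does not take a majority vote; it only lists the disagreements that still need a human look.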
Dann Corbit

Re: Tactical testset

Post by Dann Corbit » 18 Jul 2005, 21:21

For these positions:
1R1K4/8/3n4/1b6/1P6/8/8/7k w - - bm Rb6; id "(JenoBan_TofE).189"; #226 white_wins
6k1/1B5p/1p6/3n4/3r4/7P/5KP1/2R5 w - - bm Rc4; id "(JenoBan_TofE).190"; #227 white_wins
1r3k2/8/3b2p1/3P4/1p6/6P1/K4B2/7R w - - bm Bc5; id "(JenoBan_TofE).191"; #228 white_wins
8/8/5P1k/8/2K5/pr2r3/4R3/2R5 w - - bm f7; id "(JenoBan_TofE).192"; #229 white_wins
3r2r1/7k/8/7K/8/8/8/3RR3 w - - bm Re7+; id "(JenoBan_TofE).193"; #230 white_wins
8/8/B7/3K4/8/6p1/7k/5R2 w - - bm Rf3; id "(JenoBan_TofE).194"; #231 white_wins
6k1/4K3/6PP/8/7B/8/8/7r w - - bm h7+; id "(JenoBan_TofE).195"; #232 white_wins
5k2/4p2p/6P1/3K4/8/4B3/8/8 w - - bm Bh6+; id "(JenoBan_TofE).196"; #233 white_wins
1NBk4/p2p4/8/3K4/8/8/8/8 w - - bm Bb7; id "(JenoBan_TofE).197"; #234 white_wins
8/4q3/2p5/2n2Q2/p5N1/7k/8/1K6 w - - bm Ne3+; id "(JenoBan_TofE).198"; #235 white_wins
8/3b2Bq/8/7p/6k1/8/6K1/Q7 w - - bm Qd4+; id "(JenoBan_TofE).199"; #236 white_wins
R7/6N1/8/4p3/4k3/6n1/7b/3K4 w - - bm Ra2; id "(JenoBan_TofE).200"; #237 white_wins
8/3b4/k2p3P/1p1K3P/1P4r1/8/3R4/8 w - - bm h7; id "(JenoBan_TofE).201"; #238 white_wins
5q2/n2P1k2/2b5/8/8/3N4/4BK2/6Q1 w - - bm Qg5; id "(JenoBan_TofE).202"; #239 white_wins

Dann Corbit

Re: Tactical testset

Post by Tord Romstad » 18 Jul 2005, 23:13

Dann Corbit wrote:
Gian-Carlo Pascutto wrote:
Hi all,

what good, cleaned tactical testsets are available?

I published ECM-GCP quite a while ago, and I'm basically looking for something similar, or the same, but a bit more cleaned out. Surely in the meantime some more cooks must have been found there, or someone else must have come up with something similar.

Any hints?

Every test set has a lot of bugs. The most thoroughly debugged is WAC, but it is too easy for most engines.
ECM-GCP is probably second best.

Even ECM-GCP is beginning to get a bit too easy on modern hardware.

Besides, WAC, ECM-GCP and all other tactical test suites I have seen suffer from one very fundamental flaw: They are far too biased towards flashy, spectacular tactics of the type that usually win brilliancy prizes in human tournaments, and I am not sure to what extent the ability to solve such positions correlates with the ability to find the more mundane tactics which decide most games. Our current tactical test suites are not only a poor tool for testing an engine's strength, they are not even very good for measuring an engine's tactical abilities.

Glaurung performs better than most amateur engines at tactical test suites like ECM-GCP (which is a complete mystery to me, because my use of extensions is rather conservative), but in engine vs engine matches it seems to be out-tacticed more often than not, even against opponents 100 Elo points weaker.

This is not meant as a criticism of Gian-Carlo's work, by the way. I couldn't have done it better myself. Making a good tactical test suite is just a really hard task.

Re: Tactical testset

Post by Ross Boyd » 18 Jul 2005, 23:40

Hi Dann,

I just found a bunch in MES, and also some in Jenoban.

Yes, I've found there are many cooks in MES400, as proven by the use of EGTBs. Also, there is one position where the solution is the only legal move :D... - that particular position is only interesting for the evaluation score.

Is there a corrected version available on your site by any chance? I'd be interested in making comparisons.

btw, it would be useful to design a testsuite which has extra conditions like "score must exceed x" or "score must be a draw" before the engine's solution is accepted as correct. This would add some credence that the engine actually 'understands' the position.
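A minimal Python sketch of such an acceptance rule, assuming the engine reports its chosen move together with a score in centipawns from the side to move's point of view. The function name, the dictionary keys, and the threshold parameters are all hypothetical:

```python
def solution_accepted(result, best_move, min_score_cp=None, draw_within_cp=None):
    """Accept an engine's answer only if the move AND the score conditions hold.

    result:         {"move": str, "score_cp": int}, a hypothetical result record
    min_score_cp:   if set, the score must exceed this value ("score must exceed x")
    draw_within_cp: if set, |score| must not exceed this value ("score must be a draw")
    """
    if result["move"] != best_move:
        return False
    score = result["score_cp"]
    if min_score_cp is not None and score <= min_score_cp:
        return False
    if draw_within_cp is not None and abs(score) > draw_within_cp:
        return False
    return True


# Right move, and the engine also sees that it wins.
print(solution_accepted({"move": "h6b6", "score_cp": 250}, "h6b6", min_score_cp=200))
# Right move, but the score suggests the engine found it by accident.
print(solution_accepted({"move": "h6b6", "score_cp": 30}, "h6b6", min_score_cp=200))
```

The thresholds would have to be stored per position, for example as extra operations in each test record.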


Re: Tactical testset

Post by Uri Blass » 19 Jul 2005, 01:06

Tord Romstad wrote:
Dann Corbit wrote:
Gian-Carlo Pascutto wrote:
Hi all,

what good, cleaned tactical testsets are available?

I published ECM-GCP quite a while ago, and I'm basically looking for something similar, or the same, but a bit more cleaned out. Surely in the meantime some more cooks must have been found there, or someone else must have come up with something similar.

Any hints?

Every test set has a lot of bugs. The most thoroughly debugged is WAC, but it is too easy for most engines.
ECM-GCP is probably second best.

Even ECM-GCP is beginning to get a bit too easy on modern hardware.

Besides, WAC, ECM-GCP and all other tactical test suites I have seen suffer from one very fundamental flaw: They are far too biased towards flashy, spectacular tactics of the type that usually win brilliancy prizes in human tournaments, and I am not sure to what extent the ability to solve such positions correlates with the ability to find the more mundane tactics which decide most games. Our current tactical test suites are not only a poor tool for testing an engine's strength, they are not even very good for measuring an engine's tactical abilities.

Glaurung performs better than most amateur engines at tactical test suites like ECM-GCP (which is a complete mystery to me, because my use of extensions is rather conservative), but in engine vs engine matches it seems to be out-tacticed more often than not, even against opponents 100 Elo points weaker.

This is not meant as a criticism of Gian-Carlo's work, by the way. I couldn't have done it better myself. Making a good tactical test suite is just a really hard task.


I think that it is easy to do it better than ECM-GCP and you only need time for it.
The way to do it is simply to look at computer games and find tactical mistakes (based on the evaluation after search before the move and after the move).

Positions with tactical mistakes are candidates for test positions.
Later you can let top programs analyze the candidates for a long time in 2-best mode, and if you find that only one move is winning, then you can include the position in the test.
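The first step, collecting candidates, might look like this in Python. It is a rough sketch under several assumptions: every move in a game has already been annotated with a searched evaluation before and after it was played, both in centipawns from the mover's point of view, and a drop of 150 centipawns counts as a tactical mistake. The record layout, the function name, and the threshold are invented for the example.

```python
def find_candidates(annotated_moves, threshold_cp=150):
    """Return the positions where the played move lost at least threshold_cp.

    annotated_moves: list of dicts with the hypothetical keys
      "fen"         position before the move
      "played"      the move actually played
      "eval_before" search score before the move (mover's point of view)
      "eval_after"  search score after the move (same point of view)
    """
    candidates = []
    for move in annotated_moves:
        drop = move["eval_before"] - move["eval_after"]
        if drop >= threshold_cp:
            candidates.append(
                {"fen": move["fen"], "played": move["played"], "drop_cp": drop}
            )
    return candidates


# Placeholder records for illustration; the "fen" values are not real positions.
game = [
    {"fen": "position-1", "played": "e2e4", "eval_before": 20, "eval_after": 15},
    {"fen": "position-2", "played": "g4f2", "eval_before": 10, "eval_after": -310},
]
print(find_candidates(game))
```

Each candidate would then still need the long 2-best analysis before it is added to a suite.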

Re: Tactical testset

Post by Uri Blass » 19 Jul 2005, 01:10

Dann Corbit wrote:
Gian-Carlo Pascutto wrote:
Hi all,

what good, cleaned tactical testsets are available?

I published ECM-GCP quite a while ago, and I'm basically looking for something similar, or the same, but a bit more cleaned out. Surely in the meantime some more cooks must have been found there, or someone else must have come up with something similar.

Any hints?

Every test set has a lot of bugs. The most thoroughly debugged is WAC, but it is too easy for most engines.
ECM-GCP is probably second best.

Every test set I examine will uncover flaws.

I just found a bunch in MES, and also some in Jenoban.

Here is how to find out if a test set is reliable:
Take at least 5 top engines and let them analyze every position you intend to use for an hour with big hash on a fast machine.
Then, look at every position's result for every engine.
Even if 4/5 agree, look at the disagreement. It may be a better solution or one as good or nearly as good.

I think that there is a better way.

Let one top program analyze the position in 2-best mode.

If you see, based on the scores, that only one move is winning, then the position is probably correct.
If you see, based on the scores, two winning moves or two drawing moves, or that everything is losing, then you know that there is a mistake.
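The rule can be written down as a small Python sketch. It assumes the two scores are in centipawns from the side to move's point of view, that mate scores have been converted to large values, and that +200 and a window of 50 are reasonable cut-offs for "winning" and "drawn"; the cut-offs and all names are assumptions, not something the engines define.

```python
def band(score_cp, win_cp, draw_cp):
    """Map a centipawn score to 'win', 'draw', 'loss', or 'unclear'."""
    if score_cp >= win_cp:
        return "win"
    if score_cp <= -win_cp:
        return "loss"
    if abs(score_cp) <= draw_cp:
        return "draw"
    return "unclear"


def classify(best_cp, second_cp, win_cp=200, draw_cp=50):
    """Judge a test position from the scores of its two best moves."""
    best_cp, second_cp = max(best_cp, second_cp), min(best_cp, second_cp)
    best = band(best_cp, win_cp, draw_cp)
    second = band(second_cp, win_cp, draw_cp)
    label = {"win": "winning", "draw": "drawing"}

    if best == "loss":
        return "suspect: every move loses"
    if best == "unclear" or second == "unclear":
        return "unclear: check by hand"
    if best == second:
        return "suspect: two %s moves" % label[best]
    return "probably correct: unique %s move" % label[best]


print(classify(350, 10))    # one winning move, the second only draws
print(classify(350, 320))   # two winning moves
print(classify(-400, -600)) # everything loses
```

Scores that fall between the draw window and the win cut-off are reported as unclear rather than forced into a verdict.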

Re: Tactical testset

Post by Dann Corbit » 19 Jul 2005, 01:42

All of these approaches only give probability-based guesses as to what is a good test set.

If you look at the Jenoban result log that I posted you will see that some of the strongest programs clearly misevaluated a position after one full hour of search on a very fast machine with a great big hash table. If they can't even get the best move right, then what about second best?

The only way to know that a test suite is fully correct is to analyze every position exhaustively, all the way to checkmate or a draw.

So every other approach that we try will always be a guess.
Dann Corbit

