Winboard Forum

Posted: **24 Jun 2005, 10:07**

Hi all,

The source code for Glaurung BFP, a new and very experimental Glaurung version, is now available from my Scatha and Glaurung page. This version should not replace Glaurung 0.2.3 in serious tournaments, but of course I would be happy if somebody wants to run some tests with it.

Compared to 0.2.3, there are a number of minor changes, mostly small code simplifications which should have little or no influence on the playing strength. There is only one really important change: I have intentionally introduced an ugly and serious-looking bug in the futility pruning (BFP stands for "buggy futility pruning"). I'll explain this bug below.

One of the biggest jumps in playing strength for Gothmog (my old chess engine) came when I increased the evaluation grain size by dividing the return value of the evaluation function by 2. At the time, I was very surprised that this little change seemed to be so effective. Much later, I discovered that the increased strength was not caused by the increased eval grain size, but by a bug introduced at the same time: I forgot to divide the return value of my approximate_eval_after_move() function (used for futility pruning) by 2. In Gothmog, this bug seems to be worth about 50-100 rating points. All attempts to fix it were disastrous. I never understood why this bug improved the playing strength.

In Glaurung BFP, I have copied Gothmog's old bug to Glaurung. The futility pruning in Glaurung 0.2.3 and previous versions looks like this (somewhat simplified for the sake of readability):

Code: Select all: int futility_value = approx_eval_after_move(move); if(futility_value < alpha-futility_margin) continue; /* Don't search this move */

The approx_eval_after_move() function adds the approximate change in eval when the move is made to the static eval of the current position.

In Glaurung BFP, the first line of code above is changed:

Code: Select all: int futility_value = approx_eval_after_move(move) * 2;

This, of course, makes no sense at all. The program will make wildly inaccurate futility pruning decisions everywhere in the tree, and I would have expected to see lots of horrible blunders. For some reason which I still don't understand, this does not seem to happen. On the contrary, Glaurung BFP performs better than 0.2.3 in tactical test suites. I also played a Noomen match between the two versions, and BFP won by 54-46. Of course the winning margin is not sufficiently big to be sure that BFP is really stronger, but I think just the fact that BFP does not lose by a big margin is truly astonishing.

My hope is that if I some day manage to understand why this bug works, I will be able to use this understanding to invent some really cool new search tricks.

Tord

Posted: **24 Jun 2005, 10:46**

Hi Tord,

It can be that when multiplying by two the value of the evaluation costs more than futility takes place and with her the program plays worse, you have proven to deactivate futility full?, perhaps the turn out to face the new program the previous version has the same result or better than the obtained one now.

The one that the program is stronger in the test perhaps has its logic, according to which I have read futility can delay in some depth the good move, although will look for deeper.

Posted: **24 Jun 2005, 13:55**

Interesting finding.
Perhaps that proves that distance from a draw score is also important to deal with in addition to the distance to alpha.

I mean, when the evaluation score is close to 0, the difference between the two methods is probably negligable.

When you're bad, you need some luck in improving your position, most probably by adjusting the material balance a bit to you favor. Pruning moves that do not win material may perhaps enable to find some hidden material gain earlier. And when it fails, you were bad anyway so adjust alpha and hope you find a new move then.

When you're good, you probably don't need dodgy pruning techniques to improve further on your position tactically. You probably want to play as precise as possible.

An interesting test may be to multiply the evaluation score by two only when the evaluation score is < 0 ...

Richard.

Posted: **24 Jun 2005, 21:04**

Intel 'profile-guided' optimized build. (no opening book included)

http://homepages.tesco.net/henry.ablett ... ng_bfp.rar

Jim.

Posted: **25 Jun 2005, 09:40**

Hi Tord,

This version should not replace Glaurung 0.2.3 in serious tournaments

Are you sure?

Code: Select all: Arena tournament Rank Engine Author Country Score Gl Fr S-B 1 Glaurung 0.23 BFP Tord Romstad 3.5/4 ? ?? ?? 3.5/4 1.75 2 Fruit 2.1 (peach) Fabien LETOUZEY 0.5/4 0.5/4 ? ?? ?? 1.75 4 games played / Tournament finished Tournament start: 2005.06.24, 22:35:29 Latest update: 2005.06.25, 09:24:35 Site/ Country: TYRELL-S88VKCYV, United Kingdom Level: Tournament 40/5 Hardware: mobile AMD Athlon(tm) XP 2331 MHz with 511 MB Memory Operating system: Microsoft Windows XP Professional Service Pack 2 (Build 2600)

Games: http://homepages.tesco.net/henry.ablett ... uit2.1.zip

Regards,
Jim.

Posted: **25 Jun 2005, 11:09**

Richard Pijl wrote:Interesting finding.
Perhaps that proves that distance from a draw score is also important to deal with in addition to the distance to alpha.

Yes. I agree, the explanation of the good results with buggy futility pruning is probably related to this.

I mean, when the evaluation score is close to 0, the difference between the two methods is probably negligable.

Exactly.

When you're bad, you need some luck in improving your position, most probably by adjusting the material balance a bit to you favor. Pruning moves that do not win material may perhaps enable to find some hidden material gain earlier. And when it fails, you were bad anyway so adjust alpha and hope you find a new move then.

When you're good, you probably don't need dodgy pruning techniques to improve further on your position tactically. You probably want to play as precise as possible.

An interesting test may be to multiply the evaluation score by two only when the evaluation score is < 0 ...

I did a lot of such tests in Gothmog. I tried multiplying by two only for positive scores, and only for negative scores. I also tried factors bigger and smaller than two, as well as many different types of non-linear scales. I never found anything that worked better than a constant factor of 2, used for positive as well as negative scores.

Tord

Posted: **25 Jun 2005, 11:16**

Hi Jim,

Thanks for your Windows binary of Glaurung BFP, and for running the little test match against Fruit!

Jim Ablett wrote:
Tord Romstad wrote:This version should not replace Glaurung 0.2.3 in serious tournaments

Are you sure?

Code: Select all
Arena tournament Rank Engine Author Country Score Gl Fr S-B 1 Glaurung 0.23 BFP Tord Romstad 3.5/4 ? ?? ?? 3.5/4 1.75 2 Fruit 2.1 (peach) Fabien LETOUZEY 0.5/4 0.5/4 ? ?? ?? 1.75

Yes, I am sure. The 3.5-0.5 result against Fruit 2.1 is plain luck, nothing more. I think Fruit would score at least 70% in a long match. It is still very possible that Glaurung BFP is stronger than 0.2.3, though. I need more data before I can be sure of this. If BFP proves to be significantly stronger, there will probably be a new 0.2.4 version soon (probably containing a few evaluation improvements in addition to the futility pruning bug).

Tord

Posted: **25 Jun 2005, 13:05**

Tord,

You definitely found an improvement. How big of an improvement is now the question. I'll do some testing, and I'm sure others will too, and we'll soon see how much Glaurung improved.

Thanks to Jim for doing the Windows compile.

-Norm

Posted: **25 Jun 2005, 17:49**

Hello,
just some games, short(est) time control, but looks promising.
The word bug hunting might get a completely new meaning... ;-)

Code: Select all: 1: Glaurung023-bfp 28,0/59 ???????????????????? 2: CraftymCito12 12,0/20 0111=101===1100=01=1 3: Fruit_21 9,5/20 (!) 0011=1001=0===1=010= 3: Aristarch 4.50 9,5/19 1=01=1010101=001100 Level: Blitz 1min/game + 1sec/move Hardware: AMD Athlon(TM) XP 2000+

Peter

Posted: **25 Jun 2005, 18:50**

Norm and Peter,

First of all, thanks to both of you for your help!

Norm Pollock wrote:You definitely found an improvement.

How can you be so sure about this? The results posted in this thread so far are not enough to convince me.

Norm Pollock wrote:How big of an improvement is now the question. I'll do some testing, and I'm sure others will too, and we'll soon see how much Glaurung improved.

Peter Eizenhammer wrote:Hello,
just some games, short(est) time control, but looks promising.
The word bug hunting might get a completely new meaning...
Code: Select all
1: Glaurung023-bfp 28,0/59 ???????????????????? 2: CraftymCito12 12,0/20 0111=101===1100=01=1 3: Fruit_21 9,5/20 (!) 0011=1001=0===1=010= 3: Aristarch 4.50 9,5/19 1=01=1010101=001100

These are indeed promising results, but of course we need more data to be sure. I still find it hard to believe that this bug actually improves the playing strength.

Tord

Posted: **26 Jun 2005, 08:38**

Tord Romstad wrote:...but of course we need more data to be sure. I still find it hard to believe that this bug actually improves the playing strength.

Tord

Some data from my side:

Code: Select all: RANK ENGINE GAMES POINTS 1 2 --------------------------------------------------------------------- 1. FRUIT_21 10 6.0 ********** 101=01101= 2. GLAURUNG023-BFP 10 4.0 010=10010= ********** Total games = 10

Test conditions: XP2000+, ~64MB RAM, 5 piece TBs, Ponder off, [TimeControl "2700+5"].

To be continued.

Volker

Posted: **26 Jun 2005, 16:45**

An interesting test may be to multiply the evaluation score by two only when the evaluation score is < 0 ...

Hi Richard!
This final sentence confused me. What is the idea behind this? Unless you ment the approximation score for futility pruning.
In my point of view I agree with you and Pendro by the means that by multiplying aprox score by 2 Tord makes harder for futility pruning to prune moves when alpha is away from draw score. What is really the difference in strength in games (not test positions) when Glaurung plays with no futility pruning at all?

Posted: **26 Jun 2005, 20:12**

hello,
updated results:

Code: Select all: A) Glbfp: SlowBlitzWV 19,5/30 011=10100=11101001110111=11==1 TogaSetting 19,0/30 =11==01=1=111=0=00110111=1=10= CraftymCito12 17,5/31 0111=101===1100=01=10101=0110== Fruit_21 17,5/31 0011=1001=0===1=010=1110111=01= Aristarch 4.50 15,0/31 1=01=1010101=00110000011001==1= Spike09a 14,5/30 =01==00=10=1=010=1==1=1000=011 Glaurung023-bfp 80,0/183 ??????????????????????????????? => 43,7 % against very strong opponents B) compare with Gl023: 1: TogaSetting =1=111=1111010=11111== 2: Fruit_21 011=111101111===110111 3: SlowBlitzWV 01=101=1=111110111=1=1 4: Spike09a 0011=11=1=10==1110=101 5: Glaurung023Dann 23,5/88 ?????????????????????? => 26,7 % (versus the 41,7 % by Glbfp in the games in the first table);

Of course nothing is clear; but it looks like an improvement under these conditions,
when one takes into account that the "bug" worked for Gothmog, too,
what gives a little bit more confidence that it is ok.
But I am only guessing around,

Peter

Posted: **26 Jun 2005, 23:33**

This final sentence confused me. What is the idea behind this?

I just tried to figure out what would be more important:
- more pruning when behind (i.e. score <<0), trying to consider mainly the moves that win material
- less pruning when in front (i.e. score >> 0), trying to avoid any errors and play sound positionally
In his reply Tord mentioned that the unmodified 'bug' works best, so both seem to be productive.
Richard.

Posted: **26 Jun 2005, 23:46**

milix wrote:
An interesting test may be to multiply the evaluation score by two only when the evaluation score is < 0 ...

Hi Richard!
This final sentence confused me. What is the idea behind this? Unless you ment the approximation score for futility pruning.
In my point of view I agree with you and Pendro by the means that by multiplying aprox score by 2 Tord makes harder for futility pruning to prune moves when alpha is away from draw score.

No, it is not quite that simple. Mulitplying the score by 2 makes it harder to prune moves only when the score is above 0. When the score is below 0, multiplying by 2 makes it easier to prune moves.

What is really the difference in strength in games (not test positions) when Glaurung plays with no futility pruning at all?

The last time I checked (around version 0.1.3 or 0.1.4, I think), the difference seemed to be something like 40-50 Elo points. I guess the difference would be smaller today. Futility pruning tends to work best in programs with stupid and primitive evaluation functions. Glaurung's eval is still pretty bad, but not quite as bad as in 0.1.3/0.1.4.

Tord

Posted: **27 Jun 2005, 00:00**

Volker Pittlik wrote:
Tord Romstad wrote:...but of course we need more data to be sure. I still find it hard to believe that this bug actually improves the playing strength.

Tord

Some data from my side:

Code: Select all
RANK ENGINE GAMES POINTS 1 2 --------------------------------------------------------------------- 1. FRUIT_21 10 6.0 ********** 101=01101= 2. GLAURUNG023-BFP 10 4.0 010=10010= ********** Total games = 10

Test conditions: XP2000+, ~64MB RAM, 5 piece TBs, Ponder off, [TimeControl "2700+5"].

To be continued.

Thanks, Volker!

Based on the results posted so far, it looks like the bug is particularly effective against Fruit 2.1. Probably just another statistical fluctuation.

Peter Eizenhammer wrote:hello,
updated results:
Code: Select all
A) Glbfp: SlowBlitzWV 19,5/30 011=10100=11101001110111=11==1 TogaSetting 19,0/30 =11==01=1=111=0=00110111=1=10= CraftymCito12 17,5/31 0111=101===1100=01=10101=0110== Fruit_21 17,5/31 0011=1001=0===1=010=1110111=01= Aristarch 4.50 15,0/31 1=01=1010101=00110000011001==1= Spike09a 14,5/30 =01==00=10=1=010=1==1=1000=011 Glaurung023-bfp 80,0/183 ??????????????????????????????? => 43,7 % against very strong opponents B) compare with Gl023: 1: TogaSetting =1=111=1111010=11111== 2: Fruit_21 011=111101111===110111 3: SlowBlitzWV 01=101=1=111110111=1=1 4: Spike09a 0011=11=1=10==1110=101 5: Glaurung023Dann 23,5/88 ?????????????????????? => 26,7 % (versus the 41,7 % by Glbfp in the games in the first table);

Of course nothing is clear;

In fact it is rather clear, at least for the time controls you use. Entering the above data into R?mi Coloum's excellent little MonteCarlo utility (thanks, R?mi!

), I get a probability of 99.6% that BFP is stronger than 0.2.3. Pretty convincing.

Peter Eizenhammer wrote:but it looks like an improvement under these conditions, when one takes into account that the "bug" worked for Gothmog, too, what gives a little bit more confidence that it is ok.

That the bug works in Gothmog doesn't really say much. The two programs are very different, and it often happens that something works very well in one of the engines and not at all in the other. I always assumed that the futility pruning bug worked in Gothmog because of some bizzarre interactions with other, more subtle bugs which I was never able to find (Gothmog is very complicated and very buggy). That the same bug appears to work in the relatively clean and bug-free Glaurung is a big surprise to me.

Thank you very much for your help! By the way, I am currently also working on some of the evaluation problems you pointed out some time ago. Hopefully some of the weaknesses will be slightly less noticable in the next version.

Tord

Posted: **27 Jun 2005, 02:09**

I did some testing. Based on these 100 games, it appears that BFP is stronger than 0.2.3. And that BFP is competitive with some strong engines.
These games were on an AMD 3000+, XP, with time 4'+2", and 128M hash each engine, no ponder, same book.

1 Gandalf 6.0 12.0/20
2 Glaurung BFP 8.0/20

1 Pro Deo 1.1 10.5/20
2 Glaurung BFP 9.5/20

1 TheKing 3.33 13.0/20
2 Glaurung BFP 7.0/20

1 SOS 5 for Arena 10.5/20
2 Glaurung BFP 9.5/20

1 Glaurung BFP 15.5/20
2 Glaurung 0.2.3 4.5/20

Posted: **27 Jun 2005, 10:31**

Tord Romstad wrote:...
Based on the results posted so far, it looks like the bug is particularly effective against Fruit 2.1. Probably just another statistical fluctuation...

The match is finished now. Fruit won 20:8. Steve Maughan's "Who-is-better" tool indicates that Fruit is by an error margin of 5%. All games can be found here: http://www.volker-pittlik.name/winboard/tests/Fruit_21-Glaurung023-bfp_45_5.zip.

Glaurung's most impressive win:

[Event "Computer chess game"]
[Site "XP2000+, ~64MB RAM, 5 piece TBs, Ponder off, VPittlik"]
[Date "2005.06.26"]
[Round "15"]
[White "Fruit_21"]
[Black "Glaurung023-bfp"]
[Result "0-1"]
[TimeControl "2700+5"]

1. e4 e5 2. Nf3 Nc6 3. Bc4 Nf6 4. Ng5 Bc5 5. Bxf7+ Ke7 6. Bd5 Rf8 7. O-O
d6 {-1.10/14} 8. h3 h6 {-0.77/14} 9. Nf3 {+0.49/14} Nd4 {-0.44/15} 10.
Nxd4 {+0.43/14} Bxd4 {-0.38/16} 11. Nc3 {+0.59/15} Qe8 {-0.07/15} 12.
d3 {+0.59/13} Qg6 {-0.08/14} 13. Kh2 {+0.48/12} Nh5 {-0.08/13} 14.
Bc4 {+0.44/12} c6 {+0.10/13} 15. Bd2 {+0.12/12} Bxf2 {+0.70/13} 16.
g4 {-0.50/13} Qf6 {+1.53/13} 17. g5 {-1.26/13} Qg6 {+1.81/12} 18.
Ne2 {-1.26/11} Rf3 {+2.66/12} 19. h4 {-1.84/11} Bg4 {+4.58/11} 20.
c3 {-4.72/12} Raf8 {+4.90/10} 21. Qc2 {-5.38/10} b5 {+6.61/11} 22.
Bb3 {-3.65/10} d5 {+8.05/12} 23. Bxd5 {-8.10/10} cxd5 {+10.05/11} 24.
c4 {-10.38/8} Rh3+ {+99.81/10} 25. Kg2 {-12.81/4} Qf7 {+99.79/9} 26.
Nd4 {-7.37/8} Rg3+ {+99.81/8} 27. Kh2 {-99.88/10} exd4 {+99.83/7} 28.
Bb4+ {-99.90/47} Kd7 {+99.85/6} 29. Qe2 {-99.92/47}
{White resigns} 0-1

Regards

Volker

Posted: **28 Jun 2005, 08:39**

Glaurung BFP had problems with time?! when remain 20 secs, it begin to play so bad.

trn 2 min/game

Code: Select all: glaurung ???? ???????? Fru Gla Gla Gla Points S-B % 1 Fruit 2.0 XXXX 11?? 1110 110? 8.5 / 12 45.00 70.83% 2 Glaurung 0.2.1 00?? XXXX 0110 11?1 6.5 / 12 31.00 54.17% 3 Glaurung 0.2.3 0001 1001 XXXX 1??1 6.0 / 12 30.50 50.00% 4 Glaurung BFP 001? 00?0 0??0 XXXX 3.0 / 12 22.00 25.00% 24 ?????? ?? 24 ??????? ???????: 2 ?????/?????? ??????: C:\Chess\Shredder\trn\glaurung.pgn www.shredderchess.com

if need games, mail me

Posted: **28 Jun 2005, 08:52**

Hi all,

I haven't got the time to re-read the whole thread, but I am not sure if it is clearly stated anywhere that Glaurung uses EXTENDED Futility Pruning. It took me a while to guess this (this was confirmed afterwards).

Fabien.

Winboard Forum

Glaurung BFP

Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP

Re: Glaurung BFP