Winboard Forum

by **OliverBr** » 30 Oct 2004, 13:44

Uri Blass wrote:Maybe the trick is to write long code because with less than hundreds of lines you cannot be very fast.
Uri

I do not see the connection between code length and speed.
Shouldn't it be vice versa? Less code, less instructions, the faster
the CPU goes through?! When I was programming assembler,
is was like this!

There is no doubt that larger code gets better tree cutting and better
evaluations, but especially in perft with non of these, a short and
compact code should be the fastest.

-Oliver

by **Reinhard Scharnagl** » 30 Oct 2004, 13:54

Hi Oliver and Michael,

to be comparable (without updating the complicated statistic) Smirf using only one 64 bit counter gets (P4, 2.8 GHz):

Code: Select all: +-*--b--c--d--*--f--g--*-+ MS Vis. Studio C++ Vers. 13.10 8 |[r][n][b][q][k][b][n][r]| (Compilation: Oct 30 2004) 7 |[p][p][p][p][p][p][p][p]| 6 | ::: ::: ::: :::| Perft Testseries 5 |::: ::: ::: ::: | 4 | ::: ::: ::: :::| (without caching) 3 |::: ::: ::: ::: | 2 || Smirf Test No.: 00 1 |<R><N><Q><K><N><R>| =>+-*--b--c--d--*--f--g--*-+ Break Time 50.0 Sec. Ply Nodes all (x) (e.p.) all (+) all (++) Promot. Castl. Sec. ------------------------------------------------------------------------------- 1 20 0 0 0 0 0 0 0 2 400 0 0 0 0 0 0 0 3 8902 0 0 0 0 0 0 0 4 197281 0 0 0 0 0 0 0 5 4865609 0 0 0 0 0 0 0.2 6 119060324 0 0 0 0 0 0 4.2 7 3195901860 0 0 0 0 0 0 106 -------------------------------------------------------------------------------

These are 30.15 million legal and fully informed moves generated a second for the traditional starting position (though supporting FRC and 10x8 boards).
Regards, Reinhard

by **Uri Blass** » 30 Oct 2004, 14:05

Hi Oliver,The connection is simple.

Having more functions instead of general code make the code faster
For example I discovered that having a special generator for white pawns and for black pawns make the code faster because I do not need to check for every pawn if it is white or black and the if condition costs time.

I found that having functions for many types of moves also helped me to do the generator faster.

Note also that I do not know nothing about assembler and I use only C or C++ and I suspaect that bitboard is not the best idea for getting a faster move generator.

Uri

by **Reinhard Scharnagl** » 30 Oct 2004, 14:11

Hi Uri,

I do fully agree with your statement. And additionally telling a small secret: Smirf still is not completely split into black and white routines ...

Reinhard.

by **OliverBr** » 30 Oct 2004, 18:30

Hello Tord,
thank you for your work.
Unfortunately getShiftMask is the worst function you can pick, because
it is only called few times for initialization and only contains
32bit integers.

void upDateBoards(Move move) or when this is too large, just
void doPromotion(int piece, int from) would be much more interesting.

Tord Romstad wrote:Here is the getShiftMask function, first in the 32-bit compile, then in the 64-bit compile:

by **Anonymous** » 30 Oct 2004, 19:05

Tord Romstad wrote:
The 32-bit binary was compiled with 'gcc -O3 -funroll-loops -fomit-frame-pointer -o olithink', while the 64-bit binary was compiled with 'gcc -O3 -funroll-loops -fomit-frame-pointer -fast -mcpu=G5 -mpowerpc64 -o olithink64'.

Tord

Almost all my programs become slower with these compile flags. For me typically -O (or -O2) -fomit-frame-pointer is best. One can even see it in the generated assembly, how often -O3 gives worse code. For example due to over aggressive inlining. The inlined function often has less registers (because one or a few are taken by the caller), and now some inner loop code can easily become less efficient. The calling overhead can be much less compared to this. Also gcc likes to internally produce pointer code (*a++ = *b++ * ... when one writes a[i] = b[i] + ...). Sometimes this needs more registers, often it needs more instructions (the pointer increments), compared to the indexed code (with more complicated adressing) , which is produced at lower optimization levels, and often is better. Of course it might be very different on PowerPC (More registers, I guess no indexed adressing modes, etc. )

BTW. I always wondered, why compilers don't default for sensible optimization settings. Compilation times can hardly be an argument anymore - at least not for medium sized projects. Intel cc is the only exception I know - it produces fast code without any switches.

Regards,
Dieter

by **Pallav Nawani** » 30 Oct 2004, 19:06

OliverBr wrote:
BTW, I am happy that the Persians when they invented chess,
did consider the problem of alignement with computer registers,
so they choose wisely a 8x8 board.

-Oliver

Persians did not Invent chess. It was invented in India.

http://www.geocities.com/pallavnawani/chess-history.html
http://en.wikipedia.org/wiki/Chess

Best regards,
Pallav

by **Tord Romstad** » 31 Oct 2004, 21:24

OliverBr wrote:Hello Tord,
thank you for your work.
Unfortunately getShiftMask is the worst function you can pick, because
it is only called few times for initialization and only contains
32bit integers.

Of course you are right. I should have checked a bit more carefully before posting.

void upDateBoards(Move move) or when this is too large, just
void doPromotion(int piece, int from) would be much more interesting.

upDateBoards is indeed a bit too large, but doPromotion is included below (32 bit-version first, 64-bit version at the end). This time the 64-bit version is slightly shorter, and the 32-bit version contains pairs of 'xor' instructions at locations where the 64-bit version has only a single 'xor'. Unfortunately it seems that most of the processor time is spent doing the numerous instructions between the xors.

32-bit compile:

Code: Select all: _doPromotion: mflr r9 bcl 20,31,"L00000000012$pb" "L00000000012$pb": stmw r28,-16(r1) mflr r31 addis r5,r31,ha16(L_PieceBoard$non_lazy_ptr-"L00000000012$pb") addis r7,r31,ha16(L_bitMask$non_lazy_ptr-"L00000000012$pb") lwz r2,lo16(L_PieceBoard$non_lazy_ptr-"L00000000012$pb")(r5) slwi r8,r3,3 lwz r0,lo16(L_bitMask$non_lazy_ptr-"L00000000012$pb")(r7) slwi r6,r4,3 add r5,r8,r2 mtlr r9 add r28,r6,r0 mr r29,r2 addi r30,r5,56 addi r0,r2,168 L853: lwz r9,8(r29) lwz r10,12(r29) addi r8,r29,56 lwz r2,0(r28) lwz r3,4(r28) lwz r11,512(r28) lwz r12,516(r28) addi r28,r28,1024 xor r6,r9,r2 xor r7,r10,r3 stw r6,8(r29) stw r7,12(r29) addi r29,r29,112 lwz r9,-56(r30) lwz r10,-52(r30) cmpw cr0,r29,r0 xor r4,r9,r2 xor r5,r10,r3 stw r4,-56(r30) stw r5,-52(r30) lwz r2,8(r8) lwz r3,12(r8) xor r6,r2,r11 xor r7,r3,r12 stw r6,8(r8) stw r7,12(r8) lwz r2,0(r30) lwz r3,4(r30) xor r4,r2,r11 xor r5,r3,r12 stw r4,0(r30) stw r5,4(r30) addi r30,r30,112 ble- cr0,L853 lmw r28,-16(r1) blr

64-bit compile:

Code: Select all: _doPromotion: lis r2,ha16(L_PieceBoard$non_lazy_ptr) lis r5,ha16(L_bitMask$non_lazy_ptr) slwi r6,r4,3 slwi r3,r3,3 lwz r12,lo16(L_PieceBoard$non_lazy_ptr)(r2) lwz r10,lo16(L_bitMask$non_lazy_ptr)(r5) ld r8,8(r12) ldx r9,r6,r10 add r4,r6,r10 addi r5,r12,56 addi r6,r3,56 ld r10,512(r4) xor r7,r8,r9 ld r8,1024(r4) std r7,8(r12) nop ld r7,1536(r4) ldx r2,r3,r12 xor r0,r2,r9 stdx r0,r3,r12 ld r4,8(r5) xor r9,r4,r10 addi r4,r12,112 std r9,8(r5) addi r9,r3,112 ldx r5,r6,r12 xor r0,r5,r10 stdx r0,r6,r12 ld r10,8(r4) xor r2,r10,r8 addi r10,r12,168 std r2,8(r4) addi r2,r3,168 ldx r4,r9,r12 xor r6,r4,r8 stdx r6,r9,r12 ld r5,8(r10) xor r0,r5,r7 std r0,8(r10) nop ldx r4,r2,r12 xor r3,r4,r7 stdx r3,r2,r12 blr

Tord

by **Tord Romstad** » 31 Oct 2004, 21:33

Dieter B?r?ner wrote:
Tord Romstad wrote:
The 32-bit binary was compiled with 'gcc -O3 -funroll-loops -fomit-frame-pointer -o olithink', while the 64-bit binary was compiled with 'gcc -O3 -funroll-loops -fomit-frame-pointer -fast -mcpu=G5 -mpowerpc64 -o olithink64'.

Tord

Almost all my programs become slower with these compile flags. For me typically -O (or -O2) -fomit-frame-pointer is best.

My experience is similar, but not quite the same. I've found that big and complicated programs tend to run faster with -O or -O2, while very small programs run faster with -O3. For instance, -O and -O2 work best for Gothmog (about 12,000 lines of code), while -O3 works best for Glaurung (about 3,000 lines). I haven't seen any big difference between x86 and PowerPC in this respect. The same compiler settings produce the fastest executables on both platforms.

OliThink, as I exected, shows behaviour similar to Glaurung. -O3 gives the fastest executable (4182 knps with -O3, 3812 knps with -O2), at least on the G5. I haven't tested other platforms yet.

Tord

by **Josu? Forte** » 01 Nov 2004, 04:48

Hi OliverBr,

Thanks a lot to make your souce code publicly available!
Your code is a great source of information for those who want to learn bitboarb programming.

Congratulations!
Hope you continue making progress with your projects.

Best,
Josu

by **Sune Fischer** » 01 Nov 2004, 08:25

Hi Reinhard,

Reinhard Scharnagl wrote:Hi Oliver and Michael,
These are 30.15 million legal and fully informed moves generated a second for the traditional starting position (though supporting FRC and 10x8 boards).
Regards, Reinhard

OK you got me beat, I get only 22 M legal moves per second here on the XP2400+, although your P4 2.8 is probably a bit faster it isn't that much faster.

Another interesting test though is how many fully informed qsearch moves can it generate per second?

-S.

by **Reinhard Scharnagl** » 01 Nov 2004, 08:43

Hi Sune,

... I get only 22 M legal moves per second here

22M valid moves at your XP2400+ nevertheless absolutely is a top result! Which information do your generated moves include?

Another interesting test though is how many fully informed qsearch moves can it generate per second?

Here you have got me. I am just working on that. But I am not sure, whether the results finally would be that comparable. I am targeting to create a deescalating search, but it seems to be too complicated this moment. So I am now switching to a qsearch first and will follow my own ideas in a later version. How intensivly you are using a transposition table during qsearch? Which are your testing positions, what their results?

Reinhard.

P.S.: does your qsearch care on passed pawns to be evaluated differently?

by **Reinhard Scharnagl** » 01 Nov 2004, 20:00

Hi again Sune,

I am still waiting for a description of a testing scenario. What do you suggest?

Reinhard.

by **Sune Fischer** » 01 Nov 2004, 21:39

Hi Reinhard,

I do store some information with the move, the type of piece moving and the type being captured. That's not really needed for perft but it's nice to have when ordering the moves, perhaps it's luxury and not terribly efficient, but sometimes you have to design over the 0.05% speedup.

I don't currently have a valid qperfter either, but I did test it once and I seem to remember getting about 10 M generated moves/sec.

I don't use a hash for qsearch, it saves almost no nodes for me and the slowdown is significant (more than 10%). I use a small evaluation hash instead which doesn't save any nodes either but it does increase speed marginally.
I expect that to work better and better the heavier the eval becomes though.
Then again it might just be blowing the cache, speed alone is a good measure to see if it's working.

P.S.: does your qsearch care on passed pawns to be evaluated differently?

Of course!

This is done in the evaluation and hence in qsearch.

-S.

by **OliverBr** » 05 Nov 2004, 22:34

Hello,
thank you all again for you interest.
Now I have finished the Version pre Alpha 3, playing
already complete chess, including castling and check stuff.

For programmers the part, where the castling is handled
could be quite interesting.

Download at my homepage
http://home.arcor.de/dreamlike

-Oliver

by **Reinhard Scharnagl** » 05 Nov 2004, 22:55

Hi Sune,

in another thread I report on my qsearch, which is now including positional detail and passed pawn evaluation, both cached. There is not much benefit from caching in my test scenario. But I hope it would have more effect in a real game. But to know that, I have to finish my Smirf first. I have to work, wait and see ...

It would be interesting to know on the rates of other chess programs, how often they perform a complex detail evaluation a second.

Regards, Reinhard.

by **Sune Fischer** » 05 Nov 2004, 23:58

Hi Reinhard,

Pawn hashing is a super trick to speed up complex evaluation.

With pawn hashing I get a frenzeemark of 981, without hashing I get only 504!

The pawn eval is also the most sophisticated eval I use.
It evaluates trapped bishops, fortresses, different types of passers, blocked, hindered, double and many other kinds of pawn features.

I couldn't afford all of that without pawn hashing.
I think you will need it too, just wait and see till you've added lots of eval.

-S.

by **Uri Blass** » 06 Nov 2004, 01:05

Sune Fischer wrote:Hi Reinhard,

Pawn hashing is a super trick to speed up complex evaluation.

With pawn hashing I get a frenzeemark of 981, without hashing I get only 504!

The pawn eval is also the most sophisticated eval I use.
It evaluates trapped bishops, fortresses, different types of passers, blocked, hindered, double and many other kinds of pawn features.

I couldn't afford all of that without pawn hashing.
I think you will need it too, just wait and see till you've added lots of eval.

-S.

I do not understand how the pawn eval can evaluate trapped bishops.
I thought that it is a bishop eval in that case.

I also cannot see how pawn hash can help in trapped bishops because based on my understanding pawn hash only help to get information that is dependent only on pawns.

Same for fortresses and you cannot evaluate if position is fortress based only on the pawn structure.
You may use pawn hash only in case that part of your evaluation is a simple evaluation of pawns that is dependent only on the pawn structure without caring about other pieces or you want to store information about pawns.

Maybe you can store conditions to fortress that are usually false but I see no way that you can store fortress position.

personally I think that pawn hash is not something important to care about at this moment because movei is slow because of other factors and pawn hash is not going to do it signficantly faster and I am even not sure if it is going to do it faster.

It may be better to improve my mobility evaluation so I do not need to generate all moves to calculate mobility(it is one of the main factors that makes movei slow).

Uri

by **Reinhard Scharnagl** » 06 Nov 2004, 01:05

Hi Sune,

sure, there will be a lot of possibilities to enhance Smirf's performance, e.g. by caching calculated data - whatever it might be. What concerns pawn hashing I am not sure today, but I do not exclude anything for the future. But for now I have to do all the things, which have been prepared in my head. This are things like a kind of lazy positional detail evaluation, which I have implemented today.

Your frenzeemark (?) results are not telling me much. Instead numbers were interesting like how much detail evaluations will be used during a second.

Reinhard.

by **Sune Fischer** » 06 Nov 2004, 02:01

Hi Reinhard and Uri,

Reinhard, the number just indicates the speed increase it gets by using a pawn hash.

There are otherways to speed things up, i.e. lazy eval and such, but these are imperfect improvements because you get sligtly less accurate eval and probably different node counts.

However, there is also a good speed up from lazy evaluation, I think 20%-30% faster due to the lazy eval stuff.

Uri, a pawn hash can be used for anything involving pawns, or pawns+kings or whatever you xor into the key.
Trapped bishops involves also bishops but most of the computation is about analyzing the pawn structure so this can be hashed and reused cheaply.

If I were you I would test and see if a hash gives a speedup, rather than assume it doesn't.

Once you start playing with this stuff you realize that you can do almost anything for free as long as it gets pawn hashed!
Quite a gift really if you think about it, worth getting the creative juices flowing over this one, IMO.

-S.

Winboard Forum

OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

New Pre Alpha Realease

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Re: OliThink 5

Who is online