OliThink 5

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

Re: OliThink 5

Postby OliverBr » 30 Oct 2004, 13:44

Uri Blass wrote:Maybe the trick is to write long code because with less than hundreds of lines you cannot be very fast.
Uri


I do not see the connection between code length and speed.
Shouldn't it be vice versa? Less code, less instructions, the faster
the CPU goes through?! When I was programming assembler,
is was like this!

There is no doubt that larger code gets better tree cutting and better
evaluations, but especially in perft with non of these, a short and
compact code should be the fastest.

-Oliver
OliverBr
 

Re: OliThink 5

Postby Reinhard Scharnagl » 30 Oct 2004, 13:54

Hi Oliver and Michael,

to be comparable (without updating the complicated statistic) Smirf using only one 64 bit counter gets (P4, 2.8 GHz):
Code: Select all
  +-*--b--c--d--*--f--g--*-+ MS Vis. Studio C++ Vers. 13.10
8 |[r][n][b][q][k][b][n][r]| (Compilation: Oct 30 2004)
7 |[p][p][p][p][p][p][p][p]|
6 |   :::   :::   :::   :::| Perft Testseries
5 |:::   :::   :::   :::   |
4 |   :::   :::   :::   :::| (without caching)
3 |:::   :::   :::   :::   |
2 |<P><P><P><P><P><P><P><P>| Smirf Test No.: 00
1 |<R><N><B><Q><K><B><N><R>|
=>+-*--b--c--d--*--f--g--*-+ Break Time 50.0 Sec.

Ply     Nodes    all (x)   (e.p.)    all (+) all (++)    Promot.    Castl. Sec.
-------------------------------------------------------------------------------
1          20          0        0          0        0          0         0    0
2         400          0        0          0        0          0         0    0
3        8902          0        0          0        0          0         0    0
4      197281          0        0          0        0          0         0    0
5     4865609          0        0          0        0          0         0  0.2
6   119060324          0        0          0        0          0         0  4.2
7  3195901860          0        0          0        0          0         0  106
-------------------------------------------------------------------------------

These are 30.15 million legal and fully informed moves generated a second for the traditional starting position (though supporting FRC and 10x8 boards).
Regards, Reinhard
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: OliThink 5

Postby Uri Blass » 30 Oct 2004, 14:05

Hi Oliver,The connection is simple.

Having more functions instead of general code make the code faster
For example I discovered that having a special generator for white pawns and for black pawns make the code faster because I do not need to check for every pawn if it is white or black and the if condition costs time.

I found that having functions for many types of moves also helped me to do the generator faster.


Note also that I do not know nothing about assembler and I use only C or C++ and I suspaect that bitboard is not the best idea for getting a faster move generator.

Uri
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: OliThink 5

Postby Reinhard Scharnagl » 30 Oct 2004, 14:11

Hi Uri,

I do fully agree with your statement. And additionally telling a small secret: Smirf still is not completely split into black and white routines ...

Reinhard.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: OliThink 5

Postby OliverBr » 30 Oct 2004, 18:30

Hello Tord,
thank you for your work.
Unfortunately getShiftMask is the worst function you can pick, because
it is only called few times for initialization and only contains
32bit integers.

void upDateBoards(Move move) or when this is too large, just
void doPromotion(int piece, int from) would be much more interesting.

Tord Romstad wrote:Here is the getShiftMask function, first in the 32-bit compile, then in the 64-bit compile:
OliverBr
 

Re: OliThink 5

Postby Anonymous » 30 Oct 2004, 19:05

Tord Romstad wrote:
The 32-bit binary was compiled with 'gcc -O3 -funroll-loops -fomit-frame-pointer -o olithink', while the 64-bit binary was compiled with 'gcc -O3 -funroll-loops -fomit-frame-pointer -fast -mcpu=G5 -mpowerpc64 -o olithink64'.

Tord


Almost all my programs become slower with these compile flags. For me typically -O (or -O2) -fomit-frame-pointer is best. One can even see it in the generated assembly, how often -O3 gives worse code. For example due to over aggressive inlining. The inlined function often has less registers (because one or a few are taken by the caller), and now some inner loop code can easily become less efficient. The calling overhead can be much less compared to this. Also gcc likes to internally produce pointer code (*a++ = *b++ * ... when one writes a[i] = b[i] + ...). Sometimes this needs more registers, often it needs more instructions (the pointer increments), compared to the indexed code (with more complicated adressing) , which is produced at lower optimization levels, and often is better. Of course it might be very different on PowerPC (More registers, I guess no indexed adressing modes, etc. )

BTW. I always wondered, why compilers don't default for sensible optimization settings. Compilation times can hardly be an argument anymore - at least not for medium sized projects. Intel cc is the only exception I know - it produces fast code without any switches.

Regards,
Dieter
Anonymous
 

Re: OliThink 5

Postby Pallav Nawani » 30 Oct 2004, 19:06

OliverBr wrote:
BTW, I am happy that the Persians when they invented chess,
did consider the problem of alignement with computer registers,
so they choose wisely a 8x8 board.

-Oliver


Persians did not Invent chess. It was invented in India.

http://www.geocities.com/pallavnawani/chess-history.html
http://en.wikipedia.org/wiki/Chess

Best regards,
Pallav
User avatar
Pallav Nawani
 
Posts: 147
Joined: 26 Sep 2004, 20:00
Location: Dehradun, India

Re: OliThink 5

Postby Tord Romstad » 31 Oct 2004, 21:24

OliverBr wrote:Hello Tord,
thank you for your work.
Unfortunately getShiftMask is the worst function you can pick, because
it is only called few times for initialization and only contains
32bit integers.

Of course you are right. I should have checked a bit more carefully before posting.
void upDateBoards(Move move) or when this is too large, just
void doPromotion(int piece, int from) would be much more interesting.

upDateBoards is indeed a bit too large, but doPromotion is included below (32 bit-version first, 64-bit version at the end). This time the 64-bit version is slightly shorter, and the 32-bit version contains pairs of 'xor' instructions at locations where the 64-bit version has only a single 'xor'. Unfortunately it seems that most of the processor time is spent doing the numerous instructions between the xors.

32-bit compile:
Code: Select all
_doPromotion:
   mflr r9
   bcl 20,31,"L00000000012$pb"
"L00000000012$pb":
   stmw r28,-16(r1)
   mflr r31
   addis r5,r31,ha16(L_PieceBoard$non_lazy_ptr-"L00000000012$pb")
   addis r7,r31,ha16(L_bitMask$non_lazy_ptr-"L00000000012$pb")
   lwz r2,lo16(L_PieceBoard$non_lazy_ptr-"L00000000012$pb")(r5)
   slwi r8,r3,3
   lwz r0,lo16(L_bitMask$non_lazy_ptr-"L00000000012$pb")(r7)
   slwi r6,r4,3
   add r5,r8,r2
   mtlr r9
   add r28,r6,r0
   mr r29,r2
   addi r30,r5,56
   addi r0,r2,168
L853:
   lwz r9,8(r29)
   lwz r10,12(r29)
   addi r8,r29,56
   lwz r2,0(r28)
   lwz r3,4(r28)
   lwz r11,512(r28)
   lwz r12,516(r28)
   addi r28,r28,1024
   xor r6,r9,r2
   xor r7,r10,r3
   stw r6,8(r29)
   stw r7,12(r29)
   addi r29,r29,112
   lwz r9,-56(r30)
   lwz r10,-52(r30)
   cmpw cr0,r29,r0
   xor r4,r9,r2
   xor r5,r10,r3
   stw r4,-56(r30)
   stw r5,-52(r30)
   lwz r2,8(r8)
   lwz r3,12(r8)
   xor r6,r2,r11
   xor r7,r3,r12
   stw r6,8(r8)
   stw r7,12(r8)
   lwz r2,0(r30)
   lwz r3,4(r30)
   xor r4,r2,r11
   xor r5,r3,r12
   stw r4,0(r30)
   stw r5,4(r30)
   addi r30,r30,112
   ble- cr0,L853
   lmw r28,-16(r1)
   blr

64-bit compile:
Code: Select all
_doPromotion:
   lis r2,ha16(L_PieceBoard$non_lazy_ptr)
   lis r5,ha16(L_bitMask$non_lazy_ptr)
   slwi r6,r4,3
   slwi r3,r3,3
   lwz r12,lo16(L_PieceBoard$non_lazy_ptr)(r2)
   lwz r10,lo16(L_bitMask$non_lazy_ptr)(r5)
   ld r8,8(r12)
   ldx r9,r6,r10
   add r4,r6,r10
   addi r5,r12,56
   addi r6,r3,56
   ld r10,512(r4)
   xor r7,r8,r9
   ld r8,1024(r4)
   std r7,8(r12)
   nop
   ld r7,1536(r4)
   ldx r2,r3,r12
   xor r0,r2,r9
   stdx r0,r3,r12
   ld r4,8(r5)
   xor r9,r4,r10
   addi r4,r12,112
   std r9,8(r5)
   addi r9,r3,112
   ldx r5,r6,r12
   xor r0,r5,r10
   stdx r0,r6,r12
   ld r10,8(r4)
   xor r2,r10,r8
   addi r10,r12,168
   std r2,8(r4)
   addi r2,r3,168
   ldx r4,r9,r12
   xor r6,r4,r8
   stdx r6,r9,r12
   ld r5,8(r10)
   xor r0,r5,r7
   std r0,8(r10)
   nop
   ldx r4,r2,r12
   xor r3,r4,r7
   stdx r3,r2,r12
   blr


Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway

Re: OliThink 5

Postby Tord Romstad » 31 Oct 2004, 21:33

Dieter B?r?ner wrote:
Tord Romstad wrote:
The 32-bit binary was compiled with 'gcc -O3 -funroll-loops -fomit-frame-pointer -o olithink', while the 64-bit binary was compiled with 'gcc -O3 -funroll-loops -fomit-frame-pointer -fast -mcpu=G5 -mpowerpc64 -o olithink64'.

Tord


Almost all my programs become slower with these compile flags. For me typically -O (or -O2) -fomit-frame-pointer is best.


My experience is similar, but not quite the same. I've found that big and complicated programs tend to run faster with -O or -O2, while very small programs run faster with -O3. For instance, -O and -O2 work best for Gothmog (about 12,000 lines of code), while -O3 works best for Glaurung (about 3,000 lines). I haven't seen any big difference between x86 and PowerPC in this respect. The same compiler settings produce the fastest executables on both platforms.

OliThink, as I exected, shows behaviour similar to Glaurung. -O3 gives the fastest executable (4182 knps with -O3, 3812 knps with -O2), at least on the G5. I haven't tested other platforms yet.

Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway

Re: OliThink 5

Postby Josu? Forte » 01 Nov 2004, 04:48

Hi OliverBr,

Thanks a lot to make your souce code publicly available!
Your code is a great source of information for those who want to learn bitboarb programming.

Congratulations!
Hope you continue making progress with your projects.

Best,
Josu
User avatar
Josu? Forte
 
Posts: 25
Joined: 02 Oct 2004, 23:58
Location: Rio de Janeiro / Brasil

Re: OliThink 5

Postby Sune Fischer » 01 Nov 2004, 08:25

Hi Reinhard,

Reinhard Scharnagl wrote:Hi Oliver and Michael,
These are 30.15 million legal and fully informed moves generated a second for the traditional starting position (though supporting FRC and 10x8 boards).
Regards, Reinhard


OK you got me beat, I get only 22 M legal moves per second here on the XP2400+, although your P4 2.8 is probably a bit faster it isn't that much faster.

Another interesting test though is how many fully informed qsearch moves can it generate per second? ;)

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: OliThink 5

Postby Reinhard Scharnagl » 01 Nov 2004, 08:43

Hi Sune,

... I get only 22 M legal moves per second here

22M valid moves at your XP2400+ nevertheless absolutely is a top result! Which information do your generated moves include?
Another interesting test though is how many fully informed qsearch moves can it generate per second?

Here you have got me. I am just working on that. But I am not sure, whether the results finally would be that comparable. I am targeting to create a deescalating search, but it seems to be too complicated this moment. So I am now switching to a qsearch first and will follow my own ideas in a later version. How intensivly you are using a transposition table during qsearch? Which are your testing positions, what their results?

Reinhard.

P.S.: does your qsearch care on passed pawns to be evaluated differently?
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: OliThink 5

Postby Reinhard Scharnagl » 01 Nov 2004, 20:00

Hi again Sune,

I am still waiting for a description of a testing scenario. What do you suggest?

Reinhard.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: OliThink 5

Postby Sune Fischer » 01 Nov 2004, 21:39

Hi Reinhard,

I do store some information with the move, the type of piece moving and the type being captured. That's not really needed for perft but it's nice to have when ordering the moves, perhaps it's luxury and not terribly efficient, but sometimes you have to design over the 0.05% speedup.

I don't currently have a valid qperfter either, but I did test it once and I seem to remember getting about 10 M generated moves/sec.

I don't use a hash for qsearch, it saves almost no nodes for me and the slowdown is significant (more than 10%). I use a small evaluation hash instead which doesn't save any nodes either but it does increase speed marginally.
I expect that to work better and better the heavier the eval becomes though.
Then again it might just be blowing the cache, speed alone is a good measure to see if it's working.

P.S.: does your qsearch care on passed pawns to be evaluated differently?


Of course! :)
This is done in the evaluation and hence in qsearch.

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

New Pre Alpha Realease

Postby OliverBr » 05 Nov 2004, 22:34

Hello,
thank you all again for you interest.
Now I have finished the Version pre Alpha 3, playing
already complete chess, including castling and check stuff.

For programmers the part, where the castling is handled
could be quite interesting.

Download at my homepage
http://home.arcor.de/dreamlike

-Oliver
OliverBr
 

Re: OliThink 5

Postby Reinhard Scharnagl » 05 Nov 2004, 22:55

Hi Sune,

in another thread I report on my qsearch, which is now including positional detail and passed pawn evaluation, both cached. There is not much benefit from caching in my test scenario. But I hope it would have more effect in a real game. But to know that, I have to finish my Smirf first. I have to work, wait and see ...

It would be interesting to know on the rates of other chess programs, how often they perform a complex detail evaluation a second.

Regards, Reinhard.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: OliThink 5

Postby Sune Fischer » 05 Nov 2004, 23:58

Hi Reinhard,

Pawn hashing is a super trick to speed up complex evaluation.

With pawn hashing I get a frenzeemark of 981, without hashing I get only 504!

The pawn eval is also the most sophisticated eval I use.
It evaluates trapped bishops, fortresses, different types of passers, blocked, hindered, double and many other kinds of pawn features.

I couldn't afford all of that without pawn hashing.
I think you will need it too, just wait and see till you've added lots of eval. ;)

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: OliThink 5

Postby Uri Blass » 06 Nov 2004, 01:05

Sune Fischer wrote:Hi Reinhard,

Pawn hashing is a super trick to speed up complex evaluation.

With pawn hashing I get a frenzeemark of 981, without hashing I get only 504!

The pawn eval is also the most sophisticated eval I use.
It evaluates trapped bishops, fortresses, different types of passers, blocked, hindered, double and many other kinds of pawn features.

I couldn't afford all of that without pawn hashing.
I think you will need it too, just wait and see till you've added lots of eval. ;)

-S.


I do not understand how the pawn eval can evaluate trapped bishops.
I thought that it is a bishop eval in that case.

I also cannot see how pawn hash can help in trapped bishops because based on my understanding pawn hash only help to get information that is dependent only on pawns.

Same for fortresses and you cannot evaluate if position is fortress based only on the pawn structure.
You may use pawn hash only in case that part of your evaluation is a simple evaluation of pawns that is dependent only on the pawn structure without caring about other pieces or you want to store information about pawns.

Maybe you can store conditions to fortress that are usually false but I see no way that you can store fortress position.

personally I think that pawn hash is not something important to care about at this moment because movei is slow because of other factors and pawn hash is not going to do it signficantly faster and I am even not sure if it is going to do it faster.

It may be better to improve my mobility evaluation so I do not need to generate all moves to calculate mobility(it is one of the main factors that makes movei slow).

Uri
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: OliThink 5

Postby Reinhard Scharnagl » 06 Nov 2004, 01:05

Hi Sune,

sure, there will be a lot of possibilities to enhance Smirf's performance, e.g. by caching calculated data - whatever it might be. What concerns pawn hashing I am not sure today, but I do not exclude anything for the future. But for now I have to do all the things, which have been prepared in my head. This are things like a kind of lazy positional detail evaluation, which I have implemented today.

Your frenzeemark (?) results are not telling me much. Instead numbers were interesting like how much detail evaluations will be used during a second.

Reinhard.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: OliThink 5

Postby Sune Fischer » 06 Nov 2004, 02:01

Hi Reinhard and Uri,

Reinhard, the number just indicates the speed increase it gets by using a pawn hash.

There are otherways to speed things up, i.e. lazy eval and such, but these are imperfect improvements because you get sligtly less accurate eval and probably different node counts.

However, there is also a good speed up from lazy evaluation, I think 20%-30% faster due to the lazy eval stuff.

Uri, a pawn hash can be used for anything involving pawns, or pawns+kings or whatever you xor into the key.
Trapped bishops involves also bishops but most of the computation is about analyzing the pawn structure so this can be hashed and reused cheaply.

If I were you I would test and see if a hash gives a speedup, rather than assume it doesn't.

Once you start playing with this stuff you realize that you can do almost anything for free as long as it gets pawn hashed!
Quite a gift really if you think about it, worth getting the creative juices flowing over this one, IMO.

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

PreviousNext

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 36 guests