Which is faster on most processors?

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

Which is faster on most processors?

Postby Steve Maughan » 01 Jun 2005, 18:49

I'm in the middle of ripping out all of the side to move dependent code in Monarch i.e. I'm writing a gen_moves() routine which combines gen_white_moves() and gen_black_moves().

As a result I have numerous code such as:

Code: Select all
board->ep_square = from_square - 16 + (color << 5)


Where color is 1 for white and 0 for black. My question is - on most processors which is faster, (color << 5) or (color * 32). I doubt there is much in it and I'm not an assembler expert. Can anyone advise?

Thanks,

Steve
Steve Maughan
 
Posts: 48
Joined: 06 Oct 2004, 17:40
Location: Florida USA

Re: Which is faster on most processors?

Postby Anonymous » 01 Jun 2005, 19:23

Undependent of the CPU any slightly decent compiler should produce the same code. I'd use *32 in this case, because it is somewhat better self documenting IMHO.

For example gcc, even without optimization (comments added by me):

Code: Select all
Dieter@DIETERCOMP ~/srctst
$ cat sm.c ; show the source
unsigned foo(unsigned f, unsigned c)
{
  return f - 16 + (c << 5);
}

unsigned bar(unsigned f, unsigned c)
{
  return f - 16 + c*32;
}

Dieter@DIETERCOMP ~/srctst
$ gcc -S  sm.c ; no optimization, produce assembly

Dieter@DIETERCOMP ~/srctst
$ cat sm.s ; show produced assembly
        .file   "sm.c"
        .text
.globl _foo
        .def    _foo;   .scl    2;      .type   32;     .endef
_foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax; eax = c
        sall    $5, %eax             ; eax = c<<5
        addl    8(%ebp), %eax   ; eax = c<<5+f
        subl    $16, %eax          ; eax = c<<5+f-16
        popl    %ebp
        ret
.globl _bar
        .def    _bar;   .scl    2;      .type   32;     .endef
_bar:                   ; and exactly the same code, when using *32 in source
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        sall    $5, %eax
        addl    8(%ebp), %eax
        subl    $16, %eax
        popl    %ebp
        ret

Dieter@DIETERCOMP ~/srctst
$


Things are a bit different, when you need a/32 vs. a>>5. If a is not an unsigned integer type, the compiler cannot optimize the a/32 to a>>5 (they will have different results in general, not to mention that a>>5 is not defined in Standard C for negative a, but works on all common environments as expected).

In this case, you know, that c is either 0 or 1. The compiler does not know it. For this reason, in similar cases some better code can be reached by some lower level bit fiddling code, that the compiler will not be able to find out. However, often the contrary is the case - programmers are used for some reason to write really low level bit fiddling code, that in effect produces less efficient code, than clearer and easier statements would yield.

Regards,
Dieter
Anonymous
 

Re: Which is faster on most processors?

Postby Tord Romstad » 01 Jun 2005, 20:49

Hi Steve!
Steve Maughan wrote:
Code: Select all
board->ep_square = from_square - 16 + (color << 5)

Wouldn't
Code: Select all
board->ep_square = (from_square + to_square) / 2

be faster as well as more logical?

Just like you, I try to avoid color-dependent code whenever possible.

Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway

Re: Which is faster on most processors?

Postby Steve Maughan » 01 Jun 2005, 21:07

Tord Romstad wrote:Wouldn't
Code: Select all
board->ep_square = (from_square + to_square) / 2

be faster as well as more logical?
Tord


Yes of course! Great tip

Tord Romstad wrote:Just like you, I try to avoid color-dependent code whenever possible.
Tord


Writing none color dependent code is new for me. But the main aim of this major revision of Monarch is to write bug free code. I'm trying to write a test procedure for *every* routine - as well as liberal "asserts" in the code. Previous versions were just too complex / buggy to scale - a sentiment that you've expressed many times.

Dieter - Many thanks also for your input.

Steve
Steve Maughan
 
Posts: 48
Joined: 06 Oct 2004, 17:40
Location: Florida USA

Re: Which is faster on most processors?

Postby Steve Maughan » 01 Jun 2005, 21:30

Andrew Fan wrote:I wouldn't bother too much about optimising the EP move gen code - it's not as frequent as other moves.


Sure - this was just an example off the top of my head (English expression!). I'm not trying to optimize the ep code but the same, or similar, type of manipulation crops up many times in the code.

Thanks,

Steve
Steve Maughan
 
Posts: 48
Joined: 06 Oct 2004, 17:40
Location: Florida USA

Re: Which is faster on most processors?

Postby Anonymous » 01 Jun 2005, 21:38

Tord Romstad wrote:
Code: Select all
board->ep_square = (from_square + to_square) / 2

be faster as well as more logical?


More logical, yes; faster perhaps, when you have to_square conveniently available. For the original code sample, one shift, one add, one sub. But in typical code (when for example from_square is already available in a register), on x86 hardware at least the add and the sub can be replaced by one lea instruction (compilers are smart enough to do this). So one shift + one lea then. Your smart suggestion: one add and one shift - really the same. But I agree - better when to_square is available easily, and can hardly be slower then.

When using your snippet - be sure that the calculation is done in unsigned, otherwise typically one branch will be added. But here, we are going to dig into very subtle things. Using >>1 instead of /2 might be preferable here (and uglier, IMHO).

Another observation: Steve's code obviously depends on the "row length" of the board. Tord's expression doesn't. Both assume "row first" layout, however.

I also agree with Andrew - all effects will hardly be measurable.

Regards,
Dieter
Anonymous
 

Re: Which is faster on most processors?

Postby Tord Romstad » 01 Jun 2005, 22:03

Steve Maughan wrote:
Tord Romstad wrote:Just like you, I try to avoid color-dependent code whenever possible.
Tord


Writing none color dependent code is new for me. But the main aim of this major revision of Monarch is to write bug free code.

In my experience, color independent code makes it easier to achieve this aim. Back in the days when I still wrote color dependent code, my usual technique was to first write the code for white, and then make a copy of the code where I manually replaced '+' with '-', '>' with '<', WHITE_ROOK with BLACK_ROOK and so on. Of course, I often forgot something.

Steve Maughan wrote:I'm trying to write a test procedure for *every* routine - as well as liberal "asserts" in the code. Previous versions were just too complex / buggy to scale - a sentiment that you've expressed many times.

Yes. Like most of us, I learned this the hard way.

Dieter B?r?ner wrote:Another observation: Steve's code obviously depends on the "row length" of the board. Tord's expression doesn't. Both assume "row first" layout, however.

As far as I can see, my expression would work equally well in a "column first" board layout (if the phrases "row first" and "column first" mean what I think). Of course it would break in "diagonal first" and similar exotic layouts.

Thanks for the low-level optimization advice, by the way. Although we all agree that the difference would be insignificant in the case we are discussing, your advice could prove to be helpful in other contexts.

Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway

Re: Which is faster on most processors?

Postby Anonymous » 01 Jun 2005, 22:21

Tord Romstad wrote:
Dieter B?r?ner wrote:Another observation: Steve's code obviously depends on the "row length" of the board. Tord's expression doesn't. Both assume "row first" layout, however.

As far as I can see, my expression would work equally well in a "column first" board layout (if the phrases "row first" and "column first" mean what I think). Of course it would break in "diagonal first" and similar exotic layouts.


I was wrong, I stand corrected. I believe we are thinking about the same "row first" and "column first" definitions. Indeed, your expression will not depend on this. No idea, why I first thought, it would. Another reason, to prefer your expression (with mentioned precautions).

Regards,
Dieter
Anonymous
 

Re: Which is faster on most processors?

Postby Dann Corbit » 02 Jun 2005, 00:48

Steve Maughan wrote:
Andrew Fan wrote:I wouldn't bother too much about optimising the EP move gen code - it's not as frequent as other moves.


Sure - this was just an example off the top of my head (English expression!). I'm not trying to optimize the ep code but the same, or similar, type of manipulation crops up many times in the code.

Thanks,

Steve


My {absurdly strong} opinion is that you should NEVER code for speed and you should ALWAYS code for clarity.

If (at some future point after you have performed a very careful profile) you discover a bottleneck, then it is time to think about ways to rewrite it.

The very last thing you should think of is tweaky little hacks for a tiny improvement (although I do use them myself). The very first thing you should think of is a better algorithm.

IMO-YMMV
Dann Corbit
 

Re: Which is faster on most processors?

Postby Steve Maughan » 02 Jun 2005, 04:48

Dann Corbit wrote:My {absurdly strong} opinion is that you should NEVER code for speed and you should ALWAYS code for clarity.


I guess that's where I'm heading with Monarch. Although I haven't in the past been obsessed with optimization I'm only now making bug free a top priority (i.e. #1 priority). This means compact code which is not color specific and at least one test routine for each procedure. Let's see how it materializes!!

Regards,

Steve
Steve Maughan
 
Posts: 48
Joined: 06 Oct 2004, 17:40
Location: Florida USA

Re: Which is faster on most processors?

Postby Tord Romstad » 02 Jun 2005, 09:38

Dann Corbit wrote:My {absurdly strong} opinion is that you should NEVER code for speed and you should ALWAYS code for clarity.

If (at some future point after you have performed a very careful profile) you discover a bottleneck, then it is time to think about ways to rewrite it.

The very last thing you should think of is tweaky little hacks for a tiny improvement (although I do use them myself). The very first thing you should think of is a better algorithm.

IMO-YMMV

I agree 100%. I even think you can safely drop the "YMMV" here.

While we're discussing the topic of color independent code, here is my code for generating castling moves:

Code: Select all
  if(!SearchStack[Ply].check) {
    if(OO_POSSIBLE(Side)) {
      int e1=E1+Side*A8, f1=F1+Side*A8, g1=G1+Side*A8;
      if(Board[f1]==EMPTY && Board[g1]==EMPTY &&
         !is_attacked(f1, XSide) && !is_attacked(g1, XSide))
        (ms++)->move = (KING<<17)|(e1<<7)|g1;
    }
    if(OOO_POSSIBLE(Side)) {
      int e1=E1+Side*A8, d1=D1+Side*A8, c1=C1+Side*A8, b1=B1+Side*A8;
      if(Board[d1]==EMPTY && Board[c1]==EMPTY && Board[b1]==EMPTY &&
         !is_attacked(d1, XSide) && !is_attacked(c1, XSide))
        (ms++)->move = (KING<<17)|(e1<<7)|c1;
    }
  }

There are, of course, some obvious ways to make this a tiny bit faster. I don't care. The time my program spends generating castling moves is utterly insignificant. But are there any good ways to make the code less dependent on the board layout?

I have considered using something like this:
Code: Select all
#define FLIP(side,square) ((square)^((side)*0x70))

and then replace e1=E1+Side*A8 with FLIP(Side,E1) etc. The FLIP macro obviously depends on the board layout, but it is easy to change, and it only needs to be changed at one place in the source code. Such a macro could also be useful several other places when trying to avoid color dependent code (when detecting trapped bishops on a2/a7/h2/h7, for instance).

Are there any better or more obvious ways to do this?

Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway

Re: Which is faster on most processors?

Postby Peter Fendrich » 02 Jun 2005, 14:49

Tord Romstad wrote:I have considered using something like this:
Code: Select all
#define FLIP(side,square) ((square)^((side)*0x70))

and then replace e1=E1+Side*A8 with FLIP(Side,E1) etc. The FLIP macro obviously depends on the board layout, but it is easy to change, and it only needs to be changed at one place in the source code. Such a macro could also be useful several other places when trying to avoid color dependent code (when detecting trapped bishops on a2/a7/h2/h7, for instance).

Are there any better or more obvious ways to do this?
Tord

By using something like Cog http://www.nedbatchelder.com/code/cog/ you can generate the code for both black and white but solve it once and for all. The source would then be color independent. I'm not sure if this is better but it's a more general approach to the the black-white programming problem.
Aren't you using Lisp? It has the same capability of generating code I think.
/Peter
User avatar
Peter Fendrich
 
Posts: 193
Joined: 26 Sep 2004, 20:28
Location: Sweden

Re: Which is faster on most processors?

Postby Tord Romstad » 08 Jun 2005, 22:02

Peter Fendrich wrote:
Tord Romstad wrote:I have considered using something like this:
Code: Select all
#define FLIP(side,square) ((square)^((side)*0x70))

and then replace e1=E1+Side*A8 with FLIP(Side,E1) etc. The FLIP macro obviously depends on the board layout, but it is easy to change, and it only needs to be changed at one place in the source code. Such a macro could also be useful several other places when trying to avoid color dependent code (when detecting trapped bishops on a2/a7/h2/h7, for instance).

Are there any better or more obvious ways to do this?
Tord

By using something like Cog http://www.nedbatchelder.com/code/cog/ you can generate the code for both black and white but solve it once and for all. The source would then be color independent. I'm not sure if this is better but it's a more general approach to the the black-white programming problem.
Aren't you using Lisp? It has the same capability of generating code I think.
/Peter

Yes, I am using Lisp, and you are right: When programming in Lisp, it is very common to write code which writes code (and to write code which writes code which writes code, and so on). Such techniques are less attractive in C, for a number of reasons. The most important are that the C syntax is so messy that automatic code generation becomes rather painful except in trivial cases, and that the preprocessor language is different from the language itself.

Cog does look interesting, and I hadn't seen it before. Thanks for bringing it to my attention. However, it is not something I would like to use in my chess program. Because my program is open source, I want everybody to be able to understand and compile it without using any tools except a C compiler. I prefer to avoid using non-standard extensions or preprocessors.

Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway


Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 6 guests