Winboard Forum

Posted: **27 Aug 2005, 12:27**

Hi all - long time since my last post ...

I have recently written a second chess engine: Xpdnt [about 130Knps]

Assuming I am not doing anything exceedingly stupid [programming wise or data-structure choices] does this seem exceedingly slow for 700 Mhz Pentium3? gcc 3.4.3

PVS AlphaBeta with Iterative deepening and Hashing
Check, OneReply, MateThreat, and PawnTo7th extensions
Quiesent search
NullMove R=2
IID
Killers

SE = material, KS(pawn shield), pawn structure, a few patterns[development - nothing too fancy]

no SEE, no Piece lists, no Pawn Hash [all on todo list]

My branching factor is about 2-3 [iff I am measuring it right - AVG( Iteration N time / Iteration N-1 time)]

I just cant figure out how/why my engine is 1/3 to 1/2 the speed of others

My question is: Does this seem suspiciously slow, or is it just me?

Should I be looking for serious search bugs?

Thanks, and look forward to all your comments.

Posted: **27 Aug 2005, 15:00**

It's very hard to compare NPS number sensibly since they depend on so many things.

You can check the speed of your movegen/makemove & core datastructures by comparing your "preft" numbers against other programs.

If they are a manitude slower than most other programs, you have a problem.

Posted: **27 Aug 2005, 17:19**

Thanks,

Good idea.

Turns out, I broke perft some place - am trying to fix it ....

Perft 6 was taking like 5 minutes! but then returned ~4x the correct node count

Posted: **27 Aug 2005, 17:57**

That was wierd.

My source for perft was mysteriuosly re-arranged ???

anyway,

perft 6
119060324 81070

81 secs for p3 700 [with no hashing or anything] seems ok

Any other ideas?

Posted: **27 Aug 2005, 18:14**

Maybe you should have a look at it with a profiler to see in which parts your program spends most of its time. The one from AMD (Codeanalyst) which is free available is pretty good IMO.

Roman

Posted: **27 Aug 2005, 20:49**

Thanks Roman.

I have been profiling and the standard hot spots are:

eval() and attack()

Its true, that if I swap out my eval() for material + PST version, the NPS is 2.5x faster

I tried doing more pointer stuff [rather than indexing arrays] and it actually got worse.

Do the compilers these days take care of most of that sort of thing?

I will have a look at codeanalyst - thanks again

Posted: **27 Aug 2005, 20:59**

Hello David,

nice too see you back

btw: is GES a finished project now as you have started a new engine?

Peter

Posted: **27 Aug 2005, 22:18**

There's no real advantage to using pointers. In fact it will tend to introduce aliasing problems for the compiler.

Posted: **27 Aug 2005, 23:10**

Are you sure that there is no real advantage in using pointers(or maybe you mean only relative to indexing arrays)?

I am not sure about nothing when we talk about speed but
I will give an example from Movei's code when pointers seems to do the code faster but the speed improvement was small so I am not sure if pointers were the reason because I can get 1% speed improvement by random changes.

In movei's evaluation I had something in the following structure.
Note that phasewhite and phaseblack are not changed during the executation of the functions.

int calckingsafe2(int phasewhite,int phaseblack)
{
...
}
int evalrest(int phasewhite,int phaseblack)
{
....
score+=calckingsafe2(phasewhite,phaseblack);
...
}

I changed to the following structure because I think that without pointers the compiler needs to create copies of the phasewhite and phaseblack and it is a bad idea (unless the compiler is smart enough to understand that it does not need to generate copies of phasewhite and phaseblack
even if I do not use pointers).

int calckingsafe2(int* phasewhite,int* phaseblack)
{
...
}
int evalrest(int* phasewhite,int* phaseblack)
{
...
score+=calckingsafe2(phasewhite,phaseblack);
}

Posted: **28 Aug 2005, 02:03**

Thanks Peter,

I had 'realized' a couple of things when writing Xpdnt which would probably help GES a lot, and have intended to make maybe one more release, but havent got 'a-round-tuit' yet. Maybe some day i'll get one:)

Gian-Carlos & Uri, I remember [hearing at least] in days past, that pointer arithmetic and so on was so much better/faster than array indexing, but suspected that [after trying it] that the compiler was smarter than me!

Uri, I suspect your 1% estimate is even higher. that 'practical no-op' thing that Tord talked about was pretty wierd, but somehow it doesnt suprise me. Yet another reason that testing is so hard: you never can tell from preliminary tests that something isnt just some sort of fluke

Wouldnt it just be the greatest, if we had some sort of perfect oracle to test againts! I have been using Fruit 2.1

Posted: **28 Aug 2005, 08:13**

I dont think I understand what Uri is takling about here ... usually address size == sizeof int ... so amount of 'work' you do for both is same in both cases.

The usual idea behind passing a ptr are (from the top of my head) :

1) Passing a block of mem - the obvious use

2) to avoid a huge block push/pop - usually relevent for struct.
3) Simulating multi valued return.

Indexing array as pointer is faster (slightly) for other reasons - more so in our (chess) case since we can use shifts to calculate the offset faster (a good compiler will do the same anyway).
If you are using pointer increment/decrement style stuff - then you could become faster (leverage instruction pipelining , etc).

Are you sure of the 1% gain/loss uri ? Could be noise.
In your particular case , it should be theoretically worse to use pointer (you have to dereference it to get the value).
Maybe I am missing something here ..

Regards,
Mridul

Posted: **28 Aug 2005, 09:00**

mridul wrote:I dont think I understand what Uri is takling about here ... usually address size == sizeof int ... so amount of 'work' you do for both is same in both cases.

The usual idea behind passing a ptr are (from the top of my head) :

1) Passing a block of mem - the obvious use
2) to avoid a huge block push/pop - usually relevent for struct.
3) Simulating multi valued return.

Indexing array as pointer is faster (slightly) for other reasons - more so in our (chess) case since we can use shifts to calculate the offset faster (a good compiler will do the same anyway).
If you are using pointer increment/decrement style stuff - then you could become faster (leverage instruction pipelining , etc).

Are you sure of the 1% gain/loss uri ? Could be noise.
In your particular case , it should be theoretically worse to use pointer (you have to dereference it to get the value).
Maybe I am missing something here ..

Regards,
Mridul

Thanks for your comments.

I checked again and it seems that you are right and code without pointers here is slightly faster(51.67 seconds is faster than 51.79 seconds).

I think that the 1% speed improvement was result of another change that I made at the same time.

I still do not understand the reason that pointers are supposed to be slower but I guess that you are right.

If I understand you correctly
you claim that in theory it is worse because
score=*x+ is more expensive than score=x+ but I do not understand why it should be the case.

The reason that I thought that pointers should be faster is the following reason:
Suppose function A calls to function B and give it a variable.

The computer needs to remember 2 copies of the same variable(one inside function A and one inside function B because changing the variable inside function B does not change the value of the variable inside function A.

Suppose function A calls function B and give it a pointer.
The computer needs to remember only one copy of the variable so it seems faster.

Uri

Posted: **28 Aug 2005, 11:46**

The problems of pointers is "aliasing": for example, you are inside a function that gets an int* pointer passed. If you change the value the pointer points to, the compiler must reload all int values you use from memory again, since it does not know what the pointer points to and hence any int value could be changed.

It's very hard for compilers to deal with.

Without pointers, figuring out that one function doesn't change a value and that hence no extra copy is needed is trivial.

Posted: **28 Aug 2005, 19:37**

The problems of pointers is "aliasing": for example, you are inside a function that gets an int* pointer passed. If you change the value the pointer points to, the compiler must reload all int values you use from memory again, since it does not know what the pointer points to and hence any int value could be changed.

It's very hard for compilers to deal with.

Gian,

This is an advanced topic unknown to me.

Say I have global variables:-

Code: Select all: int myInt1, myInt2,.... etc. void anycall(int *ptr){ *ptr = 1; }

The probem is that what ptr points to can only be kwown at runtime
and so at compile time such situations require special treatment - like
extra code, extra copy of "all int in the program..." and this may cause great runtime inefficiency.

Is there a simple /rediculous example to illustrate a worst case scenario.

Thanks
Rasjid

Posted: **28 Aug 2005, 20:15**

Chan Rasjid wrote:
The problems of pointers is "aliasing": for example, you are inside a function that gets an int* pointer passed. If you change the value the pointer points to, the compiler must reload all int values you use from memory again, since it does not know what the pointer points to and hence any int value could be changed.

It's very hard for compilers to deal with.

Gian,

This is an advanced topic unknown to me.

Say I have global variables:-
Code: Select all
int myInt1, myInt2,.... etc. void anycall(int *ptr){ *ptr = 1; }

The probem is that what ptr points to can only be kwown at runtime
and so at compile time such situations require special treatment - like
extra code, extra copy of "all int in the program..." and this may cause great runtime inefficiency.

Is there a simple /rediculous example to illustrate a worst case scenario.

Thanks
Rasjid

Hi Rasjid,
this one is horrible with msc2005 beta 64-bit.
The this pointer is passed to the virtual member "deBruijnFound".
Even declared as const, the compiler considers the routine may change this object and saves a lot of registers - and therefor is not able to keep members in registers. Inspect the horrorfull assembly.

The class implements a De Bruijn generator.
You may overload deBruijnFound in a derived class, to look for special properties of De Bruijn constants for your purpose. One interesting application is to hash bitboards with two bits set.

Gerd

Code: Select all: BitBoard pow2[64] = { 0x0000000000000001,0x0000000000000002,0x0000000000000004,0x0000000000000008, 0x0000000000000010,0x0000000000000020,0x0000000000000040,0x0000000000000080, 0x0000000000000100,0x0000000000000200,0x0000000000000400,0x0000000000000800, 0x0000000000001000,0x0000000000002000,0x0000000000004000,0x0000000000008000, 0x0000000000010000,0x0000000000020000,0x0000000000040000,0x0000000000080000, 0x0000000000100000,0x0000000000200000,0x0000000000400000,0x0000000000800000, 0x0000000001000000,0x0000000002000000,0x0000000004000000,0x0000000008000000, 0x0000000010000000,0x0000000020000000,0x0000000040000000,0x0000000080000000, 0x0000000100000000,0x0000000200000000,0x0000000400000000,0x0000000800000000, 0x0000001000000000,0x0000002000000000,0x0000004000000000,0x0000008000000000, 0x0000010000000000,0x0000020000000000,0x0000040000000000,0x0000080000000000, 0x0000100000000000,0x0000200000000000,0x0000400000000000,0x0000800000000000, 0x0001000000000000,0x0002000000000000,0x0004000000000000,0x0008000000000000, 0x0010000000000000,0x0020000000000000,0x0040000000000000,0x0080000000000000, 0x0100000000000000,0x0200000000000000,0x0400000000000000,0x0800000000000000, 0x1000000000000000,0x2000000000000000,0x4000000000000000,0x8000000000000000 }; /* usage: DeBruijnGenerator db; db.genDeBruijn(6); // 1..6 2**i-i sequences */ class DeBruijnGenerator { protected: BitBoard lck, seq; unsigned int depth, mask; int count; void searchDeBruijn(unsigned int i) { // i = (int)(seq >> depth) & mask; if ( depth > 1 ) { unsigned int j = (i+i)&mask; // next even index lck ^= pow2[i]; // lock current index depth--; // sequence index right if ( (lck & pow2[j]) == 0 ) { // next even index unlocked? seq &= ~pow2[depth]; // "append" zero searchDeBruijn(j); // do recursive search with next even index } if ( (lck & pow2[j+1]) == 0 ) { // next odd index unlocked? seq |= pow2[depth]; // "append" one searchDeBruijn(j+1); // do recursive search with next odd index } depth++; // sequence index left lck ^= pow2[i]; // unlock current index } else if ( (lck & pow2[(i+i+1)&mask]) == 0 ) { deBruijnFound(seq); // bit zero was already set during initialization count++; } } virtual void deBruijnFound(BitBoard deBruijn) const {}; public: DeBruijnGenerator() {count = 0;} int getNoofFound() const {return count;} void genDeBruijn(unsigned int expOfTwo) { if ( expOfTwo > 6 ) { printf ("%d exceeds max exponent 6 for 64/6 sequence\n", expOfTwo); expOfTwo = 6; } unsigned int powOfTwo = 1 << expOfTwo; seq = 1; // the initial sequence lck = pow2[powOfTwo/2]; // lock last index for performance depth = powOfTwo-expOfTwo; // initial depth for expOfTwo leading zeros mask = powOfTwo-1; // mask searchDeBruijn(0); // start with index 0 due to expOfTwo leading zeros } };

Posted: **29 Aug 2005, 05:42**

Gerd,

I barely know assembly so I don't expect to know the problem as others do.

I have some idea that in your example, it is a sort of bug or inefficiency as you mentioned mscv2005beta64, etc. different compilers and option flags produce better or worst assembly codes.

From a layman's way of understanding, it is like the hardware processor people who make things too good to be true and therefore the designer of compilers have to invent better and more generic technique in compiler design.

By the way, is there some sort of simple rule of thumb that can alert
non expert to such problems.

I can mention one case with no problem - passing a pointer to a unique (even huge) array of large structure.

Thanks
Rasjid

Posted: **29 Aug 2005, 09:18**

Chan Rasjid wrote:
The problems of pointers is "aliasing": for example, you are inside a function that gets an int* pointer passed. If you change the value the pointer points to, the compiler must reload all int values you use from memory again, since it does not know what the pointer points to and hence any int value could be changed.

It's very hard for compilers to deal with.

Gian,

This is an advanced topic unknown to me.

Say I have global variables:-
Code: Select all
int myInt1, myInt2,.... etc. void anycall(int *ptr){ *ptr = 1; }

The probem is that what ptr points to can only be kwown at runtime
and so at compile time such situations require special treatment - like
extra code, extra copy of "all int in the program..." and this may cause great runtime inefficiency.

Is there a simple /rediculous example to illustrate a worst case scenario.

Thanks
Rasjid

In your example, the calling code may know what the ptr passed to anycall points to, so it's not so evil that everything is affected. But for example.

Code: Select all: int myInt1, myInt2,.... etc. void anycall(int *ptr){ int a, b; a = myInt1; *ptr = 1; b = myInt1; }

The b = myInt1 statement requires a memory load, and cannot be done from a register.

Posted: **01 Sep 2005, 14:06**

I forgot to mention I am using int _board[256] with int * board = _board+64

and apply 0x88 stuff with 'border' squares

Any way, I am wondering maybe I should withdraw my original assumptions

But even if I were not using the most efficient approaches to things, it still didnt seem I should suffer 100-200% slow-down.

Am I wrong?

If I could have managed to spend 3x longer than others in my prob-hash() routine, eg., that is still relatively insignificant [ie., not near 100-200%]

My InCheck() routine is similar in kind to TSCP's.

Now, InCheck() is in the top 2 routines when profiling [ IIRC 30-40%]

But even this, If I were doing it less than perfectly, couldnt account for 100-200% slow down in over-all NPS

I have tried rewriting these, but it hardly seems to matter.

InCheck() Gen() Eval()

Just looking for some help as to where I might look next. I ran out of ideas.....

I dont even collect attack information!

Thanks

Posted: **01 Sep 2005, 14:20**

My InCheck() routine is similar in kind to TSCP's.

Now, InCheck() is in the top 2 routines when profiling [ IIRC 30-40%]

There's your problem. TSCP's InCheck() is highly inefficient. IIRC it will loop over all pieces to find out whether it attacks the king square.
An easy improvement is to start with the square of the king and find out if there is an enemy piece that attacks that square. So that would mean generating the rays from the king's square, not from the pieces squares.
Richard.

Posted: **01 Sep 2005, 17:21**

Hi Richard,

Thank you.

But I must be mistaken again. I thought for some reason, that that was how he did it.

I _DO_ trace out from square seeking for pieces which attack it

Darn! When you said, 'there is your problem', I got so excited

Thanks for trying though.

Winboard Forum

NPS

NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS

Re: NPS