unefficient parts of glaurung

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

unefficient parts of glaurung

Postby Uri Blass » 29 Aug 2006, 10:02

I wonder how much speed improvement it is possible to get by improving speed of glaurung without doing big changes in the structure and only with making the code simpler.

I believe that there are unefficient parts in that program.
I give only one example and there are more unefficient parts that I do not mention in this post.


The pos structure include both the side to move and the side of the opponent to move(side and xside)

It seems to me waste of time to update xside in lines like
pos->side ^= 1; pos->xside ^= 1;

I find that pos->xside is not used often in glaurung
for example I find the following string 8 times in glaurung

side = pos->side, xside = side^1;

It means that glaurung even does not use pos->xside

I am not sure if it is better to have seperate variable for xside but
it is possible to say that without it the code may be bigger even if it is faster but in case of pos->xside if I do not miss something deleting it and replacing every pos->xside in the few cases that it needed by pos->side^1 is going to do glaurung both simpler and faster.

I have not a good compiler and I do not plan to compile glaurung to check it but it may be interesting to know how much speed improvement people can get from this simple changes
in the code:

1)replacing pos->xside by pos->side^1 in the following lines:

eval.cpp(362): e_result += PSQ(KingOfColour(pos->xside), xksq);
eval.cpp(845): int c = PieceOfColourAndType(pos->xside, CAPTURE(m));
extend.cpp(12): if(PIECE(m)==PAWN && PawnRank[pos->xside][TO(m)] == RANK_7)

search.cpp(110): if(PIECE(m) == PAWN && pawn_is_passed(pos, TO(m), pos->xside)) return false;
search.cpp(141): !global_see(pos, pos->xside, ss[ply].eval - beta - margin))

2)deleting xside from the pos field

3)deleting pos->xside in the other cases that it is used when glaurung change sides.


Uri
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: unefficient parts of glaurung

Postby Dann Corbit » 06 Sep 2006, 04:16

Glaurung is almost totally bound by the eval function. Nothing else will matter much if it does not directly effect the time spent in eval().

Before optimizing any program, it is very important to see where the time goes.

If you make something that takes up 1% of the program's time a million times faster, it will have basically no effect on the program's speed.
Dann Corbit
 

Re: unefficient parts of glaurung

Postby Uri Blass » 06 Sep 2006, 10:27

Dann Corbit wrote:Glaurung is almost totally bound by the eval function. Nothing else will matter much if it does not directly effect the time spent in eval().

Before optimizing any program, it is very important to see where the time goes.

If you make something that takes up 1% of the program's time a million times faster, it will have basically no effect on the program's speed.


1)I disagree that nothing else will matter.

If you do something that takes up 1% of the time twice faster you can do the program 0.5% faster.

I can see easily unefficient parts

2)I also do not agree that glaurung is almost totally bound by it's evaluation.

It may take significant part of it's time but without profiling glaurung I know that glaurung is not an extremely slow program
and it can search 300 knodes per second in the opening position on machine that is not extremely fast.

I remember that Tord said that he could not get a very fast program even in case of having material only evaluation so my guess is that the evaluation takes more than 70% of the time.

3)I doubt if the time by the profiler is not misleading
and I suspect that making one part of the program faster and simpler may help to make other parts faster.

I am not an expert on hardware so I may be wrong but
I believe that when you have less variables the compiler can put part of the other variables into cheaper memory so using them is less expensive.

It means that making one part smaller and faster can also help
to make another part faster.

Uri
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: unefficient parts of glaurung

Postby Dann Corbit » 07 Sep 2006, 19:02

Uri Blass wrote:
Dann Corbit wrote:Glaurung is almost totally bound by the eval function. Nothing else will matter much if it does not directly effect the time spent in eval().

Before optimizing any program, it is very important to see where the time goes.

If you make something that takes up 1% of the program's time a million times faster, it will have basically no effect on the program's speed.


1)I disagree that nothing else will matter.

If you do something that takes up 1% of the time twice faster you can do the program 0.5% faster.

I can see easily unefficient parts
[/quote="Uri Blass"]
It does not work like that. It is more likely to make it 0.1% faster. There will be some spots in the program that are acting as a gate to slow things down the most. The Intel vtune profiler has a nice feature to show what these are, if you link with fixed:no and do call graph analysis.

When you find the biggest bottleneck for the program, that is where you should attack with your effort, because your greatest gains are there.

Uri Blass wrote:2)I also do not agree that glaurung is almost totally bound by it's evaluation.

It may take significant part of it's time but without profiling glaurung I know that glaurung is not an extremely slow program
and it can search 300 knodes per second in the opening position on machine that is not extremely fast.

I remember that Tord said that he could not get a very fast program even in case of having material only evaluation so my guess is that the evaluation takes more than 70% of the time.
[/quote="Uri Blass"]
The breakdown of the most expensive functions is:
evaluate() 74
evaluate_king_safety() 22
analyze_pawn_structure() 17

Nothing else is above 12

Uri Blass wrote:3)I doubt if the time by the profiler is not misleading
and I suspect that making one part of the program faster and simpler may help to make other parts faster.

I am not an expert on hardware so I may be wrong but
I believe that when you have less variables the compiler can put part of the other variables into cheaper memory so using them is less expensive.

It means that making one part smaller and faster can also help
to make another part faster.

Uri


Hardware wise, the most important thing is to keep things in the L1 cache. Dominatingly more important than this is the O(f(n)) performance of the algorithm. Hardware issues are almost irrelevant by comparison.
Dann Corbit
 

Re: unefficient parts of glaurung

Postby Uri Blass » 07 Sep 2006, 21:05

Dann Corbit wrote:
Uri Blass wrote:
Dann Corbit wrote:Glaurung is almost totally bound by the eval function. Nothing else will matter much if it does not directly effect the time spent in eval().

Before optimizing any program, it is very important to see where the time goes.

If you make something that takes up 1% of the program's time a million times faster, it will have basically no effect on the program's speed.


1)I disagree that nothing else will matter.

If you do something that takes up 1% of the time twice faster you can do the program 0.5% faster.

I can see easily unefficient parts
[/quote="Uri Blass"]
It does not work like that. It is more likely to make it 0.1% faster. There will be some spots in the program that are acting as a gate to slow things down the most. The Intel vtune profiler has a nice feature to show what these are, if you link with fixed:no and do call graph analysis.

When you find the biggest bottleneck for the program, that is where you should attack with your effort, because your greatest gains are there.

Uri Blass wrote:2)I also do not agree that glaurung is almost totally bound by it's evaluation.

It may take significant part of it's time but without profiling glaurung I know that glaurung is not an extremely slow program
and it can search 300 knodes per second in the opening position on machine that is not extremely fast.

I remember that Tord said that he could not get a very fast program even in case of having material only evaluation so my guess is that the evaluation takes more than 70% of the time.
[/quote="Uri Blass"]
The breakdown of the most expensive functions is:
evaluate() 74
evaluate_king_safety() 22
analyze_pawn_structure() 17

Nothing else is above 12

Uri Blass wrote:3)I doubt if the time by the profiler is not misleading
and I suspect that making one part of the program faster and simpler may help to make other parts faster.

I am not an expert on hardware so I may be wrong but
I believe that when you have less variables the compiler can put part of the other variables into cheaper memory so using them is less expensive.

It means that making one part smaller and faster can also help
to make another part faster.

Uri


Hardware wise, the most important thing is to keep things in the L1 cache. Dominatingly more important than this is the O(f(n)) performance of the algorithm. Hardware issues are almost irrelevant by comparison.


I thought to try to do glaurung faster and shorter and to see how much speed improvement I get but unfortunately I cannot even compile it to give deterministic results in the opening position(see my second thread)

I mentioned in another thread that the changes that I did in glaurung only in order to compile it with visual C6 and unforutnately I do not get deterministic results in the opening position.
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv


Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 20 guests