Tord Romstad wrote:
Reinhard Scharnagl wrote:
Does it make sense to store node evaluations in the transposition table?
Actually, my opinion is shifting toward not storing them. What do you think about that?
Hi Reinhard,
As far as I know, most people do not store the static evaluation in the transposition table. However, it is quite common to have a separate, smaller hash table for the static eval. I am not the right person to give advice about what makes sense -- I have never tried storing the static eval anywhere at all.
Tord
In Diep I do both.
Please take into account that the eval hash table by itself has a hit rate of only 10-20%, which is pretty low. That is a result of also storing the evaluation
in the transposition table.
The total hit rate (both combined) is about 50%.
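To make the "both" arrangement concrete, here is a minimal sketch of a lookup that tries the transposition table first and only then a separate eval table. This is not Diep's code: `EvalCache`, `cached_eval`, the dict standing in for the transposition table and `toy_evaluate` are all hypothetical names made up for illustration. It does show why the eval table's own hit rate ends up low, since it only ever sees positions the transposition table missed.

```python
class EvalCache:
    """Fixed-size, always-replace table: 64-bit position key -> static eval.

    Hypothetical stand-in for a separate eval hash table.
    """

    def __init__(self, size_bits):
        self.mask = (1 << size_bits) - 1
        self.keys = [None] * (1 << size_bits)
        self.scores = [0] * (1 << size_bits)
        self.probes = 0
        self.hits = 0

    def probe(self, key):
        """Return the stored score for key, or None on a miss."""
        self.probes += 1
        slot = key & self.mask
        if self.keys[slot] == key:
            self.hits += 1
            return self.scores[slot]
        return None

    def store(self, key, score):
        """Overwrite whatever currently occupies the slot."""
        slot = key & self.mask
        self.keys[slot] = key
        self.scores[slot] = score


def cached_eval(key, tt_eval, eval_cache, evaluate):
    """Static eval via: transposition table, then eval table, then full eval."""
    score = tt_eval.get(key)          # assumed: TT entries carry the eval
    if score is not None:
        return score
    score = eval_cache.probe(key)     # only reached on a TT miss
    if score is not None:
        return score
    score = evaluate(key)             # the expensive path
    eval_cache.store(key, score)
    return score


calls = []

def toy_evaluate(key):
    """Placeholder evaluation; records how often the slow path runs."""
    calls.append(key)
    return key % 100

tt_eval = {0xABC: 17}                 # dict standing in for the real TT
cache = EvalCache(size_bits=4)
a = cached_eval(0xABC, tt_eval, cache, toy_evaluate)   # served by the TT
b = cached_eval(0x123, tt_eval, cache, toy_evaluate)   # computed, then stored
c = cached_eval(0x123, tt_eval, cache, toy_evaluate)   # served by the eval table
```

Counting `cache.hits / cache.probes` in a real search would give the 10-20% kind of figure mentioned above, while the combined rate also includes the transposition table hits.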
With a 10% hit rate there, the EFFECTIVE cost of one successful lookup on a dual-core Opteron 2.2 GHz is about 2.2 cycles/ns * 2340 nanoseconds, since at 10% you pay for about ten probes per hit.
That's roughly 5000 cycles.
So in such a case a separate eval table, in addition to storing the eval in the normal hash table, makes less sense for the majority of chess programs,
as it only pays off if you have an evaluation that is far, far slower than 5000 cycles.
For the majority of efficiently programmed engines that is simply not the case.
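The arithmetic behind the 5000-cycle figure can be sketched like this. The 234 ns per-probe latency is an assumption on my part, picked because 234 ns / 0.10 gives the 2340 ns used above; the function name is hypothetical.

```python
def effective_cycles_per_hit(probe_latency_ns, hit_rate, clock_ghz):
    """Cycles spent on probing per successful hit.

    Every probe costs probe_latency_ns, but only a fraction hit_rate of
    them pays off, so the cost per useful lookup is latency / hit_rate.
    """
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    effective_ns = probe_latency_ns / hit_rate
    return effective_ns * clock_ghz   # 1 GHz == 1 cycle per nanosecond


# Assumed 234 ns per probe, 10% hit rate, 2.2 GHz clock.
cost = effective_cycles_per_hit(234.0, 0.10, 2.2)
print(round(cost))  # 5148
```

At a 20% hit rate the same probe latency gives about half that, so the break-even point for a separate eval table moves a lot with the hit rate.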
For Diep, whose slow search gets around 85k nps on a K7 2.1 GHz,
or 107k nps on a 1.8 GHz Opteron (measured single core), obviously
the AVERAGE number of cycles spent on one node is already about 25000 (2.1 GHz / 85k nps). Just imagine what a full eval costs once you also account for the 50% hit rate,
on top of the roughly 4% cutoffs the transposition table gives,
plus the facts that you don't evaluate positions in check and that
in the majority of inner nodes no evaluation happens either.
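The cycles-per-node figure is just clock speed divided by node rate. A quick check with the two measurements quoted above (the function name is hypothetical):

```python
def cycles_per_node(clock_ghz, nodes_per_second):
    """Average CPU cycles available per searched node."""
    return clock_ghz * 1e9 / nodes_per_second


k7 = cycles_per_node(2.1, 85_000)        # about 24.7k cycles per node
opteron = cycles_per_node(1.8, 107_000)  # about 16.8k cycles per node
print(round(k7), round(opteron))
```

So the 25000 figure matches the K7 measurement; the Opteron number comes out lower because it searches more nodes at a lower clock.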
What did amaze me a few months ago is that increasing the size of the evaluation table and pawn table considerably, to 200 MB in total, sped up my program by about 2%. This was on a single CPU of my dual K7, which has a 400 ns latency to DDR RAM.
On a quad dual-core Opteron with 4 GB RAM in total and 8 cores,
and a transposition table of 2 GB, taking an extra 200 MB per processor is a tad of a problem though.
Fact is, for each new machine you can keep optimizing hash tables forever,
as with every type of latency and architecture a different strategy is best.
On a quad dual-core Opteron, for example, Diep's approach of using
local hash tables is not really the optimal idea.
A TLB-thrashing lookup to a remote processor is on average 234 ns;
a local lookup is about 147 ns. So the difference is just under 90 ns.
At 1.8 GHz that is not much of a penalty.
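Put in cycles, the remote-versus-local gap works out as follows (234 - 147 = 87 ns). This is only the arithmetic on the two latencies quoted above; the function name is hypothetical.

```python
def latency_penalty_cycles(remote_ns, local_ns, clock_ghz):
    """Extra cycles a remote probe costs compared with a local one."""
    return (remote_ns - local_ns) * clock_ghz


extra_ns = 234 - 147
extra_cycles = latency_penalty_cycles(234, 147, 1.8)
print(extra_ns, round(extra_cycles))  # 87 157
```

Set against the roughly 17000 cycles per node that 107k nps at 1.8 GHz implies, one remote probe costs about 1% extra, assuming those single-core numbers carry over to the quad machine.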
So in theory there is more to gain there, but it all depends on the program in question and the hardware you intend to run it on.
Vincent