Moderator: Andres Valverde
int Lock; // 32-bits of the hash key not used for index
char From; // From square, + 2 bits to indicate bound type
char To; // To Square + 2 bits to indicate special moves
short int Score; // 16 bit
char Depth; // Draft of the entry
If you acces one byte, all 64 are read from memory in a burst transfer. As long as it stays in cache, you won't need any new memory accesses, for accessing this entry or any of the neighboring entries in the same cache line.
You must realize that my replacement already overwrites depth-prefferred entries with lower depths during the same search, if there are too many of that depth (draft).
int i = (int) malloc(SIZE+64);
i = i+63 & ~63;
HashTable = (struct _HashEntry *) i;
char Table[SIZE*sizeof(struct _HashEntry)+64];
#define HashTable ((struct _HashEntry *) (Table+CONSTANT))
...
if(((int) HashTable) & 63)
{ printf("error, hash start = %x\n",(int) HashTable);
exit(0);
}
H.G.Muller wrote:On my machine (a 1.3GHz Celeron M) a cache-line fill takes 286 CPU clocks.
#define SIZE (1<<24) /* 16MB */
char mem[SIZE];
int sum;
main()
{
clock_t t;
int i, p;
t = clock();
for(i=1e9; i>0; i--)
{
sum += mem[p&(SIZE-1)];
mem[p&(SIZE-1)] = i; /* make dirty; delete for clean timing */
p += 64;
}
t = clock() - t;
...
}
SIZE meaning read+write read
13 CPU+L1 4.9 1.9
19 above+L2 5.5 3.8
28 above+mem 34.2 15.7
H.G.Muller wrote:I don't understand your numbers, what you mention in the text doesn't seem to correspond to your table.
If you would time 2.8e9 writes, seconds would directly correspond to cycles, which would make life easier.
Printing sum at the end might help to silence smart-ass optimizers.
#include <iostream>
#include <ctime>
#define SIZE (1<<28)
char mem[SIZE];
int sum;
main()
{
int p;
for(p=0; p<SIZE; p+=64)
mem[p] = p;
clock_t t = clock();
for(long long int i=1000000000ULL; i>0; i--)
{
sum += mem[p&(SIZE-1)];
// mem[p&(SIZE-1)] = i;
p += 64;
}
t = clock() - t;
std::cout << "Time: " << t/1000000.0 << std::endl;
std::cout << "sum: " << sum << std::endl;
}
It might be a good idea to pre-fill the memory array with something, before starting the timing test. Especially Linux sometimes behaves funny when reading uninitialized data (it is all aliased to the same physical 4K RAM page).
Your RAM access seems very fast. I guess we are timing fast page-mode access. Perhaps it would be more representative to take strides of 4096+64, rather than 64 bytes, so that every access falls into a new DRAM page.
SIZE r r+w
13 5.36 13.6
18 7.39 15.5
19 31.6 34.6
28 114.9 179.9
Daniel Shawul wrote:Okay i will try 64 byte (i made a typo in my previous post) alignment and do the test again.
This is my hashtable entry.
struct HASH_ENTRY {
uint64 lock;
int32 move;
int16 score;
uint8 depth;
uint8 flag;
}
When probing the first byte to be accessed is the lock, so no problems i guess. But when storing the depth is checked first, so it means (score,depth & flag) are in cache but not lock & move. This will require additional memory look up?
Looking at your tt entries, I see that you store only 32 bits for the lock. Dont remever why I store the whole 64 bit May be because it is what i got from Bruce moreland's tutorial.. In your case i guess you use the upper or lower half as a key right?
I think you should have the 'age' counter in your entry. Or otherwise you have to clear the transpostion table after a move?
Testing on a set of testpositons really doesnt measure the performance of a replacement scheme IMO. However i do not think that every transpositon table entry from an old search is unimportant. In the most common hash table, the prefer depth entry is replaced when the depth is greater, or when the position is from previous search. I dont totally agree of the later, because the search might be from an immediate former search and might have a large depth.
Return to Programming and Technical Discussions
Users browsing this forum: No registered users and 38 guests