Hi Zach
Yeah I noticed that as well after I posted, and I've come to the same conclusion - it's the hash size of course because the bits are also used for indicing.
It fits rather well actually, because if I use a larger hash more bits seem to be needed than with a smaller size hash.
I ran that test too
Very small hash (2 MB) ~40 bits needed
- Code: Select all
1th bit collisions = 84036096
2th bit collisions = 83825685
3th bit collisions = 83718933
4th bit collisions = 83664898
5th bit collisions = 83638613
6th bit collisions = 83625887
7th bit collisions = 83618982
8th bit collisions = 83615744
9th bit collisions = 83613960
10th bit collisions = 83613175
11th bit collisions = 83612718
12th bit collisions = 83612561
13th bit collisions = 83612449
14th bit collisions = 41776602
15th bit collisions = 20889139
16th bit collisions = 10440363
17th bit collisions = 5225158
18th bit collisions = 2622300
19th bit collisions = 1313023
20th bit collisions = 658057
21th bit collisions = 328700
22th bit collisions = 163969
23th bit collisions = 81900
24th bit collisions = 41173
25th bit collisions = 20766
26th bit collisions = 10125
27th bit collisions = 5241
28th bit collisions = 2710
29th bit collisions = 1308
30th bit collisions = 587
31th bit collisions = 290
32th bit collisions = 111
33th bit collisions = 61
34th bit collisions = 29
35th bit collisions = 18
36th bit collisions = 7
37th bit collisions = 3
38th bit collisions = 1
39th bit collisions = 1
40th bit collisions = 1
41th bit collisions = 0
42th bit collisions = 0
...
total nodes 20424661
Very large hash (200 MB) ~44 bits needed
- Code: Select all
1th bit collisions = 68011053
2th bit collisions = 47220104
3th bit collisions = 36825586
4th bit collisions = 31614457
5th bit collisions = 29016929
6th bit collisions = 27717794
7th bit collisions = 27067777
8th bit collisions = 26740790
9th bit collisions = 26579642
10th bit collisions = 26499701
11th bit collisions = 26459037
12th bit collisions = 26438309
13th bit collisions = 26427893
14th bit collisions = 26422708
15th bit collisions = 26420243
16th bit collisions = 26418793
17th bit collisions = 26418202
18th bit collisions = 26417908
19th bit collisions = 26417722
20th bit collisions = 26417644
21th bit collisions = 13227232
22th bit collisions = 6612346
23th bit collisions = 3306734
24th bit collisions = 1654428
25th bit collisions = 824725
26th bit collisions = 410876
27th bit collisions = 205487
28th bit collisions = 103168
29th bit collisions = 51017
30th bit collisions = 25410
31th bit collisions = 12781
32th bit collisions = 6026
33th bit collisions = 3056
34th bit collisions = 1552
35th bit collisions = 787
36th bit collisions = 390
37th bit collisions = 206
38th bit collisions = 89
39th bit collisions = 50
40th bit collisions = 27
41th bit collisions = 8
42th bit collisions = 4
43th bit collisions = 3
44th bit collisions = 1
45th bit collisions = 1
46th bit collisions = 0
47th bit collisions = 0
...
total nodes 26776742
Extrapolating from this I found, that at current CPU speeds and hash sizes we can expect it to take about 60-66 days for one 64 bit collision to occur.
But I don't think 64 bits will ever be insuffcient for this. Even if computers were 1000 times faster than today and had a handful of collisions an hour, the ratio of colliding nodes / good nodes would still be a very small number.
-S