Zilong Tan has kindly shared his observation that using a power-of-2 bucket count may dramatically improve the speed for integer keys. He implemented this idea in his ulib library. He is right. Here is a tiny benchmark done on my laptop (Intel Core 2 Duo; Mac OS X 10.6; gcc 4.2.1; program khash_test.c).
| Version | Int-set-5M (sec) | Str-set-5M (sec) | Int-set-4M (sec) | Int-set-6M (sec) |
|---|---|---|---|---|
| 0.2.5 | 0.898 | 2.174 | 0.593 | 1.327 |
| 0.2.6 | 0.476 | 1.814 | 0.285 | 0.664 |
| 0.2.6 linear | 0.333 | 1.744 | 0.250 | 0.455 |
where “0.2.6” is the new version and “0.2.6 linear” indicates that linear probing is in use (by default, khash uses double hashing for stability). We can see that 0.2.6 is nearly twice as fast as 0.2.5 for the integer hash set. Linear probing is even faster than the default double hashing due to better cache locality and slightly fewer CPU instructions, but double hashing tends to be more robust to certain non-random inputs.
The new version is available here.
Update: Note that due to rehashing, the performance of a hash table as a function of input size is not a smooth curve; rehashing takes time. I have therefore added two more data points to show that the speedup is not an artifact of this caveat.
I’ve written extensively about ways khash could be improved:
http://cbloomrants.blogspot.com/2008/10/10-21-08-2.html
http://cbloomrants.blogspot.com/2008/10/10-21-08-3.html
http://cbloomrants.blogspot.com/2010/11/11-29-10-useless-hash-test.html
The most glaring one is that the separate arrays for flags, keys and data are very bad for cache performance.
I think I will reconsider “mod prime” and using special keys for DELETED/EMPTY instead of a flag array. However, I should note that the separation of flags, keys and values is intentional and was discussed in my previous blog posts (post on the Google dense hash table). The key reason is that, in addition to time efficiency, khash also strives for memory efficiency. If we put keys and values in one C struct, we may waste memory due to alignment padding (khash was first used to keep an 8-byte key and a 1-byte value; putting them in the same struct wastes 7 bytes per bucket). In addition, one can always store “struct {int key,val;}” in the hash table and hash only “key” to avoid a cache miss.
A separate flags array also reduces memory during rehashing, thanks to the kick-out process implemented in khash. With the traditional method, we keep the old buckets and insert them into the new table; khash only keeps the old flags array instead of the old buckets.
I was also aware of the advantage of caching the hashed values (post on rde::hash_map), especially for string keys. Again, I did not implement this, in order to keep a smaller memory footprint. And frequently I do not need a very fast string hash table in my own applications.
So are you updating BWA with your new khash?
I can, but it does not really matter. Negligible CPU time goes to hash table operations. Samtools probably does not need a fast hash table, either.
OK, maybe there is something here I don’t really get.
What’s the point of this khash algorithm? I mean, hashing an input into a number is hardly new or difficult. So what makes khash special and worthwhile? Is it the size efficiency?
Khash is one of the most efficient generic hash table libraries in C/C++, in terms of both time and space. It is also among the shortest: everything is in one header file of ~250 lines of code (the core algorithm in 150 lines), with no dependency on any library other than libc. Yes, it is not hard to write a 250-line program; you can write your own.
Well, actually, I do.
I’ve been looking into your code, and although it is just a single file, I do not find it easy to read. There are many clever preprocessor tricks, which make it difficult to grasp at first glance.
I write hash-related code fairly often, so that’s why I don’t see what’s difficult about it. But reading the comments, I think there is something I don’t get. Maybe it is the word “generic”, which may hide some kind of complexity I’m not aware of.
I just compared khash on my side, and it is definitely one of the fastest hash libraries you can get. Implementing a hash library is not hard, but it is hard to get a good one.
The preprocessor tricks in khash.h are not trivial.
Great code. I have had success using khash for a small number of relatively large hashes here and there. At present I am trying to make a large array of hashes or, similarly, embedding a khash in a struct, and making a large array of structs. If someone could please post a simple example of how this could be (elegantly) achieved using khash, it would be greatly appreciated.
Many thanks for the prompt response… testing the approach out now. Question: For hashing a known dataset, how does khash compare to gperf, cmph in terms of speed and efficiency?
Gperf and cmph are for perfect hashing, while khash is a generic dynamic hash table. These are not the same thing.
How does khash handle collisions? Separate chaining or open addressing?
Open addressing.
First, thank you for taking the time to write this awesome hash and giving it freely.
I am noticing a weird behavior when using kh_resize(32, h, theSize) before any keys are entered.
Depending on the value of theSize (which is an int), every key put into h will appear as present [using z = kh_get(32, h, k); kh_exist(h, z)] even if nothing has been inserted before.
Sometimes it behaves as expected, with kh_exist() returning 0, and other times it incorrectly returns 1, again depending on the value of theSize.
I have a reproducible example if that will help things make more sense.
Thank you for any help you can provide. I look forward to using your tool properly.
Do you have an example? Thanks.
Sure. Do you want me to post it here or send through email?
Better a gist at github, or you may post it here. Thank you. The problem you were describing could happen.
Any luck with my example? I’m perplexed by it; then again, my skills at this aren’t as high as yours.
I’ve never done a gist, but I put the code on github: https://github.com/jaredlander/join/blob/master/src/mainhash.cpp
Thanks for checking it out.
I wrote a quick benchmark and tested it against AC_VERSION_KHASH_H “0.2.4” and AC_VERSION_KHASH_H “0.2.8” with the following results:
$ gcc -O3 khash-example.c && time ./a.out 32000000
— generating data… done!
[ht_khash_str_put] size: 32000000
[ht_timing] 11.340 sec or 2821869.5 per second
[ht_khash_str_get] size: 32000000
[ht_timing] 4.970 sec or 6438631.8 per second
$ gcc -O3 khash-example.c && time ./a.out 32000000
— generating data… done!
[ht_khash_str_put] size: 32000000
[ht_timing] 41.160 sec or 777453.8 per second
[ht_khash_str_get] size: 32000000
[ht_timing] 3.490 sec or 9169054.4 per second
Conclusion: Insert time has become horrible 😦 Is this to be expected for some reason?
Just for fun I removed inlining from the last test and repeated it:
$ gcc -O3 khash-example.c && time ./a.out 32000000
— generating data… done!
[ht_khash_str_put] size: 32000000
[ht_timing] 41.410 sec or 772760.2 per second
[ht_khash_str_get] size: 32000000
[ht_timing] 18.090 sec or 1768933.1 per second
Conclusion: On modern CPUs inlining makes almost no difference?
What keys are you inserting? What is the hash function? Are they random enough? I guess what you are seeing is caused by collisions. The latest version at github uses quadratic probing, which has better cache efficiency but may perform worse given lots of collisions. If your keys and hash function are good enough and you still get worse performance, I will probably revert to double hashing. Thanks.
Unique 16-char hex strings. I haven’t changed the hash function built into khash. Should I? Have you looked at MurmurHash3 for hashing strings? I also noticed that the newer version of khash caused the process to grow to 1.9 GB instead of 1.8 GB. Not a big deal, I guess.
Do you have the source code somewhere? I will look into this. If quadratic probing really hurts performance, I will switch back to double hashing. Thanks! As to memory, the older version doubles the table size when it is 77% full; the new version doubles at 75%.
Source code is here: https://gist.github.com/anonymous/5691733