Update: Since 01/13/2011, all programs have been added to my github repository. The master copies will be kept there.
- Non-numerical stuff
- My C macro library:
- kbtree.h (HTML): efficient B-tree library in C.
- khash.h (HTML): fast and light-weighted hash table library in C. See also this post and this one.
- ksort.h (HTML): fast and convenient library for sorting and k-small algorithms in C. See also this post.
- kstream.h (HTML): a simple generic buffered stream wrapper.
- kvec.h (HTML): simple vector container in C. Benchmark codes: kvec_test.cc. See also this post.
- Source code for benchmark on unordered dictionary data structures. See this page for the result.
- My C++ template library. Note that I have shifted my focus on C macro library and the template library is not actively maintained. You can easily find good implementations in SGI STL. However, from the benchmarks you can see that my library still has some advantages. If you like to use the template library and find bugs or require more features, please feel free to contact with me.
- High-performance hash template library: khash.hh (HTML). Benchmarking programs are also available: hash_test.cc (HTML) and time_hash_map.cc (HTML). See also this post.
- Sort template library: sort.hh (HTML) and sort_test.cc (HTML). This library includes high-performance iterative introsort (i.e. improved quicksort), iterative bottom-up mergesort and binary heapsort. See also this post.
- Simple memory management by Kernighan & Ritchie: kalloc.h and kalloc.c (HTML or PDF). See also this post.
- GNU sort for string-number mixed comparison: sort-1.22a.tar.bz2. This version also includes some performance enhancement. See also this post.
- My C macro library:
- Numerical algorithms
- Normal and Gamma random generator: genran.h and genran.c (HTML). Adapted from GSL.
- Hooke-Jeeves’ method for derivative-free nonlinear programming: min.hh (HTML). This is a C++ template function. See also this post.
- NEWUOA method for derivative-free nonlinear programming: newuoa.hh (HTML). C++ template function. Adapted from Powell’s Fortran program. A benchmark on this method is available here. See also this post.
hermano perdona que te escriba en español, mi ingles es suficiente para leer, tengo una implemetacion de avl, que modifique para trabajar con el como si fuera una lista, ademas que permitiera elementos repetidos, “me encanto” tu comparacion del b-tree con los los binarios, voy a usar tu implementacion del b-tree en mi trabajo(por supuesto, le voy a dar difusion a tu sitio, y incluir en la bibliografia), ahora la pregunta es, cual es la mejor implementacion que conoces de AVL?, tienes alguna tuya…?, estoy desarrollando un libreria evolutiva paralela, y la poblacion pienso almacenarla en arboles, por la rapides de insercion y busqueda, como te dije modifique un AVL, pero como explicas puede existir situaciones en las que sea mejor usar un AVL y otras donde sea mejor usar un b-tree, mira disculpa la charla pero, este tema me es realmente interesante, estuve buscando implemetaciones de las demas versiones de b-tree, b+tree, b*tree, dancing b*tree, pero de estas me encontre pocas tan elegante y sobre todo rapidas que el kbtree, mis felicitaciones y agradecimiento, espero con expectativas tu respuesta. Saludos Leander.
@leander
I do not speak Spanish, but I can use google translate. Hope the translation honestly reflects what you mean to say.
There are few good AVL implementations, but there are a lot of good Red-black tree libraries. In my benchmark (“Another look at my old benchmark”), AVL and RB tree have quite similar performance in terms of both speed and memory. And therefore, if you like to use binary search tree (BST), you might want to use RB tree instead. For C++ programmers, STL’s implemention is nearly the best. For C programmers, I recommend tree.h from FreeBSD. See my post (“Comparison of binary search trees (BSTs)”) for more details.
Although in my benchmark, B-tree marginally outperforms RB tree, in other applications, RB/AVL tree may be faster. More advanced implementation of RB may also use less memory than my B-tree. But this all depends on the type of data in the BST/B-tree.
I have not tested other B-tree (B*/B+) implementations. But if you want to use an on-disk B-tree for large data set, I would highly recommend SQLite. It is faster than another B-tree library and is much more flexible.
thank so much.
why don’t a document to know how use yours libraries? Is hard read and understand all code for use it
Hey, really nice posts!
If only every numerical algorithm paper included a C or C++ 1-header implementation. It would be a better world.
Hi, cool web site, just want to ask you what filtering program you use for comments because I am getting tons on my web site.
Sorry if I can’t write better. I’m very bad at English.
I want to ask you some question.
If I have a record structure, such as a string and an integer number inside it, can I use your khash library to implement a hash table?
If I only have to implement a hash table with key is a string (or an integer), and value is nothing, can I order them?
Hi! First of all thank you very much for your hash implementation. Surprisingly it appeared that only few of good ones, written in pure C, are available freely over Internet.
I want to add collisions counting to your implementation, but I’m afraid I didn’t understand algorithm fully. Could you explain, for what “empty” and “deleted” buckets are distinguished? Why “deleted” buckets will be skipped during kh_put_…() operation?
Thanks!
@PM
You can put a struct as key, but you have to define your own hash function and tell the library how to regard two objects are equal.
@Bear Vodka
I think I had a reason to implement it in that way, but I have totally forgotten the reason. Sorry.
regarding using a struct as a key. I’m having difficulty with this. I have
struct _pair {
int32_t key;
int32_t value;
};
typedef struct _pair key_value;
#define kb_pair_cmp(a, b) (((a.key) > (b.key)) – ((a.key < (b.key)))
KBTREE_INIT(pair, key_value, kb_pair_cmp)
I get GCC compile errors.
mac_to_oui.c: In function ‘__kb_getp_aux_pair’:
mac_to_oui.c:25: error: request for member ‘key’ in something not a structure or union
mac_to_oui.c:25: error: request for member ‘key’ in something not a structure or union
mac_to_oui.c:25: error: expected ‘)’ before ‘begin’
mac_to_oui.c:25: error: expected expression before ‘}’ token
mac_to_oui.c:25: error: request for member ‘key’ in something not a structure or union
mac_to_oui.c:25: error: request for member ‘key’ in something not a structure or union
mac_to_oui.c:25: error: expected ‘)’ before ‘begin’
mac_to_oui.c:25: error: expected expression before ‘}’ token
mac_to_oui.c: In function ‘__kb_putp_aux_pair’:
mac_to_oui.c:25: error: request for member ‘key’ in something not a structure or union
mac_to_oui.c:25: error: request for member ‘key’ in something not a structure or union
mac_to_oui.c:25: error: expected ‘)’ before ‘i’
mac_to_oui.c:25: error: expected expression before ‘}’ token
Idears?
self answer
change
#define kb_pair_cmp(a, b) (((a.key) > (b.key)) – ((a.key (b).key) – ((a).key < (b).key)
May I ask why you use ‘k’ as prefix for each algorithm? e.g. khash, kvec…
May I ask why all your library codes use prefix ‘k’, e.g. khash, kvec ?
About the use of klist.
May I ask
(1) what is the use of defining __int_free?
#define __int_free(x)
KLIST_INIT(32, int, __int_free)
I see in the macro code of klist. Somewhere in KMEMPOOL_INIT will call:
kmpfree_f(mp->buf[k]); free(mp->buf[k]);
But I don’t understand what kmpfree_f is freeing while there is a standard free() operation follows.
(2) if my list is not using integer but string or other kinds of pointer, does it work with klist? Can you show me a simple example of how KLIST_INIT() should be used in this case?
Thanks a lot for clarification.
May I also ask another question?
For klist.h, is it whenever I declare/init a new list, there will be a new set of functions kl_pushp, kl_shift, etc, suffixed by the input list name, declared under the init macro?
If so, I wonder if there will be adverse effect on code size.
In my case, I could use a lot of lists which are being parts of some data structures representing some special objects. If there are many such objects, e.g. 100000, then will there be 100000 sets of functions for each list in the code? If yes, then I think it’s not suitable for using klist in my case.
Thanks a lot.
Nice job!
Can I ask another question ?
For kbtree.h, it seems that I can only create one btree instance at once. Have you tried initializing more than one instance ?
Thanks.
I have not tried to create two b-tree instances, but I think you should be able to do that. There are essentially no global variables or structs. Could you show me your code on something like gist?
Great post. I was exploring to use Boost Inrusive hashtable. Is it not worth trying at all? I am familiar with Boost Library and I thought I would use that and surprised to see so many other open implementations.
I do not write C++ programs often. I almost never use boost except for benchmarking. Have someone compared intrusive hashtable and the standard hashtable?
[…] /status的程序,并给出tmem的下载链接 【5】Understanding memory,非常详细的介绍Linux内存分配,优化 【6】Heng Li’s personal website 可以下载比较hash library的程序,里面有一个runint/文… […]
Thank you for making your library freely available.
I am hoping you can help me resolve issues with khash (0.2.7). When called from within another library, khash overwrites keys that already exist in some buckets. In a specific instance, string elements added within a loop are:
“00000000”: k = 0
“11100000”: k = 191
“00000100”: k = 961
“00101100”: k = 191 **
This is how I use the functions:
khash_t(msly) *t = kh_init(msly);
Within a loop:
{
printf(“*****************************\n”);
printf(“New entry: %d\n”, i);
printf(“R1\n”);
ivec_print(r1);
printf(“C1\n”);
ivec_print(c1);
printf(“\n”);
printf(“bitarrays and string\n”);
ms2 = (char*) bitarray2_to_string(r1, m, c1, n);
printf(“\n”);
printf(“Adding string to t\n”);
k0 = kh_get(msly, t, ms2);
is_missing = (k0 == kh_end(t));
k1 = kh_put(msly, t, ms2, &ret);
if (ret) {
idx = t->size – 1;
kh_value(t, k1)= idx;
} else {
idx = t->vals[k1];
}
printf(“k0: %d, was_missing: %d\n”, k0, is_missing);
printf(“k1: %d, ret: %d\n”, k1, ret);
key = (char*) t->keys[k1];
printf(“Value: %d, Retrieved key is: %s\n”, idx, key);
printf(“*****************************\n”);
free(ms2), ms2=NULL;
}
Output:
*****************************
*****************************
First element to be added is: 00000000
k: 0, ret: 1
Value: 0, Retrieved key is: 00000000
*****************************
New entry: 0
R1
printing vector of length: 3
i: 0 :: val: 3
i: 1 :: val: 4
i: 2 :: val: 5
C1
printing vector of length: 2
i: 0 :: val: 0
i: 1 :: val: 1
bitarrays and string
Bitarray 1
bitarray: 111000
Bitarray 2
bitarray: 00
2dstring: `11100000`
Adding string to t
k0: 1024, was_missing: 1
k1: 191, ret: 1
Value: 1, Retrieved key is: 11100000
*****************************
*****************************
New entry: 1
R1
printing vector of length: 5
i: 0 :: val: 0
i: 1 :: val: 1
i: 2 :: val: 2
i: 3 :: val: 3
i: 4 :: val: 4
C1
printing vector of length: 2
i: 0 :: val: 0
i: 1 :: val: 1
bitarrays and string
Bitarray 1
bitarray: 000001
Bitarray 2
bitarray: 00
2dstring: `00000100`
Adding string to t
k0: 1024, was_missing: 1
k1: 961, ret: 1
Value: 2, Retrieved key is: 00000100
*****************************
*****************************
New entry: 2
R1
printing vector of length: 3
i: 0 :: val: 0
i: 1 :: val: 1
i: 2 :: val: 3
C1
printing vector of length: 2
i: 0 :: val: 0
i: 1 :: val: 1
bitarrays and string
Bitarray 1
bitarray: 001011
Bitarray 2
bitarray: 00
2dstring: `00101100`
Adding string to t
k0: 191, was_missing: 0
k1: 191, ret: 0
Value: 1, Retrieved key is: 00101100
*****************************
Please ignore the post. I found the error:
free(ms2), ms2=NULL;
I was freeing the key!
Compares khash against Google DenseHash, C++ unordered_map, LuaJIT, Java HashMap: http://lonewolfer.wordpress.com/2014/03/17/benchmarking-hash-table-implementations-part-2/
[…] solution is Attractive Chaos sotware. C macro library: kbtree.h: efficient B-tree library in C. khash.h: fast and light-weighted hash […]
[…] Attractive Chaos implement kbtree.h. It’s a efficient B-tree library […]
[…] solution is Attractive Chaos software. C macro library: kbtree.h: efficient B-tree library in C. khash.h: fast and light-weighted hash […]
[…] 다른 해결책은 매력적인 카오스 소프트웨어 입니다. C 매크로 라이브러리 : kbtree.h : C의 효율적인 B- 트리 라이브러리. […]
[…] attractive chaos ‘s […]
[…] solution is Attractive Chaos software. C macro library: kbtree.h: efficient B-tree library in C. khash.h: fast and light-weighted hash […]