<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Attractive Chaos &#187; C</title>
	<atom:link href="http://attractivechaos.wordpress.com/tag/c/feed/" rel="self" type="application/rss+xml" />
	<link>http://attractivechaos.wordpress.com</link>
	<description>Just another WordPress.com weblog</description>
	<lastBuildDate>Tue, 29 Sep 2009 22:22:13 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='attractivechaos.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/3aaf4ad34bfdf87dcbb70d9e3cbd326d?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>Attractive Chaos &#187; C</title>
		<link>http://attractivechaos.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://attractivechaos.wordpress.com/osd.xml" title="Attractive Chaos" />
		<item>
		<title>A Generic Buffered Stream Wrapper</title>
		<link>http://attractivechaos.wordpress.com/2008/10/11/a-generic-buffered-stream-wrapper/</link>
		<comments>http://attractivechaos.wordpress.com/2008/10/11/a-generic-buffered-stream-wrapper/#comments</comments>
		<pubDate>Sat, 11 Oct 2008 21:20:29 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[myprog]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=629</guid>
		<description><![CDATA[In C programming, the main difference between low-level I/O functions (open/close/read/write) and stream-level I/O functions (fopen/fclose/fread/fwrite) is that stream-level functions are buffered. Presumably, low-level I/O functions will incur a disk operation on each read(). Although the kernel may cache this, we cannot rely too much on it. Disk operations are expensive and so low-level I/O [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>In C programming, the main difference between low-level I/O functions (open/close/read/write) and stream-level I/O functions (fopen/fclose/fread/fwrite) is that stream-level functions are buffered. Each call to a low-level function such as read() is a system call and, presumably, may incur a disk operation. Although the kernel may cache the data, we cannot rely too much on it. Disk operations are expensive, which is why low-level I/O does not provide an fgetc() equivalent.</p>
<p>Stream-level I/O functions have a buffer. On reading, they load a block of data from disk to memory. If the requested data are already in memory when fgetc() is called, the call does not incur a disk operation, which greatly improves efficiency.</p>
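<p>To make the buffering idea concrete, here is a minimal sketch of a byte reader layered on read(). It is not the code of my library: the names bufreader_t, br_init, br_getc and BR_BUFSIZE are made up for this illustration, and the counter n_reads is only there to show how few read() calls are issued.</p>

```c
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read() */

#define BR_BUFSIZE 4096 /* hypothetical buffer size */

typedef struct {
    int fd;                         /* descriptor being wrapped */
    ssize_t begin, end;             /* unread bytes are buf[begin..end) */
    int is_eof;                     /* set once read() reports EOF or an error */
    long n_reads;                   /* number of read() calls issued so far */
    unsigned char buf[BR_BUFSIZE];
} bufreader_t;

static void br_init(bufreader_t *br, int fd)
{
    br->fd = fd;
    br->begin = br->end = 0;
    br->is_eof = 0;
    br->n_reads = 0;
}

/* Return the next byte as a non-negative int, or -1 at end of file or on error. */
static int br_getc(bufreader_t *br)
{
    if (br->begin >= br->end) {     /* buffer exhausted: refill with one read() */
        if (br->is_eof) return -1;
        br->begin = 0;
        br->end = read(br->fd, br->buf, BR_BUFSIZE);
        ++br->n_reads;
        if (br->end <= 0) {         /* 0 means EOF, negative means error */
            br->end = 0;
            br->is_eof = 1;
            return -1;
        }
    }
    return br->buf[br->begin++];
}
```

<p>With a 4096-byte buffer, most calls to br_getc() only touch memory; read() is called once per block rather than once per byte.</p>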
<p>Stream-level I/O functions are part of the standard C library. Why do we need a new wrapper? Three reasons. First, when you work with an alternative I/O library (such as zlib or libbzip2) that does not come with buffered I/O routines, you probably need a buffered wrapper to make your code efficient. Second, using a generic wrapper makes your code more flexible when you want to change the type of input stream. For example, you may want to write a parser that works on a normal stream, on a zlib-compressed stream and on a C string. Using a unified stream wrapper will simplify coding. Third, my feeling is that most of the stream-level I/O functions in stdio.h are not convenient, given that they cannot enlarge a string automatically. In a lot of cases, I need to read one line but I do not know how long the line can be. Managing this case is not so hard, but doing it again and again is boring.</p>
<p>In the end, I came up with my own buffered wrapper for input streams. It is generic in that it works on all types of I/O streams with a read() call (or equivalent), or even on a C string. I show an example here without much explanation. I may expand this post in the future. The source code can be found on my <a href="http://attractivechaos.wordpress.com/programs/">programs page</a>.</p>
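<p>The third point, reading a line of unknown length, can be sketched with plain stdio as follows. This is only an illustration of the repetitive code I would like to avoid writing, not my wrapper itself; linebuf_t, lb_reserve and lb_getline are hypothetical names.</p>

```c
#include <stdio.h>
#include <stdlib.h>

/* A growable string: l is the length, m the allocated size (hypothetical type). */
typedef struct { size_t l, m; char *s; } linebuf_t;

/* Make sure at least `need` bytes are allocated; return -1 if realloc() fails. */
static int lb_reserve(linebuf_t *lb, size_t need)
{
    if (need > lb->m) {
        size_t m = lb->m ? lb->m : 16;
        char *t;
        while (m < need) m *= 2;            /* grow geometrically */
        t = (char*)realloc(lb->s, m);
        if (t == NULL) return -1;
        lb->s = t; lb->m = m;
    }
    return 0;
}

/* Read one line without its trailing '\n' into lb, enlarging lb as needed.
 * Return the line length, or -1 at end of file or on allocation failure. */
static long lb_getline(FILE *fp, linebuf_t *lb)
{
    int c;
    lb->l = 0;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (lb_reserve(lb, lb->l + 2) < 0) return -1; /* this char plus '\0' */
        lb->s[lb->l++] = (char)c;
    }
    if (c == EOF && lb->l == 0) return -1;            /* nothing left to read */
    if (lb_reserve(lb, lb->l + 1) < 0) return -1;     /* room for '\0' on an empty line */
    lb->s[lb->l] = '\0';
    return (long)lb->l;
}
```

<p>The caller starts from a zeroed linebuf_t, calls lb_getline() in a loop until it returns -1, and frees the s member at the end.</p>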
<pre class="brush: cpp;">
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &quot;kstream.h&quot;
// arguments: type of the stream handler,
//   function to read a block, size of the buffer
KSTREAM_INIT(int, read, 10)

int main()
{
	int fd;
	kstream_t *ks;
	kstring_t str;
	// start from an empty string; ks_getuntil() enlarges it as needed
	memset(&amp;str, 0, sizeof(kstring_t));
	fd = open(&quot;kstream.h&quot;, O_RDONLY);
	if (fd &lt; 0) return 1; // cannot open the input file
	ks = ks_init(fd);
	// read one line at a time until the end of the file
	while (ks_getuntil(ks, '\n', &amp;str, 0) &gt;= 0)
		printf(&quot;%s\n&quot;, str.s);
	ks_destroy(ks);
	free(str.s);
	close(fd);
	return 0;
}
</pre>
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/10/11/a-generic-buffered-stream-wrapper/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
		<item>
		<title>Another Look at my old Benchmark</title>
		<link>http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/</link>
		<comments>http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/#comments</comments>
		<pubDate>Tue, 07 Oct 2008 12:02:47 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[cpp]]></category>
		<category><![CDATA[myprog]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=586</guid>
		<description><![CDATA[This is a follow-up to my previous post. Here I turn the table into several charts, which I hope are friendlier to readers. You can find the links to these libraries in that table. Their source code, including my testing code, is available here. You may also want to see my previous posts in the [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>This is a follow-up to my previous post. Here I turn <a href="http://attractivechaos.awardspace.com/udb.html">the table</a> into several charts, which I hope are friendlier to readers. You can find the links to these libraries in that table. Their source code, including my testing code, is available <a href="http://attractivechaos.awardspace.com/download/udb-latest.tar.bz2">here</a>. You may also want to see my posts from the last few days for my interpretation of the results.</p>
<p>On C string (char*) keys, I failed to get the correct result from JE_rb_old and JE_rb_new on Mac, and so they are not shown in the charts. I would really appreciate it if someone could give me a correct implementation using these libraries. In addition, tr1_unordered_map uses a lot of memory according to my program. The memory figures for string keys are faked.</p>
<p>For convenience, here are some brief descriptions of these libraries (in no particular order):</p>
<ul>
<li>google_dense and google_sparse: <a href="http://code.google.com/p/google-sparsehash/">Google&#8217;s sparsehash library</a>. Google_dense is fast but memory hungry, while google_sparse is the opposite.</li>
<li>sgi_hash_map and sgi_map: <a href="http://www.sgi.com/tech/stl/">SGI&#8217;s STL</a> that comes with g++-4. The backend of sgi_map is a three-pointer red-black tree.</li>
<li>tr1::unordered_map: GCC&#8217;s TR1 library that comes with g++-4. It implements a hash table.</li>
<li>rdestl::hash_map: from <a href="http://code.google.com/p/rdestl/">RDESTL</a>, another implementation of STL.</li>
<li><a href="http://uthash.sourceforge.net/">uthash</a>: a hash library in C</li>
<li>JG_btree: <a href="http://resnet.uoregon.edu/~gurney_j/jmpc/btree.html">John-Mark Gurney&#8217;s btree library</a>.</li>
<li>JE_rb_new, JE_rb_old, JE_trp_hash and JE_trp_prng: <a href="http://www.canonware.com/~ttt/2008/07/treaps-versus-red-black-trees.html">Jason Evans&#8217; binary search tree libraries</a>. JE_rb_new implements a left-leaning red-black tree; JE_rb_old a three-pointer red-black tree; both JE_trp_hash and JE_trp_prng implement treaps but with different strategies on randomness.</li>
<li>libavl_rb, libavl_prb, libavl_avl and libavl_bst: from <a href="http://www.stanford.edu/~blp/avl/">GNU libavl</a>. They implement a two-pointer red-black tree, a three-pointer red-black tree, an AVL tree and an unbalanced binary search tree, respectively.</li>
<li>NP_rbtree and NP_splaytree: <a href="http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/sys/tree.h">Niels Provos&#8217; tree library</a> for FreeBSD. A three-pointer red-black tree and a splay tree.</li>
<li>TN_rbtree: <a href="http://www.darkridge.com/~jpr5/archive/alg/node21.html">Thomas Niemann&#8217;s red-black tree</a>. I ported it to C++.</li>
<li>sglib_rbtree: from <a href="http://sglib.sourceforge.net/">SGLIB</a>. It implements a two-pointer recursive red-black tree (all the other binary search trees are implemented without recursion).</li>
<li>libavl_avl_cpp, libavl_rb_cpp and libavl_rb_cpp2: incomplete C++ versions of libavl (no iterator), ported by me. Libavl_rb_cpp2 further uses the same technique as JE_rb_new to save the color bit. The source code is available in the package.</li>
<li><a href="http://attractivechaos.awardspace.com/khash.h.html">khash</a> and <a href="http://attractivechaos.awardspace.com/kbtree.h.html">kbtree</a>: my hash table and B-tree implementations. kbtree is based on JG_btree.</li>
</ul>
<p><a href="http://klib.sourceforge.net/images/udb-int-cpu.png"><img class="alignnone size-full wp-image-622" title="udb-int-cpu" src="http://klib.sourceforge.net/images/udb-int-cpu.png" alt="" width="542" height="309" /></a></p>
<p><a href="http://klib.sourceforge.net/images/udb-int-mem.png"><img class="alignnone size-full wp-image-623" title="udb-int-mem" src="http://klib.sourceforge.net/images/udb-int-mem.png" alt="" width="542" height="309" /></a></p>
<p><a href="http://klib.sourceforge.net/images/udb-str-cpu.png"><img class="alignnone size-full wp-image-624" title="udb-str-cpu" src="http://klib.sourceforge.net/images/udb-str-cpu.png" alt="" width="542" height="308" /></a></p>
<p><a href="http://attractivechaos.files.wordpress.com/2008/10/udb-str-mem.png"><img class="alignnone size-full wp-image-625" title="udb-str-mem" src="http://klib.sourceforge.net/images/udb-str-mem.png" alt="" width="543" height="309" /></a></p>
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>

		<media:content url="http://klib.sourceforge.net/images/udb-int-cpu.png" medium="image">
			<media:title type="html">udb-int-cpu</media:title>
		</media:content>

		<media:content url="http://klib.sourceforge.net/images/udb-int-mem.png" medium="image">
			<media:title type="html">udb-int-mem</media:title>
		</media:content>

		<media:content url="http://klib.sourceforge.net/images/udb-str-cpu.png" medium="image">
			<media:title type="html">udb-str-cpu</media:title>
		</media:content>

		<media:content url="http://klib.sourceforge.net/images/udb-str-mem.png" medium="image">
			<media:title type="html">udb-str-mem</media:title>
		</media:content>
	</item>
		<item>
		<title>Is There an Overhead to Retrieve an Element in a Struct?</title>
		<link>http://attractivechaos.wordpress.com/2008/10/01/is-there-an-overhead-to-retrieve-an-element-in-a-struct/</link>
		<comments>http://attractivechaos.wordpress.com/2008/10/01/is-there-an-overhead-to-retrieve-an-element-in-a-struct/#comments</comments>
		<pubDate>Wed, 01 Oct 2008 20:47:33 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[C]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=541</guid>
		<description><![CDATA[I was wondering whether retrieving an element in a struct incurs additional overhead, and so I did the following experiment. Here the same array is sorted in two ways: with or without retrieving the data from a struct. Both ways yield identical results. The question is whether the compiler knows the two ways are the [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I was wondering whether retrieving an element in a struct incurs additional overhead, and so I did the following experiment. Here the same array is sorted in two ways: with or without retrieving the data from a struct. Both ways yield identical results. The question is whether the compiler knows the two ways are the same and can achieve the same efficiency.</p>
<pre class="brush: cpp;">
#include &lt;time.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;
#include &quot;ksort.h&quot;

typedef struct {
	int a;
} myint_t;

#define myint_lt(_a, _b) ((_a).a &lt; (_b).a)

KSORT_INIT_GENERIC(int)
KSORT_INIT(my, myint_t, myint_lt)

int main()
{
	int i, N = 10000000;
	myint_t *a;
	clock_t t;
	a = (myint_t*)malloc(sizeof(myint_t) * N);
	srand48(11);
	for (i = 0; i != N; ++i) a[i].a = lrand48();
	t = clock();
	ks_introsort(int, N, (int*)a);
	printf(&quot;%.3lf\n&quot;, (double)(clock() - t) / CLOCKS_PER_SEC);
	srand48(11);
	for (i = 0; i != N; ++i) a[i].a = lrand48();
	t = clock();
	ks_introsort(my, N, a);
	printf(&quot;%.3lf\n&quot;, (double)(clock() - t) / CLOCKS_PER_SEC);
	free(a);
	return 0;
}
</pre>
<p>Here are the timings with different compilers on different CPUs (the first value is for direct access, the second for access through the struct):</p>
<ul>
<li>Mac-Intel, gcc-4.0, -O2: 1.422 sec vs. 1.802 sec</li>
<li>Mac-Intel, gcc-4.2, -O2: 1.438 vs. 1.567</li>
<li>Mac-Intel, gcc-4.0, -O2 -fomit-frame-pointer: 1.425 vs. 1.675</li>
<li>Mac-Intel, gcc-4.2, -O2 -fomit-frame-pointer: 1.438 vs. 1.448</li>
<li>Linux-Intel, gcc-4.1, -O2: 1.600 vs. 1.520</li>
<li>Linux-Intel, gcc-4.1, -O2 -fomit-frame-pointer: 1.620 vs. 1.530</li>
<li>Linux-Intel, icc, -O2 -fomit-frame-pointer: 1.600 vs. 1.580</li>
</ul>
<p>The conclusion is that retrieving data from a struct may have marginal overhead in comparison to direct data access. However, a good compiler can avoid this and produce nearly optimal machine code. Using &#8220;-fomit-frame-pointer&#8221; may help on some machines, but not on others. In addition, it is a bit surprising to me that gcc on Linux generates faster code for data retrieval from a struct. Swapping the order of the two tests does not change the conclusion.</p>
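<p>One way to see why the overhead can vanish is to look at the memory layout. The sketch below uses the myint_t type from the experiment; the helper functions same_layout_as_int, read_direct and read_member are made up for this illustration. The C standard guarantees that the first member of a struct sits at offset 0; that the struct has no trailing padding is an assumption that holds on common platforms but is not guaranteed.</p>

```c
#include <stddef.h>  /* offsetof, size_t */

typedef struct {
    int a;
} myint_t;

/* Return 1 when an array of myint_t has exactly the layout of an array of int. */
static int same_layout_as_int(void)
{
    return sizeof(myint_t) == sizeof(int) && offsetof(myint_t, a) == 0;
}

/* Direct access: the address is p + i * sizeof(int). */
static int read_direct(const int *p, size_t i)
{
    return p[i];
}

/* Access through the struct: the address is p + i * sizeof(myint_t) + offsetof(myint_t, a),
 * which is the same address whenever same_layout_as_int() holds. */
static int read_member(const myint_t *p, size_t i)
{
    return p[i].a;
}
```

<p>When the layouts match, both functions compute the same address, so any remaining difference in speed comes from how well the compiler optimizes the surrounding code rather than from the member access itself.</p>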
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/10/01/is-there-an-overhead-to-retrieve-an-element-in-a-struct/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
		<item>
		<title>Further Discussion on Search Trees</title>
		<link>http://attractivechaos.wordpress.com/2008/09/28/futher-discussion-on-search-trees/</link>
		<comments>http://attractivechaos.wordpress.com/2008/09/28/futher-discussion-on-search-trees/#comments</comments>
		<pubDate>Sun, 28 Sep 2008 21:23:28 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[cpp]]></category>
		<category><![CDATA[myprog]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=535</guid>
		<description><![CDATA[Over the weekend, I have done a more comprehensive benchmark of various search tree libraries. Two AVL tree, seven red-black tree, one splay tree and two treap implementations are included, together with seven hash table libraries. As I need to present a big table, I have to write it in a free-style HTML page. You can [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Over the weekend, I have done a more comprehensive benchmark of various search tree libraries. Two AVL tree, seven red-black tree, one splay tree and two treap implementations are included, together with seven hash table libraries. As I need to present a big table, I have to write it in a free-style HTML page. You can find the complete benchmark <a href="http://attractivechaos.awardspace.com/udb.html">here</a> and all the source code <a href="http://attractivechaos.awardspace.com/download/udb-20080928.tar.bz2">here</a>. I only copy the &#8220;concluding remarks&#8221; from the benchmark page as follows:</p>
<ul>
<li>A hash table is preferred over search trees if we do not require order.</li>
<li>In applications similar to my example, a B-tree is better than most binary search trees in terms of both speed and memory.</li>
<li>AVL trees and red-black trees are the best general-purpose BSTs. They are very close in efficiency.</li>
<li>For pure C libraries, using macros is usually more efficient than using void* to achieve generic programming.</li>
</ul>
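<p>The last remark, about macros versus void*, can be illustrated with a small sketch. The names below (cmp_int, sort_voidptr, SORT_INIT, int_lt) are hypothetical and are not taken from any of the benchmarked libraries. The macro version uses a simple insertion sort only to keep the example short; the point is how the comparison is dispatched, not the algorithm.</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* void* style: one compiled routine, the comparison goes through a function pointer. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int*)a, y = *(const int*)b;
    return (x > y) - (x < y);
}

static void sort_voidptr(int *a, size_t n)
{
    qsort(a, n, sizeof(int), cmp_int);
}

/* Macro style: the macro expands to a routine specialized for one type,
 * so the comparison lt() is expanded in place and the element type is known. */
#define SORT_INIT(name, type_t, lt)                                      \
    static void sort_##name(type_t *a, size_t n)                         \
    {                                                                    \
        size_t i, j;                                                     \
        for (i = 1; i < n; ++i) {                                        \
            type_t tmp = a[i];                                           \
            for (j = i; j > 0 && lt(tmp, a[j-1]); --j) a[j] = a[j-1];    \
            a[j] = tmp;                                                  \
        }                                                                \
    }

#define int_lt(x, y) ((x) < (y))
SORT_INIT(int, int, int_lt) /* defines sort_int() */
```

<p>With void*, every comparison is an indirect call on untyped memory, which the compiler usually cannot inline. With the macro, the compiler sees ordinary typed code and can optimize it as if it were written by hand.</p>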
<p>You can find the results and much more discussion on <a href="http://attractivechaos.awardspace.com/udb.html">that page</a>. If you think the source code or the design of the benchmark can be improved, please leave comments here or send me an e-mail. In addition, I failed to use several libraries, and so you can see some blanks in the table. I would also appreciate it if someone could show me how to use those libraries correctly.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/09/28/futher-discussion-on-search-trees/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
		<item>
		<title>B-tree vs. Binary Search Tree</title>
		<link>http://attractivechaos.wordpress.com/2008/09/24/b-tree-vs-binary-search-tree/</link>
		<comments>http://attractivechaos.wordpress.com/2008/09/24/b-tree-vs-binary-search-tree/#comments</comments>
		<pubDate>Wed, 24 Sep 2008 22:38:45 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[myprog]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=507</guid>
		<description><![CDATA[When talking about in-memory search trees, we usually think of various binary search trees: red-black tree, AVL tree, treap, splay tree and so on. We do not often think of B-tree, as B-tree is commonly introduced as an on-disk data structure rather than an in-memory one. Is B-tree also a good data structure for in-memory ordered [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>When talking about in-memory search trees, we usually think of various binary search trees: red-black tree, AVL tree, treap, splay tree and so on. We do not often think of B-tree, as B-tree is commonly introduced as an on-disk data structure rather than an in-memory one. Is B-tree also a good data structure for an in-memory ordered dictionary? I searched for a performance comparison between B-tree and binary search trees, but ended up with nothing useful. It seems that I am the only one interested in such a comparison, and so I have to do it myself.</p>
<p>I found John-Mark Gurney’s <a href="http://resnet.uoregon.edu/~gurney_j/jmpc/btree.html">B-tree</a> via a Google search. It is well coded and full of clever ideas. The original version has a small memory footprint, but it is not as fast as STL&#8217;s red-black tree. I studied the source code and thought I should be able to optimize it further. In the end, I got my kbtree.h macro library. As you can see in my hash table benchmark, the modified version beats STL set while using even less memory than the original version. I think I am now in a position to say: at least for some applications, B-tree is a better ordered data structure than most binary search trees.</p>
<p>The most attractive feature of B-tree is its small memory usage. A binary tree needs at least two pointers for each record, which amounts to 16N bytes on a modern 64-bit system. A B-tree only needs one pointer per record. Although the nodes of a B-tree may not be full, a sufficiently large B-tree is at least 50% full by definition and around 75% full on average. On a 64-bit system, the extra memory is then only 8N/0.75+KN(1/0.75-1)&#8776;(10.7+0.33K)N, or roughly (10+0.3K)N, where K is the size of a key. In fact we can do even better, as we do not need to allocate the null pointers in leaves. The practical memory overhead can be reduced to below (5+0.3K)N (in fact, as the majority of nodes in a B-tree are leaves, the factor 5 should be smaller in practice), far better than a binary search tree. On speed, no binary search tree with just two additional pointers (splay tree and hash treap) can achieve the best performance. We usually need additional information at each node (AVL tree and standard red-black tree) or a random number (treap) to get good performance. B-tree is different. It is even faster than the standard red-black tree while still using only (5+0.3K)N extra memory! People should definitely pay more attention to B-tree.</p>
<p><strong>Update:</strong> The modified B-tree is available <a href="http://www.freewebs.com/attractivechaos/kbtree.h">here</a> (<a href="http://www.freewebs.com/attractivechaos/kbtree.h.html">HTML</a>) as a single C header file. An example is <a href="http://www.freewebs.com/attractivechaos/kbtest.c">here</a>. Currently, the APIs are not very friendly, but they are ready to use in case you want to give them a try. Note that you must make sure the key is absent from the tree before calling kb_put() and make sure the key is present in the tree before calling kb_del().</p>
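<p>The arithmetic behind the first estimate can be checked with a few lines of C. This is only a sketch under the assumptions stated in the paragraph above: 8-byte pointers, one pointer per key slot in a B-tree, and a given fill ratio. The function names bst_overhead and btree_overhead are made up for this illustration.</p>

```c
/* Extra bytes per record in a binary search tree with two 8-byte child pointers. */
static double bst_overhead(void)
{
    return 2 * 8.0;
}

/* Extra bytes per record in a B-tree whose nodes are filled to `fill` (e.g. 0.75).
 * Each record pays for 1/fill pointer slots and for (1/fill - 1) unused key slots
 * of key_size bytes each. */
static double btree_overhead(double key_size, double fill)
{
    return 8.0 / fill + key_size * (1.0 / fill - 1.0);
}
```

<p>For 4-byte keys at 75% fill, btree_overhead(4, 0.75) gives 12 bytes per record against 16 for the two-pointer binary tree, and this is before dropping the null pointers in leaves.</p>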
<p>Someone has corrected me. STL is a specification, not an implementation. By STL in my blog, I always mean SGI STL, the default STL that comes with GCC.</p>
<p>Over the weekend I have done a more complete benchmark of various libraries on search trees and hash tables. Please read <a href="http://attractivechaos.wordpress.com/2008/09/28/futher-discussion-on-search-trees/">this post</a> if you are interested.</p>
<p>I realize that a lot of red-black tree implementations do not need a parent pointer, although SGI STL&#8217;s does. My comment below is somewhat wrong.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/09/24/b-tree-vs-binary-search-tree/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
		<item>
		<title>A Simple Generic Vector Container in C</title>
		<link>http://attractivechaos.wordpress.com/2008/09/22/a-simple-vector-macro-library-in-c/</link>
		<comments>http://attractivechaos.wordpress.com/2008/09/22/a-simple-vector-macro-library-in-c/#comments</comments>
		<pubDate>Mon, 22 Sep 2008 08:28:10 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[macro]]></category>
		<category><![CDATA[myprog]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=484</guid>
		<description><![CDATA[I do not see much need to have a vector container in C as a vector is simply an array and array operations are all very simple. Nonetheless, it might still be better to implement one, for the sake of completeness. Here is the code. The library is almost as fast as the fastest code you [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I do not see much need to have a vector container in C as a vector is simply an array and array operations are all very simple. Nonetheless, it might still be better to implement one, for the sake of completeness. Here is the code. The library is almost as fast as the fastest code you can write in C.</p>
<pre class="brush: cpp;">
#ifndef AC_KVEC_H
#define AC_KVEC_H

#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

#define kv_roundup32(x) (--(x), (x)|=(x)&gt;&gt;1, (x)|=(x)&gt;&gt;2, (x)|=(x)&gt;&gt;4, (x)|=(x)&gt;&gt;8, (x)|=(x)&gt;&gt;16, ++(x))

#define kvec_t(type) struct { uint32_t n, m; type *a; }
#define kv_init(v) ((v).n = (v).m = 0, (v).a = 0)
#define kv_destroy(v) free((v).a)
#define kv_A(v, i) ((v).a[(i)])
#define kv_pop(v) ((v).a[--(v).n])
#define kv_size(v) ((v).n)
#define kv_max(v) ((v).m)

#define kv_resize(type, v, s)  ((v).m = (s), (v).a = (type*)realloc((v).a, sizeof(type) * (v).m))

#define kv_push(type, v, x) do {									\
		if ((v).n == (v).m) {										\
			(v).m = (v).m? (v).m&lt;&lt;1 : 2;							\
			(v).a = (type*)realloc((v).a, sizeof(type) * (v).m);	\
		}															\
		(v).a[(v).n++] = (x);										\
	} while (0)

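/* Returns a pointer to a new slot at the end, growing the array if needed.
   The outer parentheses make it safe to write p = kv_pushp(type, v). */
#define kv_pushp(type, v) ((((v).n == (v).m)?							\
						   ((v).m = ((v).m? (v).m&lt;&lt;1 : 2),				\
							(v).a = (type*)realloc((v).a, sizeof(type) * (v).m), 0)	\
						   : 0), ((v).a + ((v).n++)))

/* Assignment target for element i, growing the array and updating n so that
   index i is inside the vector. Use it only as kv_a(type, v, i) = x; it is
   left unparenthesized because a comma expression is not an lvalue in C. */
#define kv_a(type, v, i) ((v).m &lt;= (i)?									\
						  ((v).m = (v).n = (i) + 1, kv_roundup32((v).m), \
						   (v).a = (type*)realloc((v).a, sizeof(type) * (v).m), 0) \
						  : (v).n &lt;= (i)? (v).n = (i) + 1				\
						  : 0), (v).a[(i)]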
#endif
</pre>
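<p>As a rough illustration of how these macros are meant to be used, here is a minimal sketch. It repeats a subset of the macros from the listing above so that it compiles on its own; sum_of_squares is a hypothetical helper invented for this example, and like the macros themselves it does not check the result of realloc.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Subset of the kvec macros from the listing above, repeated so that
   this sketch is self-contained. */
#define kvec_t(type) struct { uint32_t n, m; type *a; }
#define kv_init(v) ((v).n = (v).m = 0, (v).a = 0)
#define kv_destroy(v) free((v).a)
#define kv_A(v, i) ((v).a[(i)])
#define kv_size(v) ((v).n)
#define kv_push(type, v, x) do { \
		if ((v).n == (v).m) { \
			(v).m = (v).m? (v).m<<1 : 2; \
			(v).a = (type*)realloc((v).a, sizeof(type) * (v).m); \
		} \
		(v).a[(v).n++] = (x); \
	} while (0)

/* Hypothetical helper: push 0*0, 1*1, ..., (count-1)*(count-1) into a
   vector, then read the elements back and return their sum. */
long sum_of_squares(int count)
{
	kvec_t(int) v;
	long sum = 0;
	uint32_t i;
	int x;
	assert(count >= 0);
	kv_init(v);
	for (x = 0; x < count; ++x) kv_push(int, v, x * x);
	assert(kv_size(v) == (uint32_t)count);
	for (i = 0; i < kv_size(v); ++i) sum += kv_A(v, i);
	kv_destroy(v);
	return sum;
}
```

<p>Note that the element type appears twice: once in kvec_t(int) when the vector is declared, and again in every kv_push call, because the macro needs it for the realloc cast.</p>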
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/09/22/a-simple-vector-macro-library-in-c/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
		<item>
		<title>Thoughts on Generic Programming in C</title>
		<link>http://attractivechaos.wordpress.com/2008/09/21/thoughts-on-generic-programming-in-c/</link>
		<comments>http://attractivechaos.wordpress.com/2008/09/21/thoughts-on-generic-programming-in-c/#comments</comments>
		<pubDate>Sun, 21 Sep 2008 16:10:36 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[cpp]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[thinking]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=479</guid>
		<description><![CDATA[I came across two interviews (here and here) with Alexander Stepanov, the father of STL. There are quite a lot of interesting bits. For example, he thinks C++ is the best programming language to realize his goal, but at the same time he is strongly against OOP. In addition, he has put a lot [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I came across two interviews (<a href="http://www.stlport.org/resources/StepanovUSA.html">here</a> and <a href="http://www.stepanovpapers.com/drdobbs-interview.html">here</a>) with <a href="http://en.wikipedia.org/wiki/Alexander_Stepanov">Alexander Stepanov</a>, the father of STL. There are quite a lot of interesting bits. For example, he thinks C++ is the best programming language to realize his goal, but at the same time he is strongly against OOP. In addition, he has put a lot of effort into efficiency, which we can see in STL. He said: &#8220;It is silly to abstract an algorithm in such a way that when you instantiate it back it becomes inefficient&#8221;. I like these two interviews because I think in the same way. The only exception is that I do not use STL, although I think it is the best generic library and I like it a lot. But why?</p>
<p>Two reasons. Firstly, STL is written in C++, which makes it unavailable to all C projects. It is possible to use only STL and ignore all the other features of C++, but people rarely do so. At least I have not seen a project where STL is combined with procedural programming. In addition, C++ projects are usually less portable than C projects and STL makes it worse, as it puts a lot of stress on C++ compilers. Even Stepanov agreed, at the time of the interview, that &#8220;The unfortunate reality is that a lot of code in the present implementation of STL is suboptimal because of the compiler limitations and bugs of the compilers I had to use when I was developing STL&#8221;. Secondly, using STL also means much longer compilation time. I remember that I used to compile a customized Linux kernel for my old laptop in an hour. It would probably take me more than a day to compile it if it were written in C++/STL.</p>
<p>A generic container library would benefit a lot of C programmers, but so far I am not aware of any efficient implementation. GLib tries to be one, but it uses void*, which inevitably incurs overhead and complicates interfaces. So finally, I decided to write my own. Ideally (but probably impractically) I want to achieve four goals: a) efficiency in speed and space; b) elegance in interface; c) independence between functionalities and d) simplicity in code. However, currently I am not competent enough to achieve all these goals and I am not a professional programmer at all (and so cannot invest enough time). As I said in my About page, I mainly do this to please myself.</p>
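<p>To make the void* point a little more concrete, here is a small sketch that contrasts the two styles on a toy problem, finding the largest element of an array. All names in it (max_voidptr, int_lt_cb, DEFINE_MAX, max_int) are made up for this illustration and are not taken from GLib or from my library. The void* version needs the element size and a callback and pays an indirect call per comparison; the macro version is generated per type, so the comparison can be inlined.</p>

```c
#include <assert.h>
#include <stddef.h>

/* void*-based style: one function for all types. The caller must pass the
   element size and a comparison callback, and gets an untyped pointer back. */
const void *max_voidptr(const void *base, size_t n, size_t size,
                        int (*lt)(const void *, const void *))
{
	const char *p = (const char *)base, *best = p;
	size_t i;
	if (n == 0) return NULL;
	for (i = 1; i < n; ++i)
		if (lt(best, p + i * size)) best = p + i * size;
	return best;
}

/* Callback for int, required by the void* interface. */
int int_lt_cb(const void *a, const void *b)
{
	return *(const int *)a < *(const int *)b;
}

/* Macro-based style: the macro generates a typed function for each use.
   The generated function requires n >= 1. */
#define DEFINE_MAX(name, type_t, lt) \
	type_t max_##name(const type_t *a, size_t n) { \
		type_t best = a[0]; \
		size_t i; \
		assert(n >= 1); \
		for (i = 1; i < n; ++i) \
			if (lt(best, a[i])) best = a[i]; \
		return best; \
	}

#define int_lt(a, b) ((a) < (b))
DEFINE_MAX(int, int, int_lt) /* defines: int max_int(const int *a, size_t n) */
```

<p>With the typed version the call is simply max_int(a, n); with the void* version the caller writes *(const int*)max_voidptr(a, n, sizeof(int), int_lt_cb) and the compiler cannot check that the callback matches the element type.</p>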
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/09/21/thoughts-on-generic-programming-in-c/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
		<item>
		<title>C Array vs. C++ Vector</title>
		<link>http://attractivechaos.wordpress.com/2008/09/19/c-array-vs-c-vector/</link>
		<comments>http://attractivechaos.wordpress.com/2008/09/19/c-array-vs-c-vector/#comments</comments>
		<pubDate>Fri, 19 Sep 2008 08:03:34 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[cpp]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=456</guid>
		<description><![CDATA[Here is a piece of source code that compares C arrays and C++ vectors. It tests six scenarios: a) preallocated C array; b) dynamically growing C array; c) dynamic C vector calling the kv_a macro (in my kvec.h); d) dynamic C vector calling the kv_push macro (in my kvec.h); e) preallocated C++ vector and f) dynamically growing C++ [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Here is a piece of source code that compares C arrays and C++ vectors. It tests six scenarios: a) preallocated C array; b) dynamically growing C array; c) dynamic C vector calling the kv_a macro (in my kvec.h); d) dynamic C vector calling the kv_push macro (in my kvec.h); e) preallocated C++ vector and f) dynamically growing C++ vector. You can find my kvec.h on my blog.</p>
<pre class="brush: cpp;">
#include &lt;vector&gt;
#include &lt;time.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &quot;kvec.h&quot;

int main()
{
	int M = 10, N = 20000000, i, j;
	clock_t t;
	t = clock();
	for (i = 0; i &lt; M; ++i) {
		int *array = (int*)malloc(N * sizeof(int));
		for (j = 0; j &lt; N; ++j) array[j] = j;
		free(array);
	}
	printf(&quot;C array, preallocated: %.3f sec\n&quot;,
		   (float)(clock() - t) / CLOCKS_PER_SEC);
	t = clock();
	for (i = 0; i &lt; M; ++i) {
		int *array = 0, max = 0;
		for (j = 0; j &lt; N; ++j) {
			if (j == max) {
				max = !max? 1 : max &lt;&lt; 1;
				array = (int*)realloc(array, sizeof(int)*max);
			}
			array[j] = j;
		}
		free(array);
	}
	printf(&quot;C array, dynamic: %.3f sec\n&quot;,
		   (float)(clock() - t) / CLOCKS_PER_SEC);
	t = clock();
	for (i = 0; i &lt; M; ++i) {
		kvec_t(int) array;
		kv_init(array);
		kv_resize(int, array, N);
		for (j = 0; j &lt; N; ++j) kv_a(int, array, j) = j;
		kv_destroy(array);
	}
	printf(&quot;C vector, dynamic (kv_a): %.3f sec\n&quot;,
		   (float)(clock() - t) / CLOCKS_PER_SEC);
	t = clock();
	for (i = 0; i &lt; M; ++i) {
		kvec_t(int) array;
		kv_init(array);
		for (j = 0; j &lt; N; ++j)
			kv_push(int, array, j);
		kv_destroy(array);
	}
	printf(&quot;C vector, dynamic (kv_push): %.3f sec\n&quot;,
		   (float)(clock() - t) / CLOCKS_PER_SEC);
	t = clock();
	for (i = 0; i &lt; M; ++i) {
		std::vector&lt;int&gt; array;
		array.reserve(N);
		for (j = 0; j &lt; N; ++j) array[j] = j; // writes past size(), which is undefined behavior; see the update below
	}
	printf(&quot;C++ vector, preallocated: %.3f sec\n&quot;,
		   (float)(clock() - t) / CLOCKS_PER_SEC);
	t = clock();
	for (i = 0; i &lt; M; ++i) {
		std::vector&lt;int&gt; array;
		for (j = 0; j &lt; N; ++j) array.push_back(j);
	}
	printf(&quot;C++ vector, dynamic: %.3f sec\n&quot;,
		   (float)(clock() - t) / CLOCKS_PER_SEC);
	return 0;
}
</pre>
<p>Here are the results on two machines (compiled with g++ -O2 -fomit-frame-pointer -finline-functions):</p>
<table border="1" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td>Scenario</td>
<td>MacIntel (sec)</td>
<td>LinuxIntel (sec)</td>
</tr>
<tr>
<td>C array, preallocated</td>
<td>1.589</td>
<td>1.180</td>
</tr>
<tr>
<td>C array, dynamic</td>
<td>2.064</td>
<td>1.340</td>
</tr>
<tr>
<td>C vector, dynamic (kv_a)</td>
<td>2.051</td>
<td>1.600</td>
</tr>
<tr>
<td>C vector, dynamic (kv_push)</td>
<td>1.932</td>
<td>1.250</td>
</tr>
<tr>
<td>C++ vector, preallocated</td>
<td>2.119</td>
<td>1.590</td>
</tr>
<tr>
<td>C++ vector, dynamic</td>
<td>5.095</td>
<td>3.770</td>
</tr>
</tbody>
</table>
<p>These results may vary across machines and compilers, but not by much.</p>
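<p>One way to see why the dynamically growing C array costs only a little more than the preallocated one is to count how often the doubling strategy actually calls realloc. The sketch below replays the growth logic of the second benchmark loop without allocating anything; count_reallocs is a hypothetical helper written for this illustration and is not part of the benchmark.</p>

```c
#include <assert.h>

/* Replays the capacity-doubling logic of the "C array, dynamic" loop and
   returns how many times realloc would have been called for n appends. */
unsigned count_reallocs(unsigned long n)
{
	unsigned long j, max = 0;
	unsigned count = 0;
	for (j = 0; j < n; ++j) {
		if (j == max) {              /* array is full: grow it */
			max = !max ? 1 : max << 1;
			++count;
		}
	}
	return count;
}
```

<p>With N = 20000000 as in the benchmark, this counts 26 reallocations per round, so the cost of growing is spread over millions of plain assignments.</p>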
<p><strong>Update:</strong> My example passed the valgrind check on Linux (Debian etch, g++-4.1), but Tom_ pointed out that it did not pass VC++&#8217;s debugger because vector::operator[] writes outside vector::size(). In any case, it is better not to use operator[] the way I did. You can replace reserve()+[] with resize()+[] or reserve()+push_back(). On my machine, the replacement is a little slower, but it is safer and more portable. Thanks for all the comments.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/09/19/c-array-vs-c-vector/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
		<item>
		<title>Calculating Median</title>
		<link>http://attractivechaos.wordpress.com/2008/09/13/calculating-median/</link>
		<comments>http://attractivechaos.wordpress.com/2008/09/13/calculating-median/#comments</comments>
		<pubDate>Sat, 13 Sep 2008 19:26:20 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[myprog]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=440</guid>
		<description><![CDATA[Here is an example where Google does not give me the answer on the first page. I want to know how to calculate the median efficiently, so I search for &#8220;c calculate median&#8221;. On the first result page, Google brings me to several forums which only show very naive implementations. The 11th result, this page, is [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Here is an example where Google does not give me the answer on the first page. I want to know how to calculate the median efficiently, so I search for &#8220;c calculate median&#8221;. On the first result page, Google brings me to several forums which only show very naive implementations. The 11th result, <a href="http://ndevilla.free.fr/median/index.html">this page</a>, is the truly invaluable one which should be favoured by most programmers. I do not want to replicate that website. I just want to show you a function that is adapted from <a href="http://ndevilla.free.fr/median/median/src/quickselect.c">quickselect.c</a> on the website. This function finds the k-th smallest (0&lt;=k&lt;n) element in an array. Its expected time complexity is linear in the size of the array, and in practice it runs much faster than sorting and then locating the k-th smallest element.</p>
<pre class="brush: cpp;">
type_t ks_ksmall(size_t n, type_t arr[], size_t kk)
{
	type_t *low, *high, *k, *ll, *hh, *middle;
	low = arr; high = arr + n - 1; k = arr + kk;
	for (;;) {
		if (high &lt;= low) return *k; /* one element left */
		if (high == low + 1) { /* two elements left: put them in order */
			if (lt(*high, *low)) swap(type_t, *low, *high);
			return *k;
		}
		/* median of three: afterwards *middle &lt;= *low &lt;= *high, so *low is the pivot */
		middle = low + (high - low) / 2;
		if (lt(*high, *middle)) swap(type_t, *middle, *high);
		if (lt(*high, *low)) swap(type_t, *low, *high);
		if (lt(*low, *middle)) swap(type_t, *middle, *low);
		swap(type_t, *middle, *(low+1)) ;
		ll = low + 1; hh = high;
		for (;;) { /* partition the range around the pivot */
			do ++ll; while (lt(*ll, *low));
			do --hh; while (lt(*low, *hh));
			if (hh &lt; ll) break;
			swap(type_t, *ll, *hh);
		}
		swap(type_t, *low, *hh); /* move the pivot to its final position */
		/* continue with the part that contains k */
		if (hh &lt;= k) low = ll;
		if (hh &gt;= k) high = hh - 1;
	}
}
</pre>
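<p>In this function, type_t is the element type, swap() is a macro that swaps two values of the given type, and lt() is a macro or a function that returns true if and only if the first value is smaller than the second. Note that the function reorders the elements of arr in place.</p>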
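<p>For completeness, here is one possible way to instantiate the function for double. The definitions of type_t, lt() and swap() below are my own minimal assumptions for this sketch; the function body is the same as the listing above, with lt() used for every comparison.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef double type_t;                     /* element type (assumption) */
#define lt(a, b) ((a) < (b))               /* "less than" for type_t */
#define swap(T, a, b) do { T swap_tmp = (a); (a) = (b); (b) = swap_tmp; } while (0)

/* Returns the kk-th smallest element (0 <= kk < n); reorders arr in place. */
type_t ks_ksmall(size_t n, type_t arr[], size_t kk)
{
	type_t *low, *high, *k, *ll, *hh, *middle;
	assert(n > 0 && kk < n);
	low = arr; high = arr + n - 1; k = arr + kk;
	for (;;) {
		if (high <= low) return *k;
		if (high == low + 1) {
			if (lt(*high, *low)) swap(type_t, *low, *high);
			return *k;
		}
		middle = low + (high - low) / 2;
		if (lt(*high, *middle)) swap(type_t, *middle, *high);
		if (lt(*high, *low)) swap(type_t, *low, *high);
		if (lt(*low, *middle)) swap(type_t, *middle, *low);
		swap(type_t, *middle, *(low+1));
		ll = low + 1; hh = high;
		for (;;) {
			do ++ll; while (lt(*ll, *low));
			do --hh; while (lt(*low, *hh));
			if (hh < ll) break;
			swap(type_t, *ll, *hh);
		}
		swap(type_t, *low, *hh);
		if (hh <= k) low = ll;
		if (hh >= k) high = hh - 1;
	}
}
```

<p>For an array of odd length n, the median is then ks_ksmall(n, arr, n / 2). For even n that call returns the upper of the two middle elements, so you need to decide how you want to define the median in that case.</p>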
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/09/13/calculating-median/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
		<item>
		<title>Implementing Generic Hash Library in C</title>
		<link>http://attractivechaos.wordpress.com/2008/09/02/implementing-generic-hash-library-in-c/</link>
		<comments>http://attractivechaos.wordpress.com/2008/09/02/implementing-generic-hash-library-in-c/#comments</comments>
		<pubDate>Tue, 02 Sep 2008 20:18:32 +0000</pubDate>
		<dc:creator>attractivechaos</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[hash]]></category>
		<category><![CDATA[myprog]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=358</guid>
		<description><![CDATA[Synopsis
Here is a simple example showing how to use the khash.h library:

#include &#34;khash.h&#34;
KHASH_MAP_INIT_INT(32, char)
int main() {
	int ret, is_missing;
	khiter_t k;
	khash_t(32) *h = kh_init(32);
	k = kh_put(32, h, 5, &#38;ret);
	if (!ret) kh_del(32, h, k);
	kh_value(h, k) = 10;
	k = kh_get(32, h, 10);
	is_missing = (k == kh_end(h));
	k = kh_get(32, h, 5);
	kh_del(32, h, k);
	for (k = kh_begin(h); k != kh_end(h); ++k)
		if (kh_exist(h, [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p><strong>Synopsis</strong></p>
<p>Here is a simple example showing how to use the <a href="http://www.freewebs.com/attractivechaos/khash.h.html">khash.h</a> library:</p>
<pre class="brush: cpp;">
#include &quot;khash.h&quot;
KHASH_MAP_INIT_INT(32, char)
int main() {
	int ret, is_missing;
	khiter_t k;
	khash_t(32) *h = kh_init(32);
	k = kh_put(32, h, 5, &amp;ret);
	if (!ret) kh_del(32, h, k);
	kh_value(h, k) = 10;
	k = kh_get(32, h, 10);
	is_missing = (k == kh_end(h));
	k = kh_get(32, h, 5);
	kh_del(32, h, k);
	for (k = kh_begin(h); k != kh_end(h); ++k)
		if (kh_exist(h, k)) kh_value(h, k) = 1;
	kh_destroy(32, h);
	return 0;
}
</pre>
<p>The second line says we want to use a hash map with int as key and char as value, named 32. khash_t(32) is the type of this hash table. kh_get() and kh_put() return an iterator, that is, the position in the hash table. kh_del() erases the key-value pair in the bucket pointed to by the iterator. kh_begin() and kh_end() return the begin and the end iterators, respectively. And kh_exist() tests whether the bucket at the iterator is filled with a key-value pair. The APIs are not as concise as C++ templates, but they are very straightforward and flexible. How can this be done?</p>
<p><strong>Achieving generic programming in C</strong></p>
<p>The core part of khash.h is:</p>
<pre class="brush: cpp;">#define KH_INIT(name, key_t, val_t, is_map, _hashf, _hasheq) \
  typedef struct { \
    int n_buckets, size, n_occupied, upper_bound; \
    unsigned *flags; \
    key_t *keys; \
    val_t *vals; \
  } kh_##name##_t; \
  static inline kh_##name##_t *init_##name() { \
    return (kh_##name##_t*)calloc(1, sizeof(kh_##name##_t)); \
  } \
  static inline int get_##name(kh_##name##_t *h, key_t k) \
  ... \
  static inline void destroy_##name(kh_##name##_t *h) { \
    if (h) { \
      free(h-&gt;keys); free(h-&gt;flags); free(h-&gt;vals); free(h); \
    } \
  }

#define _int_hf(key) ((unsigned)(key))
#define _int_heq(a, b) ((a) == (b))
#define khash_t(name) kh_##name##_t
#define kh_init(name) init_##name()
#define kh_get(name, h, k) get_##name(h, k)
#define kh_destroy(name, h) destroy_##name(h)
...
#define KHASH_MAP_INIT_INT(name, val_t) \
	KH_INIT(name, unsigned, val_t, 1, _int_hf, _int_heq)
</pre>
<p>In macro &#8216;KH_INIT&#8217;, name is a unique symbol that distinguishes hash tables of different types, key_t is the type of keys, val_t is the type of values, is_map is 0 or 1 indicating whether to allocate memory for vals, _hashf is a hash function/macro and _hasheq is the comparison function/macro. Macro &#8216;KHASH_MAP_INIT_INT&#8217; is a convenient interface for hash maps with integer keys.</p>
<p>When &#8216;KHASH_MAP_INIT_INT(int, char)&#8217; (that is, with int rather than 32 as the name) is used in a C source file, the following code will be inserted:</p>
<pre class="brush: cpp;">
  typedef struct {
    int n_buckets, size, n_occupied, upper_bound;
    unsigned *flags;
    unsigned *keys;
    char *vals;
  } kh_int_t;
  static inline kh_int_t *init_int() {
    return (kh_int_t*)calloc(1, sizeof(kh_int_t));
  }
  static inline int get_int(kh_int_t *h, unsigned k)
  ...
  static inline void destroy_int(kh_int_t *h) {
    if (h) {
      free(h-&gt;keys); free(h-&gt;flags); free(h-&gt;vals); free(h);
    }
  }
</pre>
<p>And when we call &#8216;kh_get(int, h, 5)&#8217;, we are calling &#8216;get_int(h, 5)&#8217;, which is defined by the KH_INIT macro instantiated with int as the name. In this way, we can effectively achieve generic programming with simple interfaces. As we use inline functions and macros throughout, the generated code is specific to each type and efficiency is essentially unaffected. In <a href="http://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-table-libraries/">my hash table benchmark</a>, it is as fast and lightweight as the C++ implementation.</p>
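<p>The same token-pasting trick can be tried out on something much smaller than a hash table. The following sketch is not taken from khash.h; MINISTACK_INIT and all ministack_* names are hypothetical, chosen only to mirror the structure of KH_INIT: one macro generates a typed struct and its functions, and thin wrapper macros map a name to the generated symbols.</p>

```c
#include <stdlib.h>

/* Generates a typed growable stack named ministack_<name>_t together with
   its init/push/destroy functions (hypothetical names for illustration). */
#define MINISTACK_INIT(name, type_t) \
	typedef struct { size_t n, m; type_t *a; } ministack_##name##_t; \
	static inline ministack_##name##_t *ministack_init_##name(void) { \
		return (ministack_##name##_t*)calloc(1, sizeof(ministack_##name##_t)); \
	} \
	static inline int ministack_push_##name(ministack_##name##_t *s, type_t x) { \
		if (s->n == s->m) { \
			size_t m = s->m ? s->m << 1 : 2; \
			type_t *a = (type_t*)realloc(s->a, m * sizeof(type_t)); \
			if (a == 0) return -1; \
			s->a = a; s->m = m; \
		} \
		s->a[s->n++] = x; \
		return 0; \
	} \
	static inline void ministack_destroy_##name(ministack_##name##_t *s) { \
		if (s) { free(s->a); free(s); } \
	}

/* Wrapper macros, analogous to khash_t(), kh_init() and kh_destroy(). */
#define ministack_t(name) ministack_##name##_t
#define ministack_init(name) ministack_init_##name()
#define ministack_push(name, s, x) ministack_push_##name(s, x)
#define ministack_destroy(name, s) ministack_destroy_##name(s)

MINISTACK_INIT(dbl, double) /* instantiates ministack_dbl_t and friends */

/* Pushes 1.0, 2.0 and 3.0 and returns their sum; returns -1.0 if the
   stack cannot be allocated. Push errors are ignored for brevity. */
double ministack_demo_sum(void)
{
	ministack_t(dbl) *s = ministack_init(dbl);
	double sum = 0.0;
	size_t i;
	if (s == NULL) return -1.0;
	ministack_push(dbl, s, 1.0);
	ministack_push(dbl, s, 2.0);
	ministack_push(dbl, s, 3.0);
	for (i = 0; i < s->n; ++i) sum += s->a[i];
	ministack_destroy(dbl, s);
	return sum;
}
```

<p>Here ministack_push(dbl, s, 1.0) expands to ministack_push_dbl(s, 1.0), in the same way that kh_get(int, h, 5) expands to get_int(h, 5).</p>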
<p><strong>Other technical concerns</strong></p>
<ul>
<li><strong>Solving collisions</strong>. I have discussed this in my previous post. I prefer a smaller memory footprint and therefore chose open addressing.</li>
<li><strong>Grouping key-value pairs or not</strong>. In the current implementation, keys and values are kept in separate arrays. This strategy may cause an additional cache miss when both a key and its value are retrieved. Grouping each key-value pair in a struct is more cache efficient. However, the good side of separating keys and values is that it avoids wasting memory when the key type and the value type cannot be aligned well (e.g. the key is an integer while the value is a character). I would rather trade a bit of speed for smaller memory. In addition, it is not hard to use a struct as a key in the current framework.</li>
<li><strong>Space efficient rehashing</strong>. Traditional rehashing requires allocating an additional hash table and moving elements from the old table to the new one. For most hash implementations, this means we need 50% extra working space to enlarge a hash table. This is not necessary. In khash.h, only a new flags array is allocated on rehashing. The keys and values arrays are enlarged with realloc, which does not claim more memory than the new hash table. Keys and values are moved from old positions to new positions in the same memory space. This strategy also helps to clear all buckets marked as deleted without changing the size of a hash table.</li>
</ul>
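<p>The alignment argument in the second point can be checked with sizeof. The sketch below compares the memory needed for n buckets when an unsigned key and a char value are grouped in a struct with the memory needed for two separate arrays. The struct and function names are hypothetical and the flags array is left out. The exact padding is up to the compiler and platform, so the numbers are only indicative.</p>

```c
#include <stddef.h>

/* Grouped layout: the compiler may pad the struct so that the next key in
   an array of entries is properly aligned. */
struct grouped_entry {
	unsigned key;
	char val;
};

/* Bytes needed for n buckets when each key-value pair is one struct. */
size_t grouped_bytes(size_t n_buckets)
{
	return n_buckets * sizeof(struct grouped_entry);
}

/* Bytes needed for n buckets when keys and values live in separate arrays. */
size_t separate_bytes(size_t n_buckets)
{
	return n_buckets * sizeof(unsigned) + n_buckets * sizeof(char);
}
```

<p>On a typical platform with a 4-byte unsigned, the struct is padded to 8 bytes, so the grouped layout takes 8 bytes per bucket against 5 bytes for the separate arrays.</p>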
</div>]]></content:encoded>
			<wfw:commentRss>http://attractivechaos.wordpress.com/2008/09/02/implementing-generic-hash-library-in-c/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/047ebc7bb9ff37a0da844413856e92cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">attractivechaos</media:title>
		</media:content>
	</item>
	</channel>
</rss>