<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Using void* in Generic C Programming may be Inefficient</title>
	<atom:link href="http://attractivechaos.wordpress.com/2008/10/02/using-void-in-generic-c-programming-may-be-inefficient/feed/" rel="self" type="application/rss+xml" />
	<link>http://attractivechaos.wordpress.com/2008/10/02/using-void-in-generic-c-programming-may-be-inefficient/</link>
	<description>Just another WordPress.com weblog</description>
	<lastBuildDate>Mon, 07 Dec 2009 19:12:37 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: azverkan</title>
		<link>http://attractivechaos.wordpress.com/2008/10/02/using-void-in-generic-c-programming-may-be-inefficient/#comment-159</link>
		<dc:creator>azverkan</dc:creator>
		<pubDate>Sun, 21 Dec 2008 22:43:47 +0000</pubDate>
		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=550#comment-159</guid>
		<description>Mike Acton wrote up a good article on strict aliasing, which is relevant to void* performance.

Unfortunately, the domain name appears to have been hijacked, but you can still read the article in the Internet Archive:

http://web.archive.org/web/20071223232457/www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html


One thing I noticed with your khash.h/kbtree.h API is that it requires the objects to be allocated with calloc().  Another key to high performance in C is to minimize the number of memory allocations.  Right now your khash interface requires a dedicated calloc() call for the table itself.  The list types in Linux were designed to be constructed inside a parent structure, so that you do not need a separate allocation for the list itself.  They also provide a macro called LIST_HEAD_INIT() that lets the compiler build a binary with the list pre-initialized.

http://lxr.linux.no/linux+v2.6.27.10/include/linux/list.h#L19</description>
		<content:encoded><![CDATA[<p>Mike Acton wrote up a good article on strict aliasing, which is relevant to void* performance.</p>
<p>Unfortunately, the domain name appears to have been hijacked, but you can still read the article in the Internet Archive:</p>
<p><a href="http://web.archive.org/web/20071223232457/www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html" rel="nofollow">http://web.archive.org/web/20071223232457/www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html</a></p>
<p>One thing I noticed with your khash.h/kbtree.h API is that it requires the objects to be allocated with calloc().  Another key to high performance in C is to minimize the number of memory allocations.  Right now your khash interface requires a dedicated calloc() call for the table itself.  The list types in Linux were designed to be constructed inside a parent structure, so that you do not need a separate allocation for the list itself.  They also provide a macro called LIST_HEAD_INIT() that lets the compiler build a binary with the list pre-initialized.</p>
<p><a href="http://lxr.linux.no/linux+v2.6.27.10/include/linux/list.h#L19" rel="nofollow">http://lxr.linux.no/linux+v2.6.27.10/include/linux/list.h#L19</a></p>
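<p>The embedded-list idea can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the kernel's code: struct link, LINK_INIT, CONTAINER_OF, struct job, job_queue and the two helper functions are hypothetical names chosen for the example. It assumes only that the link lives inside the parent object and that an empty circular list points at itself.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: a sketch of an intrusive list,
 * modeled on the idea described above, not copied from the kernel. */
struct link {
    struct link *next, *prev;
};

/* Static initializer: an empty circular list points at itself. */
#define LINK_INIT(name) { &(name), &(name) }

/* Recover the enclosing object from a pointer to its embedded link. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct job {
    int id;
    struct link node;   /* embedded: no separate allocation for the list */
};

/* Initialized at build time; no calloc() is needed at run time. */
static struct link job_queue = LINK_INIT(job_queue);

/* Insert item just before head, i.e. at the tail of the circular list. */
static void link_add_tail(struct link *item, struct link *head)
{
    assert(item && head);
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Walk the queue and add up the ids of the enclosing jobs. */
static int queue_sum_ids(void)
{
    int sum = 0;
    for (struct link *p = job_queue.next; p != &job_queue; p = p->next)
        sum += CONTAINER_OF(p, struct job, node)->id;
    return sum;
}
```

<p>Because job_queue is initialized statically and each job carries its own link, appending a job with link_add_tail(&amp;a.node, &amp;job_queue) performs no allocation at all.</p>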
]]></content:encoded>
	</item>
	<item>
		<title>By: Michiel</title>
		<link>http://attractivechaos.wordpress.com/2008/10/02/using-void-in-generic-c-programming-may-be-inefficient/#comment-158</link>
		<dc:creator>Michiel</dc:creator>
		<pubDate>Thu, 04 Dec 2008 09:30:31 +0000</pubDate>
		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=550#comment-158</guid>
		<description>The most likely reason why void* is slow (and why C++ templates improve otherwise identical code) is that a void* may point anywhere. This means it can alias anything, which defeats many compiler optimizations. As a result, values held in registers must often be re-read from memory after a write through a void*.</description>
		<content:encoded><![CDATA[<p>The most likely reason why void* is slow (and why C++ templates improve otherwise identical code) is that a void* may point anywhere. This means it can alias anything, which defeats many compiler optimizations. As a result, values held in registers must often be re-read from memory after a write through a void*.</p>
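<p>A small sketch of the aliasing concern, with hypothetical function names (fill_generic, fill_restrict). In the first version the bytes behind the void* are written through an unsigned char*, which C allows to alias any object, so the compiler has to assume that a store might change *count. In the second, C99 restrict promises that the two pointers do not overlap. Whether a particular compiler really reloads *count on every iteration depends on the compiler and its flags; the code only shows where the assumption arises.</p>

```c
#include <assert.h>

/* Generic version: stores go through unsigned char*, which may alias
 * any object, including *count. The loop bound may therefore have to
 * be re-read from memory after each store. */
void fill_generic(void *dst, const int *count, unsigned char value)
{
    assert(dst && count);
    unsigned char *bytes = dst;
    for (int i = 0; i < *count; ++i)
        bytes[i] = value;
}

/* Same loop, but restrict tells the compiler that dst and count never
 * overlap, so *count can be kept in a register for the whole loop. */
void fill_restrict(void *restrict dst, const int *restrict count,
                   unsigned char value)
{
    assert(dst && count);
    unsigned char *restrict bytes = dst;
    for (int i = 0; i < *count; ++i)
        bytes[i] = value;
}
```

<p>Both functions produce the same result when the buffers do not overlap; the difference is only in what the optimizer is allowed to assume.</p>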
]]></content:encoded>
	</item>
	<item>
		<title>By: attractivechaos</title>
		<link>http://attractivechaos.wordpress.com/2008/10/02/using-void-in-generic-c-programming-may-be-inefficient/#comment-108</link>
		<dc:creator>attractivechaos</dc:creator>
		<pubDate>Fri, 03 Oct 2008 08:48:27 +0000</pubDate>
		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=550#comment-108</guid>
		<description>@Kevin

Sorry that I did not state it clearly. For the libavl example, we are looking at libavl_avl and libavl_avl_cpp in the table. The latter is implemented as a C++ template. We can see that libavl_avl_cpp is 56% (=(1/6.73-1/10.51)/(1/10.51)) faster than libavl_avl on MacIntel with integer keys. On Linux-Intel the difference is smaller (31% faster on integer keys or 15% on char* keys), but still significant. As for memory, libavl_avl_cpp does not save resident memory on Mac, probably because of memory alignment. On a 64-bit Linux, it saves 39% (=(48.41-29.41)/48.41) of resident memory.

The comparison of libc&#039;s qsort and STL&#039;s sort is shown in another post. My implementation of introsort (essentially quicksort) is 110% (=(1/7.887-1/16.579)/(1/16.579)) faster than libc&#039;s qsort on MacIntel, or 298% faster on LinuxAMD. The speedup comes from two sources: saving a function call on each comparison and avoiding memcpy(). On Mac, it seems that most of the speedup comes from the first, but on Linux most comes from the second.

@Facepalm

I have shown the link to the table, as well as the design of the experiment. The table is too large to fit here, so I posted it on another website. Sorry that the table is not very clear. Please see my reply to Kevin for more information.

Templates provide more flexibility. We can always put void* in a template when necessary, but with some C libraries we have to live with the inefficiency of void* even when we do not need it. In many practical applications, we know the type of data in a container. We do not need the runtime flexibility in common cases, I think.

I have also tried putting void* in SGI&#039;s std::set. On integer keys (MacIntel), std::set finishes in 12.738 sec (vs. 5.99 in my benchmark), using 29.80 MB of memory (vs. 19.87). This is further evidence that using void* can be very inefficient.</description>
		<content:encoded><![CDATA[<p>@Kevin</p>
<p>Sorry that I did not state it clearly. For the libavl example, we are looking at libavl_avl and libavl_avl_cpp in the table. The latter is implemented as a C++ template. We can see that libavl_avl_cpp is 56% (=(1/6.73-1/10.51)/(1/10.51)) faster than libavl_avl on MacIntel with integer keys. On Linux-Intel the difference is smaller (31% faster on integer keys or 15% on char* keys), but still significant. As for memory, libavl_avl_cpp does not save resident memory on Mac, probably because of memory alignment. On a 64-bit Linux, it saves 39% (=(48.41-29.41)/48.41) of resident memory.</p>
<p>The comparison of libc&#8217;s qsort and STL&#8217;s sort is shown in another post. My implementation of introsort (essentially quicksort) is 110% (=(1/7.887-1/16.579)/(1/16.579)) faster than libc&#8217;s qsort on MacIntel, or 298% faster on LinuxAMD. The speedup comes from two sources: saving a function call on each comparison and avoiding memcpy(). On Mac, it seems that most of the speedup comes from the first, but on Linux most comes from the second.</p>
<p>@Facepalm</p>
<p>I have shown the link to the table, as well as the design of the experiment. The table is too large to fit here, so I posted it on another website. Sorry that the table is not very clear. Please see my reply to Kevin for more information.</p>
<p>Templates provide more flexibility. We can always put void* in a template when necessary, but with some C libraries we have to live with the inefficiency of void* even when we do not need it. In many practical applications, we know the type of data in a container. We do not need the runtime flexibility in common cases, I think.</p>
<p>I have also tried putting void* in SGI&#8217;s std::set. On integer keys (MacIntel), std::set finishes in 12.738 sec (vs. 5.99 in my benchmark), using 29.80 MB of memory (vs. 19.87). This is further evidence that using void* can be very inefficient.</p>
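<p>The two costs attributed to qsort() above (a call through a function pointer for every comparison, and moving elements as untyped bytes) can be seen by putting a void*-based comparator next to a sort generated for one concrete type. This is only a sketch: DEFINE_INSERTION_SORT, INT_LESS, sort_int and is_sorted_int are hypothetical names, and a plain insertion sort stands in for the introsort mentioned above to keep the example short.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator in the form qsort() requires: it receives void* and is
 * invoked through a function pointer for every comparison. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical macro that stamps out a sort for one element type.
 * The comparison is expanded inline and elements are moved with plain
 * typed assignments instead of byte-wise copies. */
#define DEFINE_INSERTION_SORT(name, type, less)        \
    static void name(type *a, size_t n)                \
    {                                                  \
        assert(a != NULL || n == 0);                   \
        for (size_t i = 1; i < n; ++i) {               \
            type key = a[i];                           \
            size_t j = i;                              \
            while (j > 0 && less(key, a[j - 1])) {     \
                a[j] = a[j - 1];                       \
                --j;                                   \
            }                                          \
            a[j] = key;                                \
        }                                              \
    }

#define INT_LESS(x, y) ((x) < (y))
DEFINE_INSERTION_SORT(sort_int, int, INT_LESS)

/* Returns 1 if a[0..n-1] is in non-decreasing order, else 0. */
static int is_sorted_int(const int *a, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        if (a[i] < a[i - 1])
            return 0;
    return 1;
}
```

<p>Calling qsort(a, n, sizeof a[0], cmp_int) and sort_int(a, n) on the same input gives the same ordering; the typed version simply leaves the compiler free to inline the comparison and keep elements in registers.</p>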
]]></content:encoded>
	</item>
	<item>
		<title>By: Facepalm</title>
		<link>http://attractivechaos.wordpress.com/2008/10/02/using-void-in-generic-c-programming-may-be-inefficient/#comment-106</link>
		<dc:creator>Facepalm</dc:creator>
		<pubDate>Thu, 02 Oct 2008 23:19:47 +0000</pubDate>
		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=550#comment-106</guid>
		<description>How about some charts or at least a table?  You give no metrics or even any data.  How much faster?  What data did you use to test?  How much more lightweight?

Additionally, OF COURSE templates are going to be faster in this instance.  The C compiler cannot make assumptions at compile time about the content of void pointers.  The types are being bound at compile time in C++ templates so what you&#039;re really doing is reducing the runtime flexibility of the code.</description>
		<content:encoded><![CDATA[<p>How about some charts or at least a table?  You give no metrics or even any data.  How much faster?  What data did you use to test?  How much more lightweight?</p>
<p>Additionally, OF COURSE templates are going to be faster in this instance.  The C compiler cannot make assumptions at compile time about the content of void pointers.  The types are being bound at compile time in C++ templates so what you&#8217;re really doing is reducing the runtime flexibility of the code.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin</title>
		<link>http://attractivechaos.wordpress.com/2008/10/02/using-void-in-generic-c-programming-may-be-inefficient/#comment-105</link>
		<dc:creator>Kevin</dc:creator>
		<pubDate>Thu, 02 Oct 2008 23:06:58 +0000</pubDate>
		<guid isPermaLink="false">http://attractivechaos.wordpress.com/?p=550#comment-105</guid>
		<description>Hi,
I am really interested in seeing how big the performance difference is, but that table is really hard to parse. It feels like I am missing some previous post or something. If you broke that information out into several graphs (it looks like there is enough info there for several posts!) I am sure it would be quite popular. 

-K</description>
		<content:encoded><![CDATA[<p>Hi,<br />
I am really interested in seeing how big the performance difference is, but that table is really hard to parse. It feels like I am missing some previous post or something. If you broke that information out into several graphs (it looks like there is enough info there for several posts!) I am sure it would be quite popular. </p>
<p>-K</p>
]]></content:encoded>
	</item>
</channel>
</rss>
