
In a project I want to directly read part of a remote file served over FTP or HTTP. I do not want to download the whole file, because the file is frequently over 10GB in size. What I want is a set of function calls with an interface similar to fopen/fclose/ftell/fseek/fread (I do not need fwrite for the moment), so that I can open a remote file as if it were local. I did quite a bit of Google searching for a suitable library, but most related libraries are designed for whole-file transfer. In the end, I decided to write my own library for this task. And here it is.

This library consists of one C header file and one C source file. It was originally developed for Linux/Mac and then ported to Windows with the MinGW compiler. On Linux, the implemented features work properly; on Windows, however, random access to FTP files sometimes does not.

This library provides the knet_open(), knet_close(), knet_tell(), knet_seek() and knet_read() calls. For example, you can manipulate a file over HTTP with:

char buf[4096];
knetFile *fp = knet_open("http://host/file", "r");
knet_seek(fp, 1000, SEEK_SET);
knet_read(fp, buf, 4096);
knet_close(fp);

Opening an FTP file is similar. The library is also transparent to local files: when the file name does not start with “http://” or “ftp://”, it is treated as a local path.

This is my first attempt at network programming and surely a lot of things can be improved. Please leave me a message if you have any suggestions. Thanks in advance.

Here is the C header file:


#ifndef KNETFILE_H
#define KNETFILE_H

#include <stdint.h>
#include <fcntl.h>

#ifndef _WIN32
#define netread(fd, ptr, len) read(fd, ptr, len)
#define netwrite(fd, ptr, len) write(fd, ptr, len)
#define netclose(fd) close(fd)
#else
#include <winsock.h>
#define netread(fd, ptr, len) recv(fd, ptr, len, 0)
#define netwrite(fd, ptr, len) send(fd, ptr, len, 0)
#define netclose(fd) closesocket(fd)
#endif

// FIXME: currently I/O is unbuffered

#define KNF_TYPE_LOCAL 1
#define KNF_TYPE_FTP   2
#define KNF_TYPE_HTTP  3

typedef struct knetFile_s {
	int type, fd;
	int64_t offset;
	char *host, *port;

	// the following are for FTP only
	int ctrl_fd, pasv_ip[4], pasv_port, max_response, no_reconnect, is_ready;
	char *response, *retr;
	int64_t seek_offset; // for lazy seek

	// the following are for HTTP only
	char *path, *http_host;
} knetFile;

#define knet_tell(fp) ((fp)->offset)
#define knet_fileno(fp) ((fp)->fd)

#ifdef __cplusplus
extern "C" {
#endif

#ifdef _WIN32
	int knet_win32_init();
	void knet_win32_destroy();
#endif

	knetFile *knet_open(const char *fn, const char *mode);

	/*
	   This only works with local files.
	 */
	knetFile *knet_dopen(int fd, const char *mode);

	/*
	  If ->is_ready==0, this routine updates ->fd; otherwise, it simply
	  reads from ->fd.
	 */
	off_t knet_read(knetFile *fp, void *buf, off_t len);

	/*
	  This routine only sets ->offset and ->is_ready=0. It does not
	  communicate with the FTP server.
	 */
	int knet_seek(knetFile *fp, off_t off, int whence);
	int knet_close(knetFile *fp);

#ifdef __cplusplus
}
#endif

#endif

Here is the main C source code:

/* Probably I will not do socket programming in the next few years and
   therefore I decide to heavily annotate this file, for Linux and
   Windows as well. */

#include <time.h>
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#ifdef _WIN32
#include <winsock.h>
#else
#include <netdb.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#endif

#include "knetfile.h"

/* In winsock.h, the type of a socket is SOCKET, which is: "typedef
 * u_int SOCKET". An invalid SOCKET is: "(SOCKET)(~0)", or signed
 * integer -1. In knetfile.c, I use "int" for socket type
 * throughout. This should be improved to avoid confusion.
 *
 * In Linux/Mac, recv() and read() do almost the same thing. You can see
 * in the header file that netread() is simply an alias of read(). In
 * Windows, however, they are different and using recv() is mandatory.
 */

/* This function tests if the file handler is ready for reading (or
 * writing if is_read==0). */
static int socket_wait(int fd, int is_read)
{
	fd_set fds, *fdr = 0, *fdw = 0;
	struct timeval tv;
	int ret;
	tv.tv_sec = 5; tv.tv_usec = 0; // 5 seconds time out
	FD_ZERO(&fds);
	FD_SET(fd, &fds);
	if (is_read) fdr = &fds;
	else fdw = &fds;
	ret = select(fd+1, fdr, fdw, 0, &tv);
	if (ret == -1) perror("select");
	return ret;
}

#ifndef _WIN32
/* This function does not work with Windows due to the lack of
 * getaddrinfo() in winsock. It is adapted from an example in "Beej's
 * Guide to Network Programming" (http://beej.us/guide/bgnet/). */
static int socket_connect(const char *host, const char *port)
{
#define __err_connect(func) do { perror(func); freeaddrinfo(res); return -1; } while (0)

	int on = 1, fd;
	struct linger lng = { 0, 0 };
	struct addrinfo hints, *res;
	memset(&hints, 0, sizeof(struct addrinfo));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	/* In Unix/Mac, getaddrinfo() is the most convenient way to get
	 * server information. */
	if (getaddrinfo(host, port, &hints, &res) != 0) __err_connect("getaddrinfo");
	if ((fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol)) == -1) __err_connect("socket");
	/* The following two setsockopt() are used by ftplib
	 * (http://nbpfaus.net/~pfau/ftplib/). I am not sure if they are
	 * necessary. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) __err_connect("setsockopt");
	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lng, sizeof(lng)) == -1) __err_connect("setsockopt");
	if (connect(fd, res->ai_addr, res->ai_addrlen) != 0) __err_connect("connect");
	freeaddrinfo(res);
	return fd;
}
#else
/* On Windows, Winsock must be initialized before any socket call. */
int knet_win32_init()
{
	WSADATA wsaData;
	return WSAStartup(MAKEWORD(2, 2), &wsaData);
}
void knet_win32_destroy()
{
	WSACleanup();
}
/* A slightly modified version of the following function also works on
 * Mac (and presumably Linux). However, this function is not stable on
 * my Mac. It sometimes works fine but sometimes does not. Therefore for
 * non-Windows OS, I do not use this one. */
static SOCKET socket_connect(const char *host, const char *port)
{
#define __err_connect(func) do { perror(func); return -1; } while (0)

	int on = 1;
	SOCKET fd;
	struct linger lng = { 0, 0 };
	struct sockaddr_in server;
	struct hostent *hp = 0;
	// open socket
	if ((fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == INVALID_SOCKET) __err_connect("socket");
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (char*)&on, sizeof(on)) == -1) __err_connect("setsockopt");
	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, (char*)&lng, sizeof(lng)) == -1) __err_connect("setsockopt");
	// get host info
	if (isalpha(host[0])) hp = gethostbyname(host);
	else {
		struct in_addr addr;
		addr.s_addr = inet_addr(host);
		hp = gethostbyaddr((char*)&addr, 4, AF_INET);
	}
	if (hp == 0) __err_connect("gethost");
	// connect
	server.sin_addr.s_addr = *((unsigned long*)hp->h_addr);
	server.sin_family= AF_INET;
	server.sin_port = htons(atoi(port));
	if (connect(fd, (struct sockaddr*)&server, sizeof(server)) != 0) __err_connect("connect");
	// freehostent(hp); // strangely in MSDN, hp is NOT freed (memory leak?!)
	return fd;
}
#endif

static off_t my_netread(int fd, void *buf, off_t len)
{
	off_t rest = len, curr, l = 0;
	/* recv() and read() may not read the required length of data with
	 * one call. They have to be called repeatedly. */
	while (rest) {
		if (socket_wait(fd, 1) <= 0) break; // socket is not ready for reading
		curr = netread(fd, buf + l, rest);
		/* According to the glibc manual, section 13.2, a zero returned
		 * value indicates end-of-file (EOF), which should mean that
		 * read() will not return zero if EOF has not been met but data
		 * are not immediately available. */
		if (curr == 0) break;
		l += curr; rest -= curr;
	}
	return l;
}

/*************************
 * FTP specific routines *
 *************************/

static int kftp_get_response(knetFile *ftp)
{
	unsigned char c;
	int n = 0;
	char *p;
	if (socket_wait(ftp->ctrl_fd, 1) <= 0) return 0;
	while (netread(ftp->ctrl_fd, &c, 1)) { // FIXME: this is *VERY BAD* for unbuffered I/O
		//fputc(c, stderr);
		if (n >= ftp->max_response) {
			ftp->max_response = ftp->max_response? ftp->max_response<<1 : 256;
			ftp->response = realloc(ftp->response, ftp->max_response);
		}
		ftp->response[n++] = c;
		if (c == '\n') {
			if (n >= 4 && isdigit(ftp->response[0]) && isdigit(ftp->response[1]) && isdigit(ftp->response[2])
				&& ftp->response[3] != '-') break;
			n = 0;
			continue;
		}
	}
	if (n < 2) return -1;
	ftp->response[n-2] = 0;
	return strtol(ftp->response, &p, 0);
}

static int kftp_send_cmd(knetFile *ftp, const char *cmd, int is_get)
{
	if (socket_wait(ftp->ctrl_fd, 0) <= 0) return -1; // socket is not ready for writing
	netwrite(ftp->ctrl_fd, cmd, strlen(cmd));
	return is_get? kftp_get_response(ftp) : 0;
}

static int kftp_pasv_prep(knetFile *ftp)
{
	char *p;
	int v[6];
	kftp_send_cmd(ftp, "PASV\r\n", 1);
	for (p = ftp->response; *p && *p != '('; ++p);
	if (*p != '(') return -1;
	++p;
	sscanf(p, "%d,%d,%d,%d,%d,%d", &v[0], &v[1], &v[2], &v[3], &v[4], &v[5]);
	memcpy(ftp->pasv_ip, v, 4 * sizeof(int));
	ftp->pasv_port = (v[4]<<8&0xff00) + v[5];
	return 0;
}

static int kftp_pasv_connect(knetFile *ftp)
{
	char host[80], port[10];
	if (ftp->pasv_port == 0) {
		fprintf(stderr, "[kftp_pasv_connect] kftp_pasv_prep() is not called beforehand.\n");
		return -1;
	}
	sprintf(host, "%d.%d.%d.%d", ftp->pasv_ip[0], ftp->pasv_ip[1], ftp->pasv_ip[2], ftp->pasv_ip[3]);
	sprintf(port, "%d", ftp->pasv_port);
	ftp->fd = socket_connect(host, port);
	if (ftp->fd == -1) return -1;
	return 0;
}

int kftp_connect(knetFile *ftp)
{
	ftp->ctrl_fd = socket_connect(ftp->host, ftp->port);
	if (ftp->ctrl_fd == -1) return -1;
	kftp_get_response(ftp);
	kftp_send_cmd(ftp, "USER anonymous\r\n", 1);
	kftp_send_cmd(ftp, "PASS kftp@\r\n", 1);
	kftp_send_cmd(ftp, "TYPE I\r\n", 1);
	return 0;
}

int kftp_reconnect(knetFile *ftp)
{
	if (ftp->ctrl_fd != -1) {
		netclose(ftp->ctrl_fd);
		ftp->ctrl_fd = -1;
	}
	netclose(ftp->fd);
	return kftp_connect(ftp);
}

// initialize ->type, ->host and ->retr
knetFile *kftp_parse_url(const char *fn, const char *mode)
{
	knetFile *fp;
	char *p;
	int l;
	if (strstr(fn, "ftp://") != fn) return 0;
	for (p = (char*)fn + 6; *p && *p != '/'; ++p);
	if (*p != '/') return 0;
	l = p - fn - 6;
	fp = calloc(1, sizeof(knetFile));
	fp->type = KNF_TYPE_FTP;
	fp->fd = -1;
	/* the Linux/Mac version of socket_connect() also recognizes a port
	 * like "ftp", but the Windows version does not. */
	fp->port = strdup("21");
	fp->host = calloc(l + 1, 1);
	if (strchr(mode, 'c')) fp->no_reconnect = 1;
	strncpy(fp->host, fn + 6, l);
	fp->retr = calloc(strlen(p) + 8, 1);
	sprintf(fp->retr, "RETR %s\r\n", p);
	fp->seek_offset = -1;
	return fp;
}
// place ->fd at offset off
int kftp_connect_file(knetFile *fp)
{
	int ret;
	if (fp->fd != -1) {
		netclose(fp->fd);
		if (fp->no_reconnect) kftp_get_response(fp);
	}
	kftp_pasv_prep(fp);
	if (fp->offset) {
		char tmp[32];
		sprintf(tmp, "REST %lld\r\n", (long long)fp->offset);
		kftp_send_cmd(fp, tmp, 1);
	}
	kftp_send_cmd(fp, fp->retr, 0);
	kftp_pasv_connect(fp);
	ret = kftp_get_response(fp);
	if (ret != 150) {
		fprintf(stderr, "[kftp_connect_file] %s\n", fp->response);
		netclose(fp->fd);
		fp->fd = -1;
		return -1;
	}
	fp->is_ready = 1;
	return 0;
}

/**************************
 * HTTP specific routines *
 **************************/

knetFile *khttp_parse_url(const char *fn, const char *mode)
{
	knetFile *fp;
	char *p, *proxy, *q;
	int l;
	if (strstr(fn, "http://") != fn) return 0;
	// set ->http_host
	for (p = (char*)fn + 7; *p && *p != '/'; ++p);
	l = p - fn - 7;
	fp = calloc(1, sizeof(knetFile));
	fp->http_host = calloc(l + 1, 1);
	strncpy(fp->http_host, fn + 7, l);
	fp->http_host[l] = 0;
	for (q = fp->http_host; *q && *q != ':'; ++q);
	if (*q == ':') *q++ = 0;
	// get http_proxy
	proxy = getenv("http_proxy");
	// set ->host, ->port and ->path
	if (proxy == 0) {
		fp->host = strdup(fp->http_host); // when there is no proxy, server name is identical to http_host name.
		fp->port = strdup(*q? q : "80");
		fp->path = strdup(*p? p : "/");
	} else {
		fp->host = (strstr(proxy, "http://") == proxy)? strdup(proxy + 7) : strdup(proxy);
		for (q = fp->host; *q && *q != ':'; ++q);
		if (*q == ':') *q++ = 0;
		fp->port = strdup(*q? q : "80");
		fp->path = strdup(fn);
	}
	fp->type = KNF_TYPE_HTTP;
	fp->ctrl_fd = fp->fd = -1;
	fp->seek_offset = -1;
	return fp;
}

int khttp_connect_file(knetFile *fp)
{
	int ret, l = 0;
	char *buf, *p;
	if (fp->fd != -1) netclose(fp->fd);
	fp->fd = socket_connect(fp->host, fp->port);
	buf = calloc(0x10000, 1); // FIXME: I am lazy... But in principle, 64KB should be large enough.
	l += sprintf(buf + l, "GET %s HTTP/1.0\r\nHost: %s\r\n", fp->path, fp->http_host);
	if (fp->offset)
		l += sprintf(buf + l, "Range: bytes=%lld-\r\n", (long long)fp->offset);
	l += sprintf(buf + l, "\r\n");
	netwrite(fp->fd, buf, l);
	l = 0;
	while (netread(fp->fd, buf + l, 1)) { // read HTTP header; FIXME: bad efficiency
		if (buf[l] == '\n' && l >= 3)
			if (strncmp(buf + l - 3, "\r\n\r\n", 4) == 0) break;
		++l;
	}
	buf[l] = 0;
	if (l < 14) { // premature header
		netclose(fp->fd);
		fp->fd = -1;
		return -1;
	}
	ret = strtol(buf + 8, &p, 0); // HTTP return code
	if (ret == 200 && fp->offset) { // 200 (complete result); then skip beginning of the file
		off_t rest = fp->offset;
		while (rest) {
			off_t l = rest < 0x10000? rest : 0x10000;
			rest -= my_netread(fp->fd, buf, l);
		}
	} else if (ret != 206 && ret != 200) {
		free(buf);
		fprintf(stderr, "[khttp_connect_file] fail to open file (HTTP code: %d).\n", ret);
		netclose(fp->fd);
		fp->fd = -1;
		return -1;
	}
	free(buf);
	fp->is_ready = 1;
	return 0;
}

/********************
 * Generic routines *
 ********************/

knetFile *knet_open(const char *fn, const char *mode)
{
	knetFile *fp = 0;
	if (mode[0] != 'r') {
		fprintf(stderr, "[knet_open] only mode \"r\" is supported.\n");
		return 0;
	}
	if (strstr(fn, "ftp://") == fn) {
		fp = kftp_parse_url(fn, mode);
		if (fp == 0) return 0;
		if (kftp_connect(fp) == -1) {
			knet_close(fp);
			return 0;
		}
		kftp_connect_file(fp);
	} else if (strstr(fn, "http://") == fn) {
		fp = khttp_parse_url(fn, mode);
		if (fp == 0) return 0;
		khttp_connect_file(fp);
	} else { // local file
#ifdef _WIN32
		/* In windows, O_BINARY is necessary. In Linux/Mac, O_BINARY may
		 * be undefined on some systems, although it is defined on my
		 * Mac and the Linux I have tested on. */
		int fd = open(fn, O_RDONLY | O_BINARY);
#else
		int fd = open(fn, O_RDONLY);
#endif
		if (fd == -1) {
			perror("open");
			return 0;
		}
		fp = (knetFile*)calloc(1, sizeof(knetFile));
		fp->type = KNF_TYPE_LOCAL;
		fp->fd = fd;
		fp->ctrl_fd = -1;
	}
	if (fp && fp->fd == -1) {
		knet_close(fp);
		return 0;
	}
	return fp;
}

knetFile *knet_dopen(int fd, const char *mode)
{
	knetFile *fp = (knetFile*)calloc(1, sizeof(knetFile));
	fp->type = KNF_TYPE_LOCAL;
	fp->fd = fd;
	return fp;
}

off_t knet_read(knetFile *fp, void *buf, off_t len)
{
	off_t l = 0;
	if (fp->fd == -1) return 0;
	if (fp->type == KNF_TYPE_FTP) {
		if (fp->is_ready == 0) {
			if (!fp->no_reconnect) kftp_reconnect(fp);
			kftp_connect_file(fp);
		}
	} else if (fp->type == KNF_TYPE_HTTP) {
		if (fp->is_ready == 0)
			khttp_connect_file(fp);
	}
	if (fp->type == KNF_TYPE_LOCAL) { // on Windows, the following block is necessary; not on UNIX
		off_t rest = len, curr;
		while (rest) {
			curr = read(fp->fd, buf + l, rest);
			if (curr == 0) break;
			l += curr; rest -= curr;
		}
	} else l = my_netread(fp->fd, buf, len);
	fp->offset += l;
	return l;
}

int knet_seek(knetFile *fp, off_t off, int whence)
{
	if (whence == SEEK_SET && off == fp->offset) return 0;
	if (fp->type == KNF_TYPE_LOCAL) {
		/* Be aware that lseek() returns the offset after seeking,
		 * while fseek() returns zero on success. */
		off_t offset = lseek(fp->fd, off, whence);
		if (offset == -1) {
			perror("lseek");
			return -1;
		}
		fp->offset = offset;
		return 0;
	} else if (fp->type == KNF_TYPE_FTP || fp->type == KNF_TYPE_HTTP) {
		if (whence != SEEK_SET) { // FIXME: we can surely allow SEEK_CUR and SEEK_END in future
			fprintf(stderr, "[knet_seek] only SEEK_SET is supported for FTP/HTTP. Offset is unchanged.\n");
			return -1;
		}
		fp->offset = off;
		fp->is_ready = 0;
		return 0;
	}
	return -1;
}

int knet_close(knetFile *fp)
{
	if (fp == 0) return 0;
	if (fp->ctrl_fd != -1) netclose(fp->ctrl_fd); // FTP specific
	if (fp->fd != -1) {
		/* On Linux/Mac, netclose() is an alias of close(), but on
		 * Windows, it is an alias of closesocket(). */
		if (fp->type == KNF_TYPE_LOCAL) close(fp->fd);
		else netclose(fp->fd);
	}
	free(fp->host); free(fp->port);
	free(fp->response); free(fp->retr); // FTP specific
	free(fp->path); free(fp->http_host); // HTTP specific
	free(fp);
	return 0;
}

Recently I added rde::hash_map, which implements an open-addressing hash table with linear probing, to my benchmark. It is surprisingly fast. Studying its source code reveals that it achieves this speed by caching the hash values and by reducing cache misses. Both techniques increase the memory footprint.

In most hash libraries, the only information kept about a key's hash is its bucket index hash(key)%n_buckets, which carries just log2(n_buckets) bits. In contrast, rde::hash_map stores the full hash(key) in each bucket. Comparison between keys is first performed on the cached hashes and only then on the keys themselves. As a hash carries 30 bits of information in rdestl, distinct keys rarely share an identical hash, which avoids most full key comparisons and gains a big speedup on complex keys such as strings. However, storing the hash requires an additional 4 bytes per bucket, and can cost more when memory is improperly aligned.

Because rde::hash_map already spends an additional 4 bytes per bucket, it can mark a bucket as empty or deleted with two bits of those 4 bytes, instead of using a flag array as in my khash or special keys as in Google's dense hash table. As a result, rde::hash_map avoids both the extra cache miss in khash and the expensive key comparisons in Google's dense hash table, which makes it faster than both libraries on integer and string keys alike. Also, like google dense hash_map, rde::hash_map packs the key and the value in one struct instead of keeping keys and values in two separate arrays as khash does. This saves another cache miss on data retrieval. However, packing key and value may bring memory overhead when the key type and the value type do not align well in memory. Please see my old post for further discussion.

Interestingly, rde::hash_map chose linear probing. I once tried linear probing in my khash and it delivered slower speed than double hashing, even on random input. Maybe I should try harder?

In conclusion, rde::hash_map achieves high speed at the cost of high memory consumption. It is a good candidate if you do not care too much about memory. My khash library aims first at a small memory footprint and then at speed. I would not be surprised to see that it is not the fastest hash table library.

In C programming, the main difference between the low-level I/O functions (open/close/read/write) and the stream-level I/O functions (fopen/fclose/fread/fwrite) is that the stream-level functions are buffered. Presumably, each read() on a low-level descriptor may incur a disk operation. Although the kernel caches disk blocks, we cannot rely too much on that. Disk operations are expensive, which is why low-level I/O does not provide an fgetc() equivalent.

Stream-level I/O functions keep a buffer. On reading, they load a block of data from disk into memory; if the data requested by an fgetc() call is already in memory, no disk operation is incurred, which greatly improves efficiency.

Stream-level I/O functions are part of the standard C library. Why do we need a new wrapper? Three reasons. First, when you work with an alternative I/O library (such as zlib or libbzip2) that does not come with buffered I/O routines, you probably need a buffered wrapper to make your code efficient. Second, a generic wrapper makes your code more flexible when you want to change the type of input stream. For example, you may want to write a parser that works on a normal stream, on a zlib-compressed stream and on a C string. A unified stream wrapper simplifies the coding. Third, my feeling is that most stream-level I/O functions in stdio.h are not convenient, given that they cannot enlarge a string automatically. In a lot of cases, I need to read one line but I do not know how long a line can be. Handling this case is not hard, but doing it again and again is boring.

In the end, I came up with my own buffered wrapper for input streams. It is generic in that it works on any type of I/O stream with a read() call (or equivalent), or even on a C string. I show an example here without much explanation. I may expand this post in the future. Source code can be found on my programs page.

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "kstream.h"
// arguments: type of the stream handler,
//   function to read a block, size of the buffer
KSTREAM_INIT(int, read, 10)

int main()
{
	int fd;
	kstream_t *ks;
	kstring_t str;
	memset(&str, 0, sizeof(kstring_t)); // bzero() is non-standard
	fd = open("kstream.h", O_RDONLY);
	ks = ks_init(fd);
	while (ks_getuntil(ks, '\n', &str, 0) >= 0)
		printf("%s\n", str.s);
	ks_destroy(ks);
	free(str.s);
	close(fd);
	return 0;
}

This is a follow-up to my previous post. Here I replace the table with several charts, which I hope are friendlier to readers. You can find the links to these libraries in that table. Their source code, including my testing code, is available here. You may also want to see my previous posts from the last few days for my interpretation of the results.

On C string (char*) keys, I failed to get correct results with JE_rb_old and JE_rb_new on Mac, so they are not shown in the charts. I would really appreciate it if someone could show me the correct way to use these libraries. In addition, tr1_unordered_map uses a lot of memory according to my program. The memory numbers for string keys are faked.

For convenience, here are brief descriptions of these libraries (in no particular order):

  • google_dense and google_sparse: Google’s sparsehash library. google_dense is fast but memory-hungry, while google_sparse is the opposite.
  • sgi_hash_map and sgi_map: SGI’s STL that comes with g++-4. The backend of sgi_map is a three-pointer red-black tree.
  • tr1::unordered_map: GCC’s TR1 library that comes with g++-4. It implements a hash table.
  • rdestl::hash_map: from RDESTL, another implementation of STL.
  • uthash: a hash library in C
  • JG_btree: John-Mark Gurney’s btree library.
  • JE_rb_new, JE_rb_old, JE_trp_hash and JE_trp_prng: Jason Evans’ binary search tree libraries. JE_rb_new implements a left-leaning red-black tree; JE_rb_old a three-pointer red-black tree; both JE_trp_hash and JE_trp_prng implement treaps but with different strategies on randomness.
  • libavl_rb, libavl_prb, libavl_avl and libavl_bst: from GNU libavl. They implement a two-pointer red-black tree, a three-pointer red-black tree, an AVL tree and an unbalanced binary search tree, respectively.
  • NP_rbtree and NP_splaytree: Niels Provos’ tree library for FreeBSD. A three-pointer red-black tree and a splay tree.
  • TN_rbtree: Thomas Niemann’s red-black tree. I ported it to C++.
  • sglib_rbtree: from SGLIB. It implements a two-pointer recursive red-black tree (all the other binary search trees are implemented without recursion).
  • libavl_avl_cpp, libavl_rb_cpp and libavl_rb_cpp2: incomplete C++ version of libavl (no iterator), ported by me. Libavl_rb_cpp2 further uses the same technique in JE_rb_new to save the color bit. Source codes available in the package.
  • khash and kbtree: my hash table and B-tree implementations. kbtree is based on JG_btree.

Again, this post is a follow-up to this page. Source code is available here.

AVL Tree vs. Red-Black Tree

If you google “avl vs. red-black”, the first returned page will largely answer your question. In summary, the height of an AVL tree is at most ~1.44log2(N), lower than the maximum height of a red-black tree, ~2log2(N). However, upon insertion an AVL tree may need O(log(N)) operations to rebalance, while a red-black tree requires amortized O(1). This means an AVL tree is probably faster on searching but slower on insertion. In addition, a red-black tree is easier to implement as a persistent data structure.

Practical performance is not as clear-cut as the theoretical analysis. I think AVL and red-black trees may have quite similar overall performance. Firstly, although the maximum possible height of an AVL tree is lower, both AVL and red-black trees are usually well balanced without frequently reaching the maximum height. Secondly, although an AVL tree has an O(log(N)) loop upon insertion, it is a very tight loop with a small constant, while a red-black tree may have a short loop with a big constant (amortized O(1), though). I am not sure whether the red-black tree really wins on insertion.

In Daniel’s benchmark, the AVL tree and the red-black tree have quite similar speed (considering that Daniel implemented the AVL tree himself). In my benchmark, libavl’s red-black tree and AVL tree also deliver similar speed.

On memory, a standard AVL tree requires two pointers and one balancing factor at each node, while a red-black tree can be implemented using two or three pointers plus a color bit (either the balancing factor or the color bit will cost the size of a pointer due to memory alignment). A red-black tree with two pointers is usually slower than a red-black tree with three pointers (libavl_rb_cpp vs. NP_rbtree), but the difference is marginal. In my benchmark, a red-black tree with two pointers has pretty similar performance in comparison to an AVL tree (libavl_rb_cpp vs. libavl_avl_cpp).

In all, my opinion is AVL and red-black are very similar in many aspects. Which to use does not matter too much if you are not implementing a persistent data structure.

Splay Tree vs. AVL/RB Tree

Search on a splay tree may be O(N) if elements are inserted in order. This “feature” alone pushes me away. Although Wikipedia says it is possible to avoid the worst case, I am not aware of a practical implementation. On random input, a splay tree is also slower than AVL and red-black trees.

One advantage of a splay tree is that it only requires two pointers per node with no additional information. This is helpful when memory is critical, but I would rather use a modified red-black tree to achieve the same memory footprint. Another advantage is that a splay tree is self-adjusting. I have not tested this in my benchmark.

Treap vs. AVL/RB Tree

I do not know much about treaps. It seems to me that a treap does not guarantee O(log(N)) search. Fortunately, in practice the worst O(N) case should almost never happen due to the random behaviour of a treap. In my benchmark, the treap is slower than the splay tree on random input. Probably it is not the best choice.

Implementations

As for red-black trees, SGI STL’s set/map is really a good one. You can find other C++ implementations (TN_rbtree, libavl_rb_cpp) in my benchmark. They are also efficient, but incomplete. Libavl_rb_cpp is adapted from libavl_rb, which only uses two pointers. NP_rbtree (FreeBSD’s tree.h) is the best choice for a C programmer.

As for AVL trees, I am not aware of a good AVL implementation in either C or C++. Stlavlmap is supposed to be efficient, but I cannot compile it. WK_avl may be a good library, judging from its source code, but it is not easy to use. GM_avl uses recursion and so cannot be efficient. Libavl_avl_cpp is my ported version of libavl_avl; it is efficient but incomplete. You may finish it if you really want to try an AVL tree. BTW, libavl is in fact a very high-quality library, but using void* makes it inferior to the others. Someone should improve its API using C macros or C++ templates.

Concluding Remarks

For general purposes, the red-black tree might still be the best choice. It is among the fastest binary search trees and is more easily adapted to a persistent data structure. An AVL tree is equally good when persistence is not needed (well, none of my applications require persistence). Finally, do not forget my previous post: an in-memory B-tree can be both faster and lighter-weight than red-black/AVL trees.

Update

Sambal told me that on insertion, a red-black tree only guarantees amortized O(1) rebalancing. The worst case can be O(log(N)) with a large constant. The advantage of the red-black tree on insertion is therefore less obvious.

Updated the claim about the memory usage of AVL and red-black trees.

I have done a more thorough benchmark of various hash table and search tree implementations. I believe it is the best page from my blog so far. Unfortunately, few people have looked at that page according to the wordpress statistics. Possibly putting too much information on one page without enough explanation has scared people away; or the page is posted on another domain and people do not like to follow links. Anyway, it might be good to discuss further on my wordpress blog. I will write a series of posts to extend that page. Please look here for the whole page and here for the source code.

Let’s start this series with generic programming in C.

We usually use void* to implement generic containers in C. Done this way, we pay an overhead for retrieving the data pointed to by a void* pointer. We often, though not always, need a function call for operations on void* data, such as copying, comparing or hashing, which adds overhead on speed. Furthermore, if an object is small, we would rather store the whole object in the container instead of wasting memory on an additional pointer: using void* may also bring memory overhead.

To see how large this void* effect is, I ported libavl’s AVL tree code from C to C++ (source code is available in the download link shown above). I honestly translated void* to a C++ template without applying any additional optimizations (well, Ben Pfaff does not leave much room for optimization, either). As you can see from the table, my C++ template version is both faster and lighter-weight than the original C version. The conclusion is: using void* in this way may be inefficient.

There is another different, but similar, way to use void*. In implementing sorting, B-trees or open addressing hash tables, we use arrays. It is possible to store the objects themselves in a void* array instead of storing their pointers. We can do most operations as long as we know the size of each object. For example, we can implement swap this way:

void swap(void *p, int obj_size, int i, int j) {
  char *a = (char*)p + (size_t)obj_size * i; // cast to char*: arithmetic on void* is non-standard
  char *b = (char*)p + (size_t)obj_size * j;
  void *tmp = malloc(obj_size); // for efficiency, we can preallocate tmp
  memcpy(tmp, a, obj_size);
  memcpy(a, b, obj_size);
  memcpy(b, tmp, obj_size);
  free(tmp);
}

and even implement a whole sorting algorithm with similar ideas. I believe libc’s qsort is implemented this way. However, as we can see, we need a memcpy() for each assignment. On large blocks, memcpy() is probably efficient with the help of vectorization, but on small objects it is slower than direct assignment, because obj_size must be fetched at run time and memcpy() cannot assume anything about it. Also, as the compiler does not know obj_size at compile time, it cannot optimize obj_size*50 into something like 50<<2 even when we know obj_size=4. In all, using void* this way has no memory overhead, but it does have speed overhead. This is why libc’s qsort is always slower than STL’s or my sorting implementations (see my Comparison of Internal Sorting Algorithms).

Here is the lesson: if we care about efficiency, using void* to achieve generic programming in C is a bad idea. C++ templates do much better at this and cause almost no overhead on instantiation. If we want to stick to C, I would suggest switching to macros, as used in my khash.h/kbtree.h/ksort.h (I learned this from FreeBSD’s tree.h). C macros mimic C++ templates and cause no more overhead than templates do. However, C macros are hard to read and write. That is why I ported libavl’s AVL implementation to a C++ template instead of to C macros: it would have been a pain.

I was wondering whether retrieving an element in a struct will incur additional overhead. And so I did the following experiment. Here the same array is sorted in two ways: with or without data retrieving from a struct. Both ways yield identical results. The question is whether the compiler knows the two ways are the same and can achieve the same efficiency.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "ksort.h"

typedef struct {
  int a;
} myint_t;

#define myint_lt(_a, _b) ((_a).a < (_b).a)

KSORT_INIT_GENERIC(int)
KSORT_INIT(my, myint_t, myint_lt)

int main() {
  int i, N = 10000000;
  myint_t *a;
  clock_t t;
  a = (myint_t*)malloc(sizeof(myint_t) * N);
  srand48(11);
  for (i = 0; i != N; ++i) a[i].a = lrand48();
  t = clock();
  ks_introsort(int, N, (int*)a);
  printf("%.3lf\n", (double)(clock() - t) / CLOCKS_PER_SEC);
  srand48(11);
  for (i = 0; i != N; ++i) a[i].a = lrand48();
  t = clock();
  ks_introsort(my, N, a);
  printf("%.3lf\n", (double)(clock() - t) / CLOCKS_PER_SEC);
  free(a);
  return 0;
}

Here is the speed with different compilers on different CPUs (the first value is without data retrieval from a struct, the second with):

  • Mac-Intel, gcc-4.0, -O2: 1.422 sec vs. 1.802 sec
  • Mac-Intel, gcc-4.2, -O2: 1.438 vs. 1.567
  • Mac-Intel, gcc-4.0, -O2 -fomit-frame-pointer: 1.425 vs. 1.675
  • Mac-Intel, gcc-4.2, -O2 -fomit-frame-pointer: 1.438 vs. 1.448
  • Linux-Intel, gcc-4.1, -O2: 1.600 vs. 1.520
  • Linux-Intel, gcc-4.1, -O2 -fomit-frame-pointer: 1.620 vs. 1.530
  • Linux-Intel, icc, -O2 -fomit-frame-pointer: 1.600 vs. 1.580

The conclusion is that retrieving data from a struct may have marginal overhead in comparison to direct data access. However, a good compiler can avoid this and produce nearly optimal machine code. Using “-fomit-frame-pointer” helps on some machines but not on others. In addition, it is a bit surprising to me that gcc on Linux generates faster code for data retrieval from a struct. Swapping the order of the two runs does not change the conclusion.

Over the weekend, I have done a more comprehensive benchmark of various search tree libraries. Two AVL tree, seven red-black tree, one splay tree and two treap implementations are involved, together with seven hash table libraries. As I need to present a big table, I have to write it in a free-style HTML page. You can find the complete benchmark here and all the source code here. I only copy the “concluding remarks” from the benchmark page as follows:

  • Hash table is preferred over search trees if we do not require order.
  • In applications similar to my example, B-tree is better than most of binary search trees in terms of both speed and memory.
  • AVL tree and red-black tree are the best general-purposed BSTs. They are very close in efficiency.
  • For pure C libraries, using macros is usually more efficient than using void* to achieve generic programming.

You can find the results and much more discussion on that page. If you think the source code or the design of the benchmark can be improved, please leave comments here or send me E-mail. In addition, I failed to use several libraries, which is why you can see some blanks in the table. I would also appreciate it if someone could show me how to use those libraries correctly.

Currently I am using freewebs.com to hold my programs and free-style HTML pages. It is down today and I am really annoyed by its single-file uploader and lack of web statistics. I have finally decided to switch to awardspace.com, which seems to offer much more than freewebs.com. My next post will use awardspace.com.

When talking about in-memory search trees, we usually think of various binary search trees: red-black tree, AVL tree, treap, splay tree and so on. We do not often think of B-tree, as B-tree is commonly introduced as an on-disk data structure rather than an in-memory one. Is B-tree also a good data structure for an in-memory ordered dictionary? I searched for performance comparisons between B-trees and binary search trees, but ended up with nothing useful. It seems that I am the only one interested in such a comparison, so I have to do it myself.

I found John-Mark Gurney’s B-tree via a Google search. It is well coded and full of clever ideas. The original version has a small memory footprint, but it is not as fast as STL’s red-black tree. I studied its source code and thought I should be able to optimize it further. In the end, I got my kbtree.h macro library. As you can see in my hash table benchmark, the modified version beats STL set while using even less memory than the original version. I think I am now in a position to say: at least for some applications, B-tree is a better ordered data structure than most binary search trees.

The most attractive feature of B-tree is its small memory usage. A binary tree needs at least two pointers for each record, which amounts to 16N on a modern 64-bit system. A B-tree only needs one pointer. Although each node in a B-tree may not be full, a sufficiently large B-tree is at least 50% full by definition and on average around 75% full. On a 64-bit system, the extra memory is only 8N/0.75+KN(1/0.75-1)=(10+0.3K)N, where K is the size of a key. In fact we can do even better, as we do not need to allocate the null pointers in leaves. The practical memory overhead can be reduced to below (5+0.3K)N (in fact, as the majority of nodes in a B-tree are leaves, the factor 5 should be smaller in practice), far better than a binary search tree. On speed, no binary search tree with just two additional pointers (splay tree and hash treap) can achieve the best performance. We usually need additional information at each node (AVL tree and standard red-black tree) or a random number (treap) to get good performance. B-tree is different. It is even faster than the standard red-black tree while still using (5+0.3K)N extra memory! People should definitely pay more attention to B-tree.

Update: The modified B-tree is available here (HTML) as a single C header file, and an example is here, in case you want to give it a try. Currently the APIs are not very friendly, but they are ready to use. Note that you must make sure the key is absent from the tree before calling kb_put() and make sure the key is present in the tree before calling kb_del().

Someone has corrected me. STL is a specification, not an implementation. By STL in my blog, I always mean SGI STL, the default STL that comes with GCC.

Over the weekend I have done a more complete benchmark of various libraries on search trees and hash tables. Please read this post if you are interested.

I realize that a lot of red-black tree implementations do not need a parent pointer, although SGI STL’s does. My comment below is somewhat wrong.

Update 2: kbtree.h has long been moved here, along with khash.h and my other 1- or 2-file libraries.