In a project I wanted to open part of a remote file sitting on an FTP or HTTP server directly. I did not want to download the whole file because it is frequently over 10GB in size. What I wanted was a set of function calls with an interface similar to fopen/fclose/ftell/fseek/fread (I do not need fwrite for the moment), so that I can open a remote file as if it were local. I searched quite a bit for a suitable library, but most of the related libraries are designed for file transfer. In the end, I decided to write my own library for this task. And here it is.
This library consists of one C header file and one C source file. It was originally developed for Linux/Mac and then ported to Windows, where it supports the MinGW compiler. On Linux, the implemented features work properly. On Windows, however, random access to FTP files sometimes fails.
This library provides knet_open(), knet_close(), knet_tell(), knet_seek() and knet_read() function calls. You can manipulate a file on HTTP with, for example:
char buf[4096];
knetFile *fp = knet_open("http://host/file", "r");
knet_seek(fp, 1000, SEEK_SET);
knet_read(fp, buf, 4096);
knet_close(fp);
Opening an FTP file is similar. The library is also transparent to local files: a file name that does not start with “http://” or “ftp://” is opened as a local file.
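The prefix test described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the library's own code: `classify_name()` is a hypothetical helper, and the enum values simply mirror the `KNF_TYPE_*` macros from the header.

```c
#include <assert.h>
#include <string.h>

/* Values mirror the KNF_TYPE_* macros in knetfile.h. */
enum { KNF_TYPE_LOCAL = 1, KNF_TYPE_FTP = 2, KNF_TYPE_HTTP = 3 };

/* Hypothetical helper: decide how a file name would be opened.
 * Anything that does not start with "ftp://" or "http://" is local. */
static int classify_name(const char *fn)
{
    assert(fn != NULL);
    if (strncmp(fn, "ftp://", 6) == 0) return KNF_TYPE_FTP;
    if (strncmp(fn, "http://", 7) == 0) return KNF_TYPE_HTTP;
    return KNF_TYPE_LOCAL;
}
```

Note that under this rule a name such as "https://host/file" matches neither prefix, so it would be treated as a local file name.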
This is my first attempt at network programming, and surely a lot of things can be improved. Please leave me a message if you have any suggestions. Thanks in advance.
Here is the C header file:
#ifndef KNETFILE_H
#define KNETFILE_H
#include <stdint.h>
#include <fcntl.h>
#ifndef _WIN32
#define netread(fd, ptr, len) read(fd, ptr, len)
#define netwrite(fd, ptr, len) write(fd, ptr, len)
#define netclose(fd) close(fd)
#else
#include <winsock.h>
#define netread(fd, ptr, len) recv(fd, ptr, len, 0)
#define netwrite(fd, ptr, len) send(fd, ptr, len, 0)
#define netclose(fd) closesocket(fd)
#endif
// FIXME: currently I/O is unbuffered
#define KNF_TYPE_LOCAL 1
#define KNF_TYPE_FTP 2
#define KNF_TYPE_HTTP 3
typedef struct knetFile_s {
int type, fd;
int64_t offset;
char *host, *port;
// the following are for FTP only
int ctrl_fd, pasv_ip[4], pasv_port, max_response, no_reconnect, is_ready;
char *response, *retr;
int64_t seek_offset; // for lazy seek
// the following are for HTTP only
char *path, *http_host;
} knetFile;
#define knet_tell(fp) ((fp)->offset)
#define knet_fileno(fp) ((fp)->fd)
#ifdef __cplusplus
extern "C" {
#endif
#ifdef _WIN32
int knet_win32_init();
void knet_win32_destroy();
#endif
knetFile *knet_open(const char *fn, const char *mode);
/*
This only works with local files.
*/
knetFile *knet_dopen(int fd, const char *mode);
/*
If ->is_ready==0, this routine updates ->fd; otherwise, it simply
reads from ->fd.
*/
off_t knet_read(knetFile *fp, void *buf, off_t len);
/*
This routine only sets ->offset and ->is_ready=0. It does not
communicate with the FTP server.
*/
int knet_seek(knetFile *fp, off_t off, int whence);
int knet_close(knetFile *fp);
#ifdef __cplusplus
}
#endif
#endif
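The header comments describe a lazy seek: knet_seek() only records the new offset and clears is_ready, and the next knet_read() re-establishes the data stream. Below is a toy model of that pattern with no networking at all. The `lazy_*` names and the in-memory "remote" buffer are made up for illustration; only the offset/is_ready idea comes from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a remote file: the bytes live in memory and
 * "connecting" just counts how often a real transfer would restart. */
typedef struct {
    const char *data;  /* pretend remote content */
    size_t size;
    size_t offset;     /* logical position, like knetFile.offset */
    int is_ready;      /* 0 means the stream must be reopened before reading */
    int n_connect;     /* number of simulated reconnects */
} lazy_file;

static void lazy_seek(lazy_file *f, size_t off)
{
    assert(f != NULL);
    if (off == f->offset) return; /* already there, keep the stream */
    f->offset = off;
    f->is_ready = 0;              /* defer the expensive part to the next read */
}

static size_t lazy_read(lazy_file *f, void *buf, size_t len)
{
    size_t avail;
    assert(f != NULL && buf != NULL);
    if (!f->is_ready) {           /* a real client would send REST/RETR or a Range request here */
        ++f->n_connect;
        f->is_ready = 1;
    }
    avail = f->offset < f->size ? f->size - f->offset : 0;
    if (len > avail) len = avail;
    if (len == 0) return 0;       /* at or past the end of the data */
    memcpy(buf, f->data + f->offset, len);
    f->offset += len;
    return len;
}
```

With this design, several seeks in a row cost nothing; only the first read after a seek pays for a reconnect, and sequential reads reuse the open stream.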
Here is the main C source code:
/* Probably I will not do socket programming in the next few years and
therefore I decide to heavily annotate this file, for Linux and
Windows as well. */
#include <time.h>
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#ifdef _WIN32
#include <winsock.h>
#else
#include <netdb.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#endif
#include "knetfile.h"
/* In winsock.h, the type of a socket is SOCKET, which is: "typedef
* u_int SOCKET". An invalid SOCKET is: "(SOCKET)(~0)", or signed
* integer -1. In knetfile.c, I use "int" for socket type
* throughout. This should be improved to avoid confusion.
*
* In Linux/Mac, recv() and read() do almost the same thing. You can see
* in the header file that netread() is simply an alias of read(). In
* Windows, however, they are different and using recv() is mandatory.
*/
/* This function tests if the file descriptor is ready for reading (or
* writing if is_read==0). It returns a positive value if ready, 0 on
* time-out and -1 on error. */
static int socket_wait(int fd, int is_read)
{
fd_set fds, *fdr = 0, *fdw = 0;
struct timeval tv;
int ret;
tv.tv_sec = 5; tv.tv_usec = 0; // 5 seconds time out
FD_ZERO(&fds);
FD_SET(fd, &fds);
if (is_read) fdr = &fds;
else fdw = &fds;
ret = select(fd+1, fdr, fdw, 0, &tv);
if (ret == -1) perror("select");
return ret;
}
#ifndef _WIN32
/* This function is not used on Windows because getaddrinfo() is not
* declared in winsock.h. It is adapted from an example in "Beej's
* Guide to Network Programming" (http://beej.us/guide/bgnet/). */
static int socket_connect(const char *host, const char *port)
{
#define __err_connect(func) do { perror(func); freeaddrinfo(res); return -1; } while (0)
int on = 1, fd;
struct linger lng = { 0, 0 };
struct addrinfo hints, *res;
memset(&hints, 0, sizeof(struct addrinfo));
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
/* In Unix/Mac, getaddrinfo() is the most convenient way to get
* server information. */
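if (getaddrinfo(host, port, &hints, &res) != 0) { // res is not valid here, so freeaddrinfo() must not be called
fprintf(stderr, "getaddrinfo: failed to resolve %s:%s\n", host, port);
return -1;
}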
if ((fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol)) == -1) __err_connect("socket");
/* The following two setsockopt() are used by ftplib
* (http://nbpfaus.net/~pfau/ftplib/). I am not sure if they are
* necessary. */
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) __err_connect("setsockopt");
if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lng, sizeof(lng)) == -1) __err_connect("setsockopt");
if (connect(fd, res->ai_addr, res->ai_addrlen) != 0) __err_connect("connect");
freeaddrinfo(res);
return fd;
}
#else
/* On Windows, Winsock must be initialized with WSAStartup() before any
* socket function is called. */
int knet_win32_init()
{
WSADATA wsaData;
return WSAStartup(MAKEWORD(2, 2), &wsaData);
}
void knet_win32_destroy()
{
WSACleanup();
}
/* A slightly modified version of the following function also works on
* Mac (and presumably Linux). However, this function is not stable on
* my Mac. It sometimes works fine but sometimes does not. Therefore for
* non-Windows OS, I do not use this one. */
static SOCKET socket_connect(const char *host, const char *port)
{
#define __err_connect(func) do { perror(func); return -1; } while (0)
int on = 1;
SOCKET fd;
struct linger lng = { 0, 0 };
struct sockaddr_in server;
struct hostent *hp = 0;
// open socket
if ((fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == INVALID_SOCKET) __err_connect("socket");
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (char*)&on, sizeof(on)) == -1) __err_connect("setsockopt");
if (setsockopt(fd, SOL_SOCKET, SO_LINGER, (char*)&lng, sizeof(lng)) == -1) __err_connect("setsockopt");
// get host info
if (isalpha(host[0])) hp = gethostbyname(host);
else {
struct in_addr addr;
addr.s_addr = inet_addr(host);
hp = gethostbyaddr((char*)&addr, 4, AF_INET);
}
if (hp == 0) __err_connect("gethost");
// connect
server.sin_addr.s_addr = *((unsigned long*)hp->h_addr);
server.sin_family= AF_INET;
server.sin_port = htons(atoi(port));
if (connect(fd, (struct sockaddr*)&server, sizeof(server)) != 0) __err_connect("connect");
// freehostent(hp); // strangely in MSDN, hp is NOT freed (memory leak?!)
return fd;
}
#endif
static off_t my_netread(int fd, void *buf, off_t len)
{
off_t rest = len, curr, l = 0;
/* recv() and read() may not read the required length of data with
* one call. They have to be called repeatedly. */
while (rest) {
if (socket_wait(fd, 1) <= 0) break; // socket is not ready for reading
curr = netread(fd, (char*)buf + l, rest);
/* According to the glibc manual, section 13.2, a zero returned
* value indicates end-of-file (EOF), which should mean that
* read() will not return zero if EOF has not been met but data
* are not immediately available. A negative value is an error;
* adding it to l would corrupt the byte count. */
if (curr <= 0) break;
l += curr; rest -= curr;
}
return l;
}
/*************************
* FTP specific routines *
*************************/
static int kftp_get_response(knetFile *ftp)
{
unsigned char c;
int n = 0;
char *p;
if (socket_wait(ftp->ctrl_fd, 1) <= 0) return 0;
while (netread(ftp->ctrl_fd, &c, 1) > 0) { // stop on EOF (0) and on error (-1); FIXME: this is *VERY BAD* for unbuffered I/O
//fputc(c, stderr);
if (n >= ftp->max_response) {
ftp->max_response = ftp->max_response? ftp->max_response<<1 : 256;
ftp->response = realloc(ftp->response, ftp->max_response);
}
ftp->response[n++] = c;
if (c == '\n') {
if (n >= 4 && isdigit(ftp->response[0]) && isdigit(ftp->response[1]) && isdigit(ftp->response[2])
&& ftp->response[3] != '-') break;
n = 0;
continue;
}
}
if (n < 2) return -1;
ftp->response[n-2] = 0;
return strtol(ftp->response, &p, 0);
}
static int kftp_send_cmd(knetFile *ftp, const char *cmd, int is_get)
{
if (socket_wait(ftp->ctrl_fd, 0) <= 0) return -1; // socket is not ready for writing
netwrite(ftp->ctrl_fd, cmd, strlen(cmd));
return is_get? kftp_get_response(ftp) : 0;
}
static int kftp_pasv_prep(knetFile *ftp)
{
char *p;
int v[6];
kftp_send_cmd(ftp, "PASV\r\n", 1);
for (p = ftp->response; *p && *p != '('; ++p);
if (*p != '(') return -1;
++p;
if (sscanf(p, "%d,%d,%d,%d,%d,%d", &v[0], &v[1], &v[2], &v[3], &v[4], &v[5]) != 6) return -1; // malformed PASV reply
memcpy(ftp->pasv_ip, v, 4 * sizeof(int));
ftp->pasv_port = (v[4]<<8&0xff00) + v[5];
return 0;
}
static int kftp_pasv_connect(knetFile *ftp)
{
char host[80], port[10];
if (ftp->pasv_port == 0) {
fprintf(stderr, "[kftp_pasv_connect] kftp_pasv_prep() is not called before hand.\n");
return -1;
}
sprintf(host, "%d.%d.%d.%d", ftp->pasv_ip[0], ftp->pasv_ip[1], ftp->pasv_ip[2], ftp->pasv_ip[3]);
sprintf(port, "%d", ftp->pasv_port);
ftp->fd = socket_connect(host, port);
if (ftp->fd == -1) return -1;
return 0;
}
int kftp_connect(knetFile *ftp)
{
ftp->ctrl_fd = socket_connect(ftp->host, ftp->port);
if (ftp->ctrl_fd == -1) return -1;
kftp_get_response(ftp);
kftp_send_cmd(ftp, "USER anonymous\r\n", 1);
kftp_send_cmd(ftp, "PASS kftp@\r\n", 1);
kftp_send_cmd(ftp, "TYPE I\r\n", 1);
return 0;
}
int kftp_reconnect(knetFile *ftp)
{
if (ftp->ctrl_fd != -1) {
netclose(ftp->ctrl_fd);
ftp->ctrl_fd = -1;
}
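if (ftp->fd != -1) { // do not close an invalid descriptor, and prevent a second close in kftp_connect_file()
netclose(ftp->fd);
ftp->fd = -1;
}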
return kftp_connect(ftp);
}
// initialize ->type, ->host and ->retr
knetFile *kftp_parse_url(const char *fn, const char *mode)
{
knetFile *fp;
char *p;
int l;
if (strstr(fn, "ftp://") != fn) return 0;
for (p = (char*)fn + 6; *p && *p != '/'; ++p);
if (*p != '/') return 0;
l = p - fn - 6;
fp = calloc(1, sizeof(knetFile));
fp->type = KNF_TYPE_FTP;
fp->fd = -1;
/* the Linux/Mac version of socket_connect() also recognizes a port
* like "ftp", but the Windows version does not. */
fp->port = strdup("21");
fp->host = calloc(l + 1, 1);
if (strchr(mode, 'c')) fp->no_reconnect = 1;
strncpy(fp->host, fn + 6, l);
fp->retr = calloc(strlen(p) + 8, 1);
sprintf(fp->retr, "RETR %s\r\n", p);
fp->seek_offset = -1;
return fp;
}
// place ->fd at offset off
int kftp_connect_file(knetFile *fp)
{
int ret;
if (fp->fd != -1) {
netclose(fp->fd);
if (fp->no_reconnect) kftp_get_response(fp);
}
kftp_pasv_prep(fp);
if (fp->offset) {
char tmp[32];
sprintf(tmp, "REST %lld\r\n", (long long)fp->offset);
kftp_send_cmd(fp, tmp, 1);
}
kftp_send_cmd(fp, fp->retr, 0);
kftp_pasv_connect(fp);
ret = kftp_get_response(fp);
if (ret != 150) {
fprintf(stderr, "[kftp_connect_file] %s\n", fp->response);
netclose(fp->fd);
fp->fd = -1;
return -1;
}
fp->is_ready = 1;
return 0;
}
/**************************
* HTTP specific routines *
**************************/
knetFile *khttp_parse_url(const char *fn, const char *mode)
{
knetFile *fp;
char *p, *proxy, *q;
int l;
if (strstr(fn, "http://") != fn) return 0;
// set ->http_host
for (p = (char*)fn + 7; *p && *p != '/'; ++p);
l = p - fn - 7;
fp = calloc(1, sizeof(knetFile));
fp->http_host = calloc(l + 1, 1);
strncpy(fp->http_host, fn + 7, l);
fp->http_host[l] = 0;
for (q = fp->http_host; *q && *q != ':'; ++q);
if (*q == ':') *q++ = 0;
// get http_proxy
proxy = getenv("http_proxy");
// set ->host, ->port and ->path
if (proxy == 0) {
fp->host = strdup(fp->http_host); // when there is no proxy, server name is identical to http_host name.
fp->port = strdup(*q? q : "80");
fp->path = strdup(*p? p : "/");
} else {
fp->host = (strstr(proxy, "http://") == proxy)? strdup(proxy + 7) : strdup(proxy);
for (q = fp->host; *q && *q != ':'; ++q);
if (*q == ':') *q++ = 0;
fp->port = strdup(*q? q : "80");
fp->path = strdup(fn);
}
fp->type = KNF_TYPE_HTTP;
fp->ctrl_fd = fp->fd = -1;
fp->seek_offset = -1;
return fp;
}
int khttp_connect_file(knetFile *fp)
{
int ret, l = 0;
char *buf, *p;
if (fp->fd != -1) netclose(fp->fd);
fp->fd = socket_connect(fp->host, fp->port);
if (fp->fd == -1) return -1; // connection failed; do not write to an invalid socket
buf = calloc(0x10000, 1); // FIXME: I am lazy... But in principle, 64KB should be large enough.
l += sprintf(buf + l, "GET %s HTTP/1.0\r\nHost: %s\r\n", fp->path, fp->http_host);
if (fp->offset)
l += sprintf(buf + l, "Range: bytes=%lld-\r\n", (long long)fp->offset);
l += sprintf(buf + l, "\r\n");
netwrite(fp->fd, buf, l);
l = 0;
while (l < 0x10000 - 1 && netread(fp->fd, buf + l, 1) > 0) { // read HTTP header byte by byte, bounded by the buffer size; FIXME: bad efficiency
if (buf[l] == '\n' && l >= 3)
if (strncmp(buf + l - 3, "\r\n\r\n", 4) == 0) break;
++l;
}
buf[l] = 0;
if (l < 14) { // premature header
free(buf); // buf would otherwise leak on this path
netclose(fp->fd);
fp->fd = -1;
return -1;
}
ret = strtol(buf + 8, &p, 0); // HTTP return code
if (ret == 200 && fp->offset) { // 200 (complete result); then skip beginning of the file
off_t rest = fp->offset;
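while (rest) {
off_t n = rest < 0x10000? rest : 0x10000; // chunk size; named n so that it does not shadow the outer l
off_t got = my_netread(fp->fd, buf, n);
if (got <= 0) break; // EOF or time-out before reaching the offset; avoid looping forever
rest -= got;
}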
} else if (ret != 206 && ret != 200) {
free(buf);
fprintf(stderr, "[khttp_connect_file] fail to open file (HTTP code: %d).\n", ret);
netclose(fp->fd);
fp->fd = -1;
return -1;
}
free(buf);
fp->is_ready = 1;
return 0;
}
/********************
* Generic routines *
********************/
knetFile *knet_open(const char *fn, const char *mode)
{
knetFile *fp = 0;
if (mode[0] != 'r') {
fprintf(stderr, "[knet_open] only mode \"r\" is supported.\n");
return 0;
}
if (strstr(fn, "ftp://") == fn) {
fp = kftp_parse_url(fn, mode);
if (fp == 0) return 0;
if (kftp_connect(fp) == -1) {
knet_close(fp);
return 0;
}
kftp_connect_file(fp);
} else if (strstr(fn, "http://") == fn) {
fp = khttp_parse_url(fn, mode);
if (fp == 0) return 0;
khttp_connect_file(fp);
} else { // local file
#ifdef _WIN32
/* In windows, O_BINARY is necessary. In Linux/Mac, O_BINARY may
* be undefined on some systems, although it is defined on my
* Mac and the Linux I have tested on. */
int fd = open(fn, O_RDONLY | O_BINARY);
#else
int fd = open(fn, O_RDONLY);
#endif
if (fd == -1) {
perror("open");
return 0;
}
fp = (knetFile*)calloc(1, sizeof(knetFile));
fp->type = KNF_TYPE_LOCAL;
fp->fd = fd;
fp->ctrl_fd = -1;
}
if (fp && fp->fd == -1) {
knet_close(fp);
return 0;
}
return fp;
}
knetFile *knet_dopen(int fd, const char *mode)
{
knetFile *fp = (knetFile*)calloc(1, sizeof(knetFile));
fp->type = KNF_TYPE_LOCAL;
fp->fd = fd;
fp->ctrl_fd = -1; // calloc() leaves this at 0, which knet_close() would wrongly close
return fp;
}
off_t knet_read(knetFile *fp, void *buf, off_t len)
{
off_t l = 0;
if (fp->fd == -1) return 0;
if (fp->type == KNF_TYPE_FTP) {
if (fp->is_ready == 0) {
if (!fp->no_reconnect) kftp_reconnect(fp);
kftp_connect_file(fp);
}
} else if (fp->type == KNF_TYPE_HTTP) {
if (fp->is_ready == 0)
khttp_connect_file(fp);
}
if (fp->type == KNF_TYPE_LOCAL) { // on Windows, the following block is necessary; not on UNIX
off_t rest = len, curr;
while (rest) {
curr = read(fp->fd, (char*)buf + l, rest);
if (curr <= 0) break; // 0 is EOF; a negative value is an error
l += curr; rest -= curr;
}
} else l = my_netread(fp->fd, buf, len);
fp->offset += l;
return l;
}
int knet_seek(knetFile *fp, off_t off, int whence)
{
if (whence == SEEK_SET && off == fp->offset) return 0;
if (fp->type == KNF_TYPE_LOCAL) {
/* Be aware that lseek() returns the offset after seeking,
* while fseek() returns zero on success. */
off_t offset = lseek(fp->fd, off, whence);
if (offset == -1) {
perror("lseek");
return -1;
}
fp->offset = offset;
return 0;
} else if (fp->type == KNF_TYPE_FTP || fp->type == KNF_TYPE_HTTP) {
if (whence != SEEK_SET) { // FIXME: we can surely allow SEEK_CUR and SEEK_END in future
fprintf(stderr, "[knet_seek] only SEEK_SET is supported for FTP/HTTP. Offset is unchanged.\n");
return -1;
}
fp->offset = off;
fp->is_ready = 0;
return 0;
}
return -1;
}
int knet_close(knetFile *fp)
{
if (fp == 0) return 0;
if (fp->ctrl_fd != -1) netclose(fp->ctrl_fd); // FTP specific
if (fp->fd != -1) {
/* On Linux/Mac, netclose() is an alias of close(), but on
* Windows, it is an alias of closesocket(). */
if (fp->type == KNF_TYPE_LOCAL) close(fp->fd);
else netclose(fp->fd);
}
free(fp->host); free(fp->port);
free(fp->response); free(fp->retr); // FTP specific
free(fp->path); free(fp->http_host); // HTTP specific
free(fp);
return 0;
}
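The PASV handling in kftp_pasv_prep() turns a reply of the form "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" into an IP address and a port equal to p1*256+p2. Here is a standalone sketch of that parsing with the sscanf() result checked. `parse_pasv()` is a hypothetical name and not part of the library, and the range check on each field is an extra precaution added here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the "(h1,h2,h3,h4,p1,p2)" part of an FTP 227 reply.
 * Returns 0 on success and -1 if the reply is malformed. */
static int parse_pasv(const char *reply, int ip[4], int *port)
{
    int v[6], i;
    const char *p;
    assert(reply != NULL && ip != NULL && port != NULL);
    p = strchr(reply, '(');
    if (p == NULL) return -1;
    if (sscanf(p + 1, "%d,%d,%d,%d,%d,%d",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5]) != 6)
        return -1;
    for (i = 0; i < 6; ++i)
        if (v[i] < 0 || v[i] > 255) return -1; /* each field is one byte */
    for (i = 0; i < 4; ++i) ip[i] = v[i];
    *port = v[4] * 256 + v[5];
    return 0;
}
```

For a reply ending in "(192,168,1,10,19,137)" this yields the address 192.168.1.10 and port 19*256+137 = 5001.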
Another option is to use ftpfs/httpfs for FUSE, but I suppose that httpfs wouldn’t always work (e.g. if you had a non-directory-listing webserver), whereas in your solution you can directly open a file without needing to see the containing directory.
Yes, I should mention FUSE in the article. FUSE indeed achieves the goal and in fact it is more extensible at present. For example, my code only supports FTP/HTTP, but sshfs has been there for years.
However, FUSE has more issues:
a) FUSE has to be installed by root/administrator, while additional installation will push some users away.
b) Mounting FTP/HTTP each time we open a remote file is cumbersome and sometimes troublesome (e.g. opening multiple remote files from a CGI).
c) FUSE may have cross-platform issues. It seems to me that FUSE on Windows is not mature (if it exists at all). How about FUSE on Solaris/AIX?
d) Someone told me that httpfs does not support random access. I have not tried, though.
getaddrinfo() is in winsock, since win2k (pro):
http://msdn.microsoft.com/en-us/library/ms738520(VS.85).aspx
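If so, a getaddrinfo()-based resolver could in principle be shared between the two platforms. The sketch below only resolves a name and reports the address family of the first result; it does not connect. `first_family()` is a hypothetical helper, and the Windows branch assumes that ws2tcpip.h is available and that WSAStartup() has already been called.

```c
#include <assert.h>
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#endif

/* Resolve host:port and return the address family of the first result,
 * or -1 on failure. */
static int first_family(const char *host, const char *port)
{
    struct addrinfo hints, *res = NULL;
    int family;
    assert(host != NULL && port != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;                   /* res is not valid on failure, so it is not freed */
    family = res->ai_family;
    freeaddrinfo(res);
    return family;
}
```

A real socket_connect() would go on to call socket() and connect() on the results, trying each entry of the list in turn rather than only the first.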
[…] I implemented knetfile for accessing remote files on ftp and http as if they are local (see also this blog post). I have been using the implementation for a while and the end users like the feature. However, […]