Feeds:
Posts
Comments

Posts Tagged ‘myprog’

GNU sort is one of my favorite program. It is fast and highly flexible. However, when I try to sort chromosome names, it becomes a pain. In bioinformatics, chromosomes are usually named as chr1, chr2, …, chr10, chr11, … chr20, …, chrX and chrY. It seems to me that there is no way to sort these names in the above order. Finally, I decide to modify GNU sort. I separate sort source codes from textutils-1.22 because this version is less dependent on other packages.

The string comparison function is:

static int mixed_numcompare(const char *a, const char *b)
{
  char *pa, *pb;
  pa = (char*)a; pb = (char*)b;
  while (*pa && *pb) {
    if (isdigit(*pa) && isdigit(*pb)) {
      long ai, bi;
      ai = strtol(pa, &pa, 10);
      bi = strtol(pb, &pb, 10);
      if (ai != bi) return ai<bi? -1 : ai>bi? 1 : 0;
    } else {
      if (*pa != *pb) break;
      ++pa; ++pb;
    }
  }
  if (*pa == *pb)
  return (pa-a) < (pb-b)? -1 : (pa-a) > (pb-b)? 1 : 0;
  return *pa<*pb? -1 : *pa>*pb? 1 : 0;
}

It does numerical comparison for digits and string comparison for other characters. With this comparison, chromosome names can be sorted in the desired way. I add a new command line option -N (or -k1,1N) to trigger string-digits mixed comparison.

In addition, I also replace the top-down recursive mergesort with a bottom-up iterative sort, and use heap to accelerate merging. The improved sort is a little faster than the orginal version.

The improved sort can be downloaded here, distributed under GPL.

Read Full Post »

C pointer is the most powerful and nasty concept. Whether marstering C points or not separate intermediate C programers from the elementary ones. Want to know whether you have mastered C pointers? Have a look at this program. If the basic idea is clear to you, you are qualified to be an intermediate programmer. If you have difficulty, you should learn hard from other programmers. It is unwise to study this program as a beginner. The catch in the program is too complicated.

This program is adapted from an example in the C bible “The C Programming Language” written by Kernighan & Ritchie. I commented it and extended its function a bit. This allocator is usually not as efficient as malloc family that comes with the system, but it is good enough for a lot of practical applications. Also, this allocator is simple and clear. You can largely predict its behaviour. In comparison, it is not always easy to understand malloc is doing.

I came to know this allocator from Phil Green’s Phrap assembler. I then found the book and reimplemented in my way.

Read Full Post »

« Newer Posts