Posts Tagged ‘software’

GNU sort is one of my favorite programs. It is fast and highly flexible. However, sorting chromosome names with it is a pain. In bioinformatics, chromosomes are usually named chr1, chr2, …, chr10, chr11, …, chr20, …, chrX and chrY. As far as I can tell, there is no way to make sort produce this order; plain string comparison, for example, puts chr10 before chr2. In the end, I decided to modify GNU sort itself. I extracted the sort source code from textutils-1.22 because this version depends less on other packages.

The string comparison function is:

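#include <ctype.h>   /* isdigit */
#include <stdlib.h>  /* strtol */

/* Compare two strings, treating runs of digits as numbers and
   everything else as ordinary characters. */
static int mixed_numcompare(const char *a, const char *b)
{
  const char *pa = a, *pb = b;
  while (*pa && *pb) {
    if (isdigit((unsigned char)*pa) && isdigit((unsigned char)*pb)) {
      /* both sides start a digit run: compare the parsed values */
      char *ea, *eb;
      long ai = strtol(pa, &ea, 10);
      long bi = strtol(pb, &eb, 10);
      if (ai != bi) return ai < bi ? -1 : 1;
      pa = ea; pb = eb;
    } else {
      if (*pa != *pb) break;
      ++pa; ++pb;
    }
  }
  /* both strings ended: they match apart from leading zeros,
     so the shorter one sorts first (identical strings give 0) */
  if (*pa == *pb)
    return (pa - a) < (pb - b) ? -1 : (pa - a) > (pb - b) ? 1 : 0;
  /* otherwise order by the first differing character */
  return *pa < *pb ? -1 : 1;
}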

It compares runs of digits numerically and all other characters as plain characters. With this comparison, chromosome names sort in the desired order. I added a new command-line option, -N (or -k1,1N for a single key), to turn on this mixed string-digit comparison.
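To see the ordering outside of sort itself, here is a minimal sketch that plugs the same comparison into the standard qsort. The names cmp_names and sort_chr_names are hypothetical helpers made up for this illustration; they are not part of the modified sort.

```c
#include <ctype.h>   /* isdigit */
#include <stdlib.h>  /* strtol, qsort */

/* Same logic as the comparison function above, repeated here so that
   this sketch compiles on its own. */
static int mixed_numcompare(const char *a, const char *b)
{
  const char *pa = a, *pb = b;
  while (*pa && *pb) {
    if (isdigit((unsigned char)*pa) && isdigit((unsigned char)*pb)) {
      char *ea, *eb;
      long ai = strtol(pa, &ea, 10);
      long bi = strtol(pb, &eb, 10);
      if (ai != bi) return ai < bi ? -1 : 1;
      pa = ea; pb = eb;
    } else {
      if (*pa != *pb) break;
      ++pa; ++pb;
    }
  }
  if (*pa == *pb)
    return (pa - a) < (pb - b) ? -1 : (pa - a) > (pb - b) ? 1 : 0;
  return *pa < *pb ? -1 : 1;
}

/* qsort passes pointers to the array elements, and each element is
   itself a pointer to a string (hypothetical adapter). */
static int cmp_names(const void *x, const void *y)
{
  return mixed_numcompare(*(const char *const *)x, *(const char *const *)y);
}

/* Sort an array of n names in place (hypothetical helper). */
void sort_chr_names(const char **names, size_t n)
{
  qsort(names, n, sizeof *names, cmp_names);
}
```

Calling sort_chr_names on chr10, chrX, chr2, chr1, chrY, chr20, chr11 should give chr1, chr2, chr10, chr11, chr20, chrX, chrY.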

In addition, I replaced the top-down recursive mergesort with a bottom-up iterative one and used a heap to accelerate merging. The improved sort is a little faster than the original version.
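For readers unfamiliar with the bottom-up variant, the general idea can be sketched as follows. This is only an illustration of the technique on an in-memory array of strings, not the code in the modified sort; the names cmp_fn, merge_runs and bottom_up_mergesort are made up for this example, and the heap-based merging is not shown.

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* memcpy */

typedef int (*cmp_fn)(const char *, const char *);

/* Merge the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const char **src, const char **dst,
                       size_t lo, size_t mid, size_t hi, cmp_fn cmp)
{
  size_t i = lo, j = mid, k = lo;
  while (i < mid && j < hi)
    /* <= takes the left element on ties, which keeps the sort stable */
    dst[k++] = cmp(src[i], src[j]) <= 0 ? src[i++] : src[j++];
  while (i < mid) dst[k++] = src[i++];
  while (j < hi) dst[k++] = src[j++];
}

/* Sort a[0..n) in place without recursion.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int bottom_up_mergesort(const char **a, size_t n, cmp_fn cmp)
{
  const char **buf, **src = a, **dst, **tmp;
  size_t width, lo;
  if (n < 2) return 0;
  buf = malloc(n * sizeof *buf);
  if (buf == NULL) return -1;
  dst = buf;
  /* Runs of length 1 are trivially sorted; double the run length each pass. */
  for (width = 1; width < n; width *= 2) {
    for (lo = 0; lo < n; lo += 2 * width) {
      size_t mid = lo + width < n ? lo + width : n;
      size_t hi = lo + 2 * width < n ? lo + 2 * width : n;
      merge_runs(src, dst, lo, mid, hi, cmp);
    }
    tmp = src; src = dst; dst = tmp;  /* output of this pass feeds the next */
  }
  if (src != a) memcpy(a, src, n * sizeof *a);  /* result ended up in buf */
  free(buf);
  return 0;
}
```

Any comparison with this signature can be passed in, including strcmp or the mixed comparison shown earlier. The loop over width replaces the recursion: each pass merges neighbouring runs of the current length, so no call stack is needed.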

The improved sort, distributed under the GPL, can be downloaded here.


Converting Source Codes to HTML

If you use Emacs, you can convert everything you see on the screen to an HTML file with almost exactly the same look and style. The package you need is htmlize.el. Some Emacs distributions, such as Carbon Emacs, include this package by default. This page shows an example. Really fantastic! It goes far beyond any similar method I am aware of.
