Posts Tagged ‘multialignment’

Popular multialignment programs include clustalw, T-coffee, dialign [PMID: 18505568, 10222408], muscle [PMID: 15318951], MAFFT [PMID: 18372315], probcons [PMID: 15687296] and probalign [PMID: 16954142]. Which is the best in practice? The papers listed above contain various benchmarks. What follows is only my summary of them, so I recommend reading the papers yourself in case my summary is biased. Also, both accuracy and speed can vary a lot between data sets, and I can only report results for the data used in those papers.

First, these programs can be divided into two groups: those performing global multiple alignment and those performing local multiple alignment. In global alignment, every residue of every sequence is required to be aligned. This category includes clustalw, T-coffee, muscle, probcons and probalign. In local alignment, some residues are allowed to remain unaligned, and only segments that are closely related are aligned together. This category includes the dialign family and MAFFT, both of which can also perform global alignment.

Most benchmarks are designed for global aligners. The consensus seems to be:

  • accuracy: probalign>probcons>MAFFT>muscle>T-coffee>>clustalw~dialign
  • speed: clustalw~muscle>MAFFT>dialign>>probcons>probalign~T-coffee
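To make "accuracy" concrete: benchmarks of this kind usually compare a program's output against a trusted reference alignment. Below is a minimal sketch of one common measure, a sum-of-pairs score, assuming alignments are given as lists of equal-length strings with `-` as the gap character. The function names are my own and the papers above may define their scores differently.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    alignment: list of equal-length gapped strings ('-' is a gap).
    Each residue is identified as (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        # every two residues in the same column count as one aligned pair
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that the test alignment recovers."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)


if __name__ == "__main__":
    reference = ["AC-GT", "ACAGT"]
    test = ["ACG-T", "ACAGT"]
    print(sum_of_pairs_score(test, reference))
```

A score of 1.0 means every pair aligned in the reference is also aligned in the test output. The toy example misplaces one residue and so recovers 3 of 4 pairs.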

Only MAFFT and dialign perform local alignment. The dialign paper claims that dialign is much more accurate. However, I suspect this depends largely on how high-scoring segments are defined: we can always obtain a tighter alignment by excluding more residues.
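The last point can be illustrated with a toy calculation. This is only a sketch on made-up sequences, not the scoring used in the dialign paper: it computes percent identity over a chosen range of columns of a pairwise alignment, together with the fraction of aligned columns that range retains.

```python
def identity_and_coverage(a, b, start=0, end=None):
    """Score columns [start, end) of a gapped pairwise alignment.

    Returns (identity, coverage): identity is the fraction of matching
    columns among the retained ungapped columns; coverage is the fraction
    of all ungapped columns that were retained.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    end = len(a) if end is None else end
    all_cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    kept = [(x, y) for x, y in zip(a[start:end], b[start:end])
            if x != "-" and y != "-"]
    matches = sum(x == y for x, y in kept)
    identity = matches / len(kept) if kept else 0.0
    coverage = len(kept) / len(all_cols) if all_cols else 0.0
    return identity, coverage


if __name__ == "__main__":
    a = "ACGTTGCA"
    b = "TCGTTGAT"
    print(identity_and_coverage(a, b))        # all columns
    print(identity_and_coverage(a, b, 1, 6))  # conserved core only
```

Over all eight columns the identity is 0.625. Restricting to the five-column core raises it to 1.0 while coverage drops to 0.625, so a fair comparison of local aligners has to report both numbers.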

So, which is the best? Muscle or MAFFT. Although probcons and probalign are more accurate, they are impractically slow compared with muscle and MAFFT. As for the winner between muscle and MAFFT, I cannot decide yet: I would also need to evaluate memory consumption, usability and stability, and I have only used muscle. It is really nice software!
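For the running-time and memory part of that evaluation, here is a minimal sketch of a measuring harness. It assumes a Unix-like system (the `resource` module is not available on Windows) and takes the command as an argument list, so I make no assumption about any aligner's command-line options. The helper name `measure` is my own.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd (an argv list) and return (returncode, seconds, peak_rss).

    peak_rss is the largest resident set size among all child processes
    waited for so far, so run one command per Python process for a clean
    number. The unit is kilobytes on Linux and bytes on macOS.
    """
    start = time.perf_counter()
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, elapsed, peak


if __name__ == "__main__":
    # Placeholder command; substitute the aligner invocation to be timed.
    print(measure([sys.executable, "-c", "pass"]))
```

Replacing the placeholder with the actual muscle or MAFFT command, run on the same input file, would give comparable numbers for the two.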
