Popular multiple alignment programs include clustalw, T-coffee, dialign [PMID: 18505568, 10222408], muscle [PMID: 15318951], MAFFT [PMID: 18372315], probcons [PMID: 15687296] and probalign [PMID: 16954142]. Which is the best in practice? The papers listed above contain various benchmarks; what follows is only my summary of them, so please read the papers themselves in case my summary is biased. Also, both accuracy and speed can vary a lot between data sets, and I can only report results for the data used in those papers.
First, these programs fall into two groups: those performing global multiple alignment and those performing local multiple alignment. In global alignment, every residue of every sequence must be aligned. This category includes clustalw, T-coffee, muscle, probcons and probalign. In local alignment, some residues may be left unaligned; only segments that are closely related are aligned together. This category includes the dialign family and MAFFT, which can also perform global alignment.
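To make the global/local distinction concrete, here is a minimal pairwise sketch (two sequences only, not a multiple alignment, and not what any of the programs above implements). The function name `align_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen only for illustration.

```python
def align_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best pairwise alignment score by dynamic programming.

    local=False: global (Needleman-Wunsch style), every residue must be placed.
    local=True:  local (Smith-Waterman style), unrelated flanks may be left out.
    Scoring values are illustrative assumptions, not taken from any aligner.
    """
    # First row: leading gaps cost nothing in local mode, j * gap in global mode.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    # Local: best cell anywhere; global: bottom-right cell.
    return best if local else prev[-1]


if __name__ == "__main__":
    a, b = "GGGGACGT", "ACGTTTTT"  # share only the segment ACGT
    print("global:", align_score(a, b))
    print("local: ", align_score(a, b, local=True))
```

With these toy sequences the local score only counts the shared segment, while the global score is pulled down because the unrelated flanks have to be aligned or gapped as well.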
Most benchmarks are designed for global aligners. The consensus seems to be:
- accuracy: probalign>probcons>MAFFT>muscle>T-coffee>>clustalw~dialign
- speed: clustalw~muscle>MAFFT>dialign>>probcons>probalign~T-coffee
Only MAFFT and dialign perform local alignment. The dialign paper claims that dialign is much more accurate. However, I suspect this depends largely on how high-scoring segments are defined: we can always get a tighter alignment by excluding more residues.
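The point about excluding residues can be illustrated with a toy calculation. This is only a sketch: the measure (fraction of identical residue pairs per column), the threshold parameter `min_col_identity` and the three made-up sequences are all assumptions, and this is not the scoring used in the dialign or MAFFT papers.

```python
from itertools import combinations


def column_identity(column):
    """Fraction of residue pairs in one column that are identical (gaps never match)."""
    pairs = list(combinations(column, 2))
    same = sum(1 for x, y in pairs if x == y and x != "-")
    return same / len(pairs)


def mean_identity(rows, min_col_identity=0.0):
    """Return (mean identity over kept columns, number of kept columns).

    A column is kept only if its identity is >= min_col_identity, which mimics
    an aligner that leaves poorly conserved residues unaligned.
    """
    scores = [column_identity(col) for col in zip(*rows)]
    kept = [s for s in scores if s >= min_col_identity]
    mean = sum(kept) / len(kept) if kept else 0.0
    return mean, len(kept)


if __name__ == "__main__":
    rows = ["ACGTAC", "ACGTTG", "ACGAGT"]  # made-up toy alignment
    for threshold in (0.0, 0.3, 1.0):
        mean, n = mean_identity(rows, threshold)
        print(f"threshold={threshold}: {n} columns kept, mean identity {mean:.2f}")
```

The stricter the threshold, the fewer columns are scored and the better the remaining alignment looks, so an accuracy figure for a local aligner means little without knowing how much of the sequences it covers.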
So, what is the best? Muscle or MAFFT. Although probcons and probalign are more accurate, they are impractically slow compared with muscle and MAFFT. As for the winner between muscle and MAFFT, I cannot decide; I would also need to evaluate memory consumption, usability and stability. I have only used muscle. It is really nice software!
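The trade-off behind this choice can be written down as a small sketch. The rank numbers below are just my transcription of the accuracy and speed orderings given earlier (1 = best, "~" treated as a tie); they discard the size of the gaps (">>"), so this is an illustration of the reasoning, not a benchmark.

```python
# Ranks transcribed from the two orderings in this post (1 = best, ties share a rank).
ACCURACY_RANK = {"probalign": 1, "probcons": 2, "MAFFT": 3, "muscle": 4,
                 "T-coffee": 5, "clustalw": 6, "dialign": 6}
SPEED_RANK = {"clustalw": 1, "muscle": 1, "MAFFT": 3, "dialign": 4,
              "probcons": 5, "probalign": 6, "T-coffee": 6}


def dominates(q, p):
    """True if q is at least as good as p on both criteria and strictly better on one."""
    no_worse = ACCURACY_RANK[q] <= ACCURACY_RANK[p] and SPEED_RANK[q] <= SPEED_RANK[p]
    better = ACCURACY_RANK[q] < ACCURACY_RANK[p] or SPEED_RANK[q] < SPEED_RANK[p]
    return no_worse and better


def pareto_front():
    """Programs that no other program beats on both accuracy and speed."""
    names = list(ACCURACY_RANK)
    front = [p for p in names if not any(dominates(q, p) for q in names)]
    return sorted(front, key=ACCURACY_RANK.get)


if __name__ == "__main__":
    print(pareto_front())
```

On these ranks clustalw, dialign and T-coffee are each beaten on both criteria by some other program. Probcons and probalign survive on rank alone, which is exactly where the ">>" in the speed ordering matters: once their running time is judged impractical, only muscle and MAFFT remain.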
Look here:
http://bib.oxfordjournals.org/cgi/content/short/10/1/11
Choosing the best multiple alignment program is not a simple task. Everything depends on the sequence information; it is all about integrating knowledge to choose the best aligner in an “a priori” manner.
Cheers
Rad
That paper does a great job in theory (so it is worth the publication), but I think it is overcomplicated for practical applications. One piece of evidence is that it has not been cited since its publication (Oct 2009). For large genome databases, it is hard to collect secondary structures or other information. In addition, these databases try to avoid problematic distant species, so whether they use the best program is actually not that important; speed, memory and user-friendliness matter more. Individual researchers, on the other hand, only look at a few cases, so manual realignment is affordable and much easier for them than setting up this whole complicated software.