Definition of Pairwise Sequence Search (PSS)
Roughly speaking, given two sets of sequences A and B, the pairwise sequence search (PSS) problem is to find the high-scoring matches between each pair of sequences, one from each set. PSS can, but not necessaily, be done by performing pairwise alignment between each pair of sequences and then retaining high-scoring matches only. Better algorithms usually index one set of sequences to achieve better time complexity.
Online vs. Offline Search
Sequence search can be online or offline. In online sequence search, all the sequences in alignment are not preprocessed; in offline search, one set of sequences can be indexed first and stored in a on-disk file, called database, and the other set of sequences, called queries, are aligned against the index. For online search, the time complexity is at least linear to the sum of lengths of the larger set. As the indexing step is separated out from matching, the time complexity of offline search can be better especially when lengths of the two sets are highly unbalanced.
Sometimes there is no clear edge between online and offline search. Many online search algorithms are also involved with an indexing step where one or both sets of sequences are indexed. Separating indexing and searching will make an online algorithm become an offline one. In practice, we usually prefer separating the indexing step if one set of sequences is static, and if one set of sequences is much larger than the other or indexing is very expensive.
Algorithms for Indexing
Most of fast searching algorithms involve indexing and algorithms used for indexing largely determine algorithms used for searching.
There are three major categories of algorithms for indexing: sorting, filtering and suffix tree based indexing. There are not clear edges between them. For example, sorting usually comes with filtering, suffix tree can be used to compute filters, and sorting can be used to construct suffix array.
(I will explain the three categories in future posts.)
Leave a Reply