Hirschberg's algorithm is a generally applicable algorithm for optimal sequence alignment. BLAST and FASTA are suboptimal heuristics. If x and y are strings, where length = n and length = m, the Needleman-Wunsch algorithm finds an optimal alignment in O time, using O space. Hirschberg's algorithm is a clever modification of the Needleman-Wunsch Algorithm which still takes O time, but needs only O space and is much faster in practice. One application of the algorithm is finding sequence alignments of DNA or proteinsequences. It is also a space-efficient way to calculate the longest common subsequence between two sets of data such as with the common diff tool. The Hirschberg algorithm can be derived from the Needleman-Wunsch algorithm by observing that:
one can compute the optimal alignment score by only storing the current and previous row of the Needleman-Wunsch score matrix;
if is the optimal alignment of, and is an arbitrary partition of, there exists a partition of such that.
denotes the i-th character of, where. denotes a substring of size, ranging from i-th to the j-th character of. is the reversed version of. and are sequences to be aligned. Let be a character from, and be a character from. We assume that, and are well defined integer-valued functions. These functions represent the cost of deleting, inserting, and replacing with, respectively. We define, which returns the last line of the Needleman-Wunsch score matrix : function NWScore Score = 0 // 2 * array for j=1 to length Score = Score + Ins for i=1 to length // Init array Score = Score + Del for j=1 to length scoreSub = Score + Sub scoreDel = Score + Del scoreIns = Score + Ins Score = max end // Copy Score to Score Score = Score end for j=0 to length LastLine = Score return LastLine Note that at any point, only requires the two most recent rows of the score matrix. Thus, is implemented in space The Hirschberg algorithm follows: function Hirschberg Z = "" W = "" if length 0 for i=1 to length Z = Z + '-' W = W + Yi end else if length 0 for i=1 to length Z = Z + Xi W = W + '-' end else if length 1 or length 1 = NeedlemanWunsch else xlen = length xmid = length / 2 ylen = length
= Hirschberg + Hirschberg end return In the context of Observation, assume that is a partition of. Index is computed such that and.
Example
Let The optimal alignment is given by W = AGTACGCA Z = --TATGC- Indeed, this can be verified by backtracking its corresponding Needleman-Wunsch matrix: T A T G C 0 -2 -4 -6 -8 -10 A-2 -1 0 -2 -4 -6 G-4 -3 -2 -1 0 -2 T -6 -2 -4 0 -2 -1 A -8 -4 0 -2 -1 -3 C -10 -6 -2 -1 -3 1 G -12 -8 -4 -3 1 -1 C -14 -10 -6 -5 -1 3 A -16 -12 -8 -7 -3 1 One starts with the top level call to, which splits the first argument in half:. The call to produces the following matrix: T A T G C 0 -2 -4 -6 -8 -10 A -2 -1 0 -2 -4 -6 G -4 -3 -2 -1 0 -2 T -6 -2 -4 0 -2 -1 A -8 -4 0 -2 -1 -3 Likewise, generates the following matrix: C G T A T 0 -2 -4 -6 -8 -10 A -2 -1 -3 -5 -4 -6 C -4 0 -2 -4 -6 -5 G -6 -2 2 0 -2 -4 C -8 -4 0 1 -1 -3 Their last lines and sum of those are respectively ScoreL = rev = Sum = The maximum appears at ymid = 2, producing the partition. The entire Hirschberg recursion produces the following tree:
/ \
/ \ / \
/ \ / \
The leaves of the tree contain the optimal alignment.