Shortest common supersequence problem


In computer science, the shortest common supersequence of two sequences X and Y is the shortest sequence which has both X and Y as subsequences. This problem is closely related to the longest common subsequence problem. Given two sequences X = < x1,...,xm > and Y = < y1,...,yn >, a sequence U = < u1,...,uk > is a common supersequence of X and Y if items can be removed from U to produce X, and items can likewise be removed from U to produce Y.
A shortest common supersequence (SCS) is a common supersequence of minimal length. In the shortest common supersequence problem, two sequences X and Y are given, and the task is to find a shortest possible common supersequence of these sequences. In general, an SCS is not unique.
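The definition can be checked mechanically. The following is a minimal sketch in Python; the function names `is_subsequence` and `is_common_supersequence` are illustrative and do not come from the article.

```python
def is_subsequence(x, u):
    """Return True if x can be obtained from u by deleting zero or more items."""
    it = iter(u)
    # `item in it` consumes the iterator up to the first match,
    # so the items of x must be found in u in the same order.
    return all(item in it for item in x)


def is_common_supersequence(u, x, y):
    """Return True if both x and y are subsequences of u."""
    return is_subsequence(x, u) and is_subsequence(y, u)
```

For instance, with X = ab and Y = ba, both aba and bab pass this check and no sequence of length 2 does, which illustrates that an SCS need not be unique.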
For two input sequences, an SCS can be formed easily from a longest common subsequence (LCS). For example, a longest common subsequence of X = abcbdab and Y = bdcaba is Z = bcba. By inserting the non-LCS symbols of X and Y into Z while preserving their original order, we obtain a shortest common supersequence U = abdcabdab. In particular, the equation |U| = m + n - |Z|, where m and n are the lengths of X and Y, holds for any two input sequences.
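The construction from an LCS can be sketched with the standard LCS dynamic-programming table. This is one possible implementation, assuming string inputs; the names `lcs_table` and `shortest_common_supersequence` are illustrative.

```python
def lcs_table(x, y):
    """dp[i][j] = length of a longest common subsequence of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def shortest_common_supersequence(x, y):
    """Return one SCS of the strings x and y (it is not unique in general)."""
    dp = lcs_table(x, y)
    i, j = len(x), len(y)
    out = []  # built back to front, reversed at the end
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            # Symbol belongs to the LCS: emit it once for both inputs.
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            # Non-LCS symbol of x.
            out.append(x[i - 1])
            i -= 1
        else:
            # Non-LCS symbol of y.
            out.append(y[j - 1])
            j -= 1
    # One input is exhausted; copy what is left of the other.
    out.extend(reversed(x[:i]))
    out.extend(reversed(y[:j]))
    return "".join(reversed(out))
```

Each LCS symbol is emitted once and every other symbol of either input exactly once, so the result has length len(x) + len(y) minus the LCS length.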
There is no similar relationship between shortest common supersequences and longest common subsequences of three or more input sequences. However, both problems can be solved in O(n^k) time using dynamic programming, where k is the number of sequences and n is their maximum length. For the general case of an arbitrary number of input sequences, the problem is NP-hard.
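A dynamic program over several sequences can be sketched as a memoized recursion whose state is one position per sequence, which gives at most (n + 1)^k states. The sketch below computes only the SCS length and assumes a small number of short strings; `scs_length` is an illustrative name, and the per-state work depends on the number of distinct next symbols.

```python
from functools import lru_cache


def scs_length(seqs):
    """Length of a shortest common supersequence of all strings in seqs."""
    seqs = tuple(seqs)

    @lru_cache(maxsize=None)
    def solve(pos):
        # pos[i] is the number of symbols of seqs[i] already covered.
        heads = {s[p] for s, p in zip(seqs, pos) if p < len(s)}
        if not heads:
            return 0  # every sequence is fully covered
        best = None
        for c in heads:
            # Emit c and advance every sequence whose next symbol is c.
            nxt = tuple(
                p + 1 if p < len(s) and s[p] == c else p
                for s, p in zip(seqs, pos)
            )
            cand = 1 + solve(nxt)
            if best is None or cand < best:
                best = cand
        return best

    return solve((0,) * len(seqs))
```

The recursion depth grows with the total input length, so this form is only suitable for small instances.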

Shortest common superstring

The closely related problem of finding a minimum-length string which is a superstring of a finite set of strings S = { s1,s2,...,sn } is also NP-hard. Also, good approximations have been found for the average case but not for the worst case. However, it can be formulated as an instance of weighted set cover in such a way that the weight of the optimal solution to the set cover instance is less than twice the length of the shortest superstring. One can then use the O(log n)-approximation for weighted set cover to obtain an O(log n)-approximation for the shortest superstring.
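For any string x in this alphabet, define P(x) to be the set of all strings which are substrings of x. The instance of set cover is formulated as follows:
Let M be the set of strings obtained from pairs of input strings: for each pair of strings si and sj in S, if the last k symbols of si are the same as the first k symbols of sj for some k > 0, add to M the concatenation of si and sj with maximal overlap.
Define the universe of the set cover instance to be S.
Define the set of subsets of the universe to be { P(x) | x ∈ S ∪ M }.
Define the cost of each subset P(x) to be |x|, the length of x.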
The instance can then be solved using an algorithm for weighted set cover, and the algorithm can output an arbitrary concatenation of the strings x for which the weighted set cover algorithm outputs P(x).
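The reduction can be sketched as follows, using the standard greedy heuristic for weighted set cover (repeatedly pick the subset with the lowest cost per newly covered element). All function names are illustrative. As an assumption of this sketch, each P(x) is restricted to the input strings it contains, so that every subset is a subset of the universe, and candidate strings are the inputs plus the maximal-overlap merge of each ordered pair of distinct inputs.

```python
def merge_with_max_overlap(a, b):
    """Concatenate a and b, merging the longest suffix of a that is a prefix of b.

    Returns None when a and b do not overlap at all.
    """
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return None


def build_instance(strings):
    """Return (universe, subsets) where subsets maps x to the inputs inside x."""
    universe = list(dict.fromkeys(strings))  # drop duplicates, keep order
    merged = []
    for a in universe:
        for b in universe:
            if a != b:
                m = merge_with_max_overlap(a, b)
                if m is not None and m not in merged and m not in universe:
                    merged.append(m)
    # The cost of the subset for x is len(x); it is read off the key later.
    subsets = {x: frozenset(s for s in universe if s in x)
               for x in universe + merged}
    return universe, subsets


def greedy_superstring(strings):
    """Approximate shortest superstring via greedy weighted set cover."""
    universe, subsets = build_instance(strings)
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate with the smallest cost per newly covered string.
        x = min(
            (x for x in subsets if subsets[x] & uncovered),
            key=lambda x: len(x) / len(subsets[x] & uncovered),
        )
        chosen.append(x)
        uncovered -= subsets[x]
    # Any concatenation order of the chosen strings is a valid superstring.
    return "".join(chosen)
```

The output is a superstring of every input but is not guaranteed to be the shortest one; the greedy step only provides the logarithmic approximation guarantee of weighted set cover.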

Example

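Consider the set S = { abc, cde, fab }, which becomes the universe of the weighted set cover instance. In this case, the set of pairwise merges with maximal overlap is M = { abcde, fabc }. Then the set of subsets of the universe is
{ P(x) | x ∈ S ∪ M } = { P(abc), P(cde), P(fab), P(abcde), P(fabc) },
which have costs 3, 3, 3, 5, and 4, respectively.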
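The costs listed above can be reproduced with a short script. The concrete strings used here (abc, cde, fab and the merges abcde, fabc) are an assumed instance chosen to be consistent with the costs 3, 3, 3, 5, and 4; the helper `substrings` is an illustrative name for P(x).

```python
def substrings(x):
    """All non-empty contiguous substrings of x, i.e. P(x)."""
    return {x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1)}


S = ["abc", "cde", "fab"]   # assumed example strings (the universe)
M = ["abcde", "fabc"]       # assumed maximal-overlap merges of pairs from S

# The cost of P(x) is the length of x.
costs = [len(x) for x in S + M]

# Which elements of the universe each P(x) actually covers.
covers = {x: sorted(s for s in S if s in substrings(x)) for x in S + M}
```

Under these assumptions, P(fabc) covers two strings at cost 4 and P(cde) covers the remaining one at cost 3, so a cover of total cost 7 exists for this instance.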