Parikh's theorem


Parikh's theorem in theoretical computer science says that if one looks only at the number of occurrences of each terminal symbol in a context-free language, without regard to their order, then the language is indistinguishable from a regular language. It is useful for deciding that strings with a given number of terminals are not accepted by a context-free grammar. It was first proved by Rohit Parikh in 1961 and republished in 1966.

Definitions and formal statement

Let be an alphabet. The Parikh vector of a word is defined as the function, given by
where denotes the number of occurrences of the letter in the word.
A subset of is said to be linear if it is of the form
for some vectors.
A subset of is said to be semi-linear if it is a union of finitely many linear subsets.
Statement 1: Let be a context-free language.
Let be the set of Parikh vectors of words in, that is,. Then is a semi-linear set.
Two languages are said to be commutatively equivalent if they have the same set of Parikh vectors.
Statement 2: If is any semi-linear set, the language of words whose Parikh vectors are in is commutatively equivalent to some regular language. Thus, every context-free language is commutatively equivalent to some regular language.
These two equivalent statements can be summed up by saying that the image under of context-free languages and of regular languages is the same, and it is equal to the set of semilinear sets.

Strengthening for bounded languages

A language is bounded if for some fixed words.
Ginsburg and Spanier
gave a necessary and sufficient condition, similar to Parikh's theorem, for bounded languages.
Call a linear set stratified, if in its definition for each the vector has the property that it has at most two non-zero coordinates, and for each if each of the vectors has two non-zero coordinates, and, respectively, then their order is not.
A semi-linear set is stratified if it is a union of finitely many stratified linear subsets.
The Ginsburg-Spanier theorem says that a bounded language is context-free if and only if is a stratified semi-linear set.

Significance

The theorem has multiple interpretations. It shows that a context-free language over a singleton alphabet must be a regular language and that some context-free languages can only have ambiguous grammars. Such languages are called inherently ambiguous languages. From a formal grammar perspective, this means that some ambiguous context-free grammars cannot be converted to equivalent unambiguous context-free grammars.