For instance, the lexical syntax of many programming languages requires that tokens be built from the maximum possible number of characters from the input stream. This is done to resolve the inherent ambiguity in commonly used regular expressions such as [a-z]+ (one or more lowercase letters), which on the input "abc" could otherwise match "a", "ab", or "abc". The term is also used in compilers in the instruction selection stage to describe a method of "tiling": determining how a structured tree representing a program in an intermediate language should be converted into linear machine code. An entire subtree might be converted into just one machine instruction, and the problem is how to split the tree into non-overlapping "tiles," each representing one machine instruction. An effective strategy is simply to make a tile of the largest subtree possible at any given point, which is called "maximal munch."
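The tiling strategy can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the Node class, the operator names (MEM, ADD, CONST, TEMP), and the tile table describe a hypothetical instruction set, not any real compiler back end. At each node the sketch picks the tile that covers the most nodes, then tiles the uncovered subtrees recursively.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str               # e.g. "ADD", "MEM", "CONST", "TEMP"
    kids: tuple = ()
    value: object = None  # payload for CONST / TEMP leaves

# A pattern is either the wildcard "_" (any subtree, tiled separately)
# or a tuple (op, child_pattern, ...).
WILDCARD = "_"

# Hypothetical tiles: (label, pattern). "REG" stands for a plain register
# reference, which a real back end would not emit as an instruction.
TILES = [
    ("LOAD_OFFSET", ("MEM", ("ADD", WILDCARD, ("CONST",)))),
    ("ADDI", ("ADD", WILDCARD, ("CONST",))),
    ("LOAD", ("MEM", WILDCARD)),
    ("ADD", ("ADD", WILDCARD, WILDCARD)),
    ("LI", ("CONST",)),
    ("REG", ("TEMP",)),
]

def pattern_size(pat):
    """Number of tree nodes a pattern covers (wildcards cover none)."""
    if pat == WILDCARD:
        return 0
    return 1 + sum(pattern_size(k) for k in pat[1:])

def match(pat, node):
    """Return the subtrees bound to wildcards, or None on mismatch."""
    if pat == WILDCARD:
        return [node]
    if pat[0] != node.op or len(pat) - 1 != len(node.kids):
        return None
    leftovers = []
    for kid_pat, kid in zip(pat[1:], node.kids):
        sub = match(kid_pat, kid)
        if sub is None:
            return None
        leftovers.extend(sub)
    return leftovers

def munch(node):
    """Tile top-down with the largest matching tile; emit labels bottom-up."""
    best = None
    for label, pat in TILES:
        leftovers = match(pat, node)
        if leftovers is not None and (
            best is None or pattern_size(pat) > pattern_size(best[1])
        ):
            best = (label, pat, leftovers)
    if best is None:
        raise ValueError(f"no tile covers {node.op}")
    code = []
    for sub in best[2]:
        code.extend(munch(sub))
    code.append(best[0])
    return code

# MEM(ADD(TEMP t1, CONST 8)) is covered by one three-node tile.
tree = Node("MEM", (Node("ADD", (Node("TEMP", value="t1"), Node("CONST", value=8))),))
print(munch(tree))  # ['REG', 'LOAD_OFFSET']
```

Choosing the largest tile at each node is a greedy choice: it is simple and fast, but this sketch makes no attempt to show that the resulting tiling is the cheapest one overall.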
Drawbacks
In some situations, "maximal munch" leads to undesirable or unintuitive outcomes. For instance, in the C programming language, the statement x=y/*z; will probably lead to a syntax error, since the /* character sequence initiates a comment that is either unterminated or terminated by the end token */ of some later, unrelated comment. What was actually meant was to assign to the variable x the result of dividing the value in y by the value obtained by dereferencing the pointer z; this would be perfectly valid code. The intended meaning can be expressed by adding whitespace, as in x=y/ *z;, or by using parentheses, as in x=y/(*z);.
Another example, in C++, arises from the use of the "angle bracket" characters < and > in the syntax for template specialization: two consecutive > characters are interpreted as the right-shift operator >>. Prior to C++11, the first declaration below would produce a parse error, because the right-shift operator token is encountered instead of two right-angle-bracket tokens:
    std::vector<std::vector<int>> my_mat_11; // Incorrect in C++03, correct in C++11.
    std::vector<std::vector<int> > my_mat_03; // Correct in either C++03 or C++11.
The C++11 standard, adopted in August 2011, amended the grammar so that a right-shift token is accepted as synonymous with a pair of right-angle brackets, which complicates the grammar but allows the continued use of the maximal munch principle.
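Both pitfalls can be reproduced with a small longest-match lexer. The sketch below is an illustration under stated assumptions, not how any real C or C++ compiler is written: the operator list is a deliberately tiny, hypothetical subset, the comment opener /* is reported as a token rather than consumed as a comment, and split_nested_closers assumes every < opens a template argument list, which a real parser cannot assume.

```python
# Hypothetical, deliberately tiny operator set; real C/C++ has many more.
OPERATORS = ["/*", ">>", "/", "*", "=", "<", ">", ";"]

def lex(src):
    """Split src into tokens, always taking the longest match at each position."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
            continue
        if ch.isalpha() or ch == "_":
            # Identifier: consume as many identifier characters as possible.
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
            continue
        # Operator: among all operators that match here, take the longest.
        candidates = [op for op in OPERATORS if src.startswith(op, i)]
        if not candidates:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
        op = max(candidates, key=len)
        tokens.append(op)
        i += len(op)
    return tokens

def split_nested_closers(tokens):
    """Reinterpret '>>' as two '>' while at least two '<' are open.

    Simplifying assumption: every '<' opens a template argument list.
    """
    out, depth = [], 0
    for tok in tokens:
        if tok == "<":
            depth += 1
            out.append(tok)
        elif tok == ">" and depth > 0:
            depth -= 1
            out.append(tok)
        elif tok == ">>" and depth >= 2:
            depth -= 2
            out.extend([">", ">"])
        else:
            out.append(tok)
    return out

print(lex("x=y/*z;"))   # ['x', '=', 'y', '/*', 'z', ';']
print(lex("x=y/ *z;"))  # ['x', '=', 'y', '/', '*', 'z', ';']
print(lex("vector<vector<int>> m;"))
# ['vector', '<', 'vector', '<', 'int', '>>', 'm', ';']
print(split_nested_closers(lex("vector<vector<int>> m;")))
# ['vector', '<', 'vector', '<', 'int', '>', '>', 'm', ';']
```

The lexer itself stays strictly greedy; the reinterpretation of >> happens afterwards, in a step that knows about nesting. This mirrors the idea of keeping maximal munch in the lexer and moving the exception into the grammar.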
Alternatives
Programming language researchers have also responded by replacing or supplementing the principle of maximal munch with other lexical disambiguation tactics. One approach is to use "follow restrictions," which, instead of directly taking the longest match, restrict which characters may follow a valid match. For example, stipulating that strings matching [a-z]+ cannot be followed by an alphabetic character achieves the same effect as maximal munch with that regular expression. Another approach is to keep the principle of maximal munch but make it subordinate to some other principle, such as context.
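A follow restriction can be sketched as a filter over all candidate matches rather than as a preference for the longest one. The function names below are hypothetical, and the sketch covers only the single pattern [a-z]+ on lowercase input; it is meant to show that, for such input, the only candidate surviving the restriction is the one maximal munch would have chosen.

```python
import re

LOWER_RUN = re.compile(r"[a-z]+")

def follow_restricted_ends(text, start=0):
    """End positions of every [a-z]+ match from `start` that is not followed by a letter."""
    ends = []
    end = start
    while end < len(text) and "a" <= text[end] <= "z":
        end += 1
        # Candidate match is text[start:end]; apply the follow restriction.
        if end == len(text) or not text[end].isalpha():
            ends.append(end)
    return ends

def longest_match_end(text, start=0):
    """End position chosen by maximal munch, or None if nothing matches."""
    m = LOWER_RUN.match(text, start)
    return m.end() if m else None

print(follow_restricted_ends("abc+d"))  # [3]
print(longest_match_end("abc+d"))       # 3
```

The candidates "a" and "ab" are rejected because each is followed by a letter, leaving only "abc". The restriction is declarative: it rules matches out instead of ranking them, which is why it can be combined with other disambiguation rules.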