Van Wijngaarden grammar
In computer science, a Van Wijngaarden grammar is a two-level grammar which provides a technique to define potentially infinite context-free grammars in a finite number of rules. The formalism was invented by Adriaan van Wijngaarden to define rigorously some syntactic restrictions which previously had to be formulated in natural language, despite their essentially syntactical content.
Typical applications are the treatment of gender and number in natural language syntax and the well-definedness of identifiers in programming languages. For example, "there is one person" and "there are two people" are both grammatically correct but "there are one person" is incorrect for context-sensitive reasons that a W-grammar could represent.
The technique was used and developed in the definition of the programming language ALGOL 68. It is an example of the larger class of affix grammars.
Overview
A W-grammar consists of a finite set of meta-rules, which are used to derive production rules from a finite set of -rules. Meta-rules are restricted to those defined by a context-free grammar. Hyper-rules restrict the admissible contexts at the upper level. Essentially, the consistent substitution used in the derivation process is equivalent to unification, as in Prolog, as was noted by Alain Colmerauer.For example, the assignment
x:=1
is only valid if the variable x can contain an integer. Therefore, the context-free syntax variable := value
is incomplete. In a two-level grammar, this might be specified in a context-sensitive manner as REF TYPE variable := TYPE value
. Then ref integer variable := integer value
could be a production rule but ref Boolean variable := integer value
is not a possible production rule. This also means that assigning with incompatible types becomes a syntax error which can be caught at compile-time. Similarly,
STYLE begin token, new LAYER1 preludes,
parallel token, new LAYER1 tasks PACK,
STYLE end token
allows
begin... end
and
but not begin...ALGOL 68 examples
Prior to ALGOL 68 the language ALGOL 60 was formalised using the context-free Backus–Naur form. The appearance of new context-sensitive two-level grammar presented a challenge to some readers of the 1968 ALGOL 68 "Final Report". Subsequently, the final report was revised by Wijngaarden and his colleagues and published as the 1973 ALGOL 68 "Revised Report".
The grammar for ALGOL 68 is officially defined in the two-level Van Wijngaarden grammar, but a subset has been done in the one-level Backus–Naur form, compare:
- Van Wijngaarden grammar;
- Backus–Naur form/Yacc
ALGOL 68 as in the 1968 Final Report §2.1
a) program : open symbol, standard prelude,
library prelude option, particular program, exit,
library postlude option, standard postlude, close symbol.
b) standard prelude : declaration prelude sequence.
c) library prelude : declaration prelude sequence.
d) particular program :
label sequence option, strong CLOSED void clause.
e) exit : go on symbol, letter e letter x letter i letter t, label symbol.
f) library postlude : statement interlude.
g) standard postlude : strong void clause trainALGOL 68 as in the 1973 Revised Report §2.2.1, §10.1.1
program : strong void new closed clause
A) EXTERNAL :: standard ; library ; system ; particular.
B) STOP :: label letter s letter t letter o letter p.
a) program text : STYLE begin token, new LAYER1 preludes,
parallel token, new LAYER1 tasks PACK,
STYLE end token.
b) NEST1 preludes : NEST1 standard prelude with DECS1,
NEST1 library prelude with DECSETY2,
NEST1 system prelude with DECSETY3, where is
.
c) NEST1 EXTERNAL prelude with DECSETY1 :
strong void NEST1 series with DECSETY1, go on token ;
where is, EMPTY.
d) NEST1 tasks : NEST1 system task list, and also token,
NEST1 user task PACK list.
e) NEST1 system task : strong void NEST1 unit.
f) NEST1 user task : NEST2 particular prelude with DECS,
NEST2 particular program PACK, go on token,
NEST2 particular postlude,
where is.
g) NEST2 particular program :
NEST2 new LABSETY3 joined label definition
of LABSETY3, strong void NEST2 new LABSETY3
ENCLOSED clause.
h) NEST joined label definition of LABSETY :
where is, EMPTY ;
where is,
NEST label definition of LAB1,
NEST joined label definition of$ LABSETY1.
i) NEST2 particular postlude :
strong void NEST2 series with STOP.Implementations
yo-yo parser for van Wijngaarden grammars with example grammars for expressions, eva, sal and Pascal.History
W-grammars are based on the idea of providing the nonterminal symbols of context-free grammars with attributes that pass information between the nodes of the parse tree, used to constrain the syntax and to specify the semantics. This idea was well known at the time; e.g. Donald Knuth visited the ALGOL 68 design committee while developing his own version of it, attribute grammars. Quite peculiar to W-grammars was their strict treatment of attributes as strings, defined by a context-free grammar, on which concatenation is the only possible operation; in attribute grammars, attributes can be of any data type, and any kind of operation can be applied to them.
After their introduction in the Algol 68 report, W-grammars were widely considered as too powerful and unconstrained to be practical. This was partly a consequence of the way in which they had been applied; the revised Algol 68 report contained a much more readable grammar, without modifying the W-grammar formalism itself.
Meanwhile, it became clear that W-grammars are indeed too powerful.
They describe precisely all recursively enumerable languages, which makes parsing impossible in general: it is an undecidable problem to decide whether a given string can be generated by a given W-grammar. Their use must be seriously constrained when used for automatic parsing or translation. Restricted and modified variants of W-grammars were developed to address this, e.g.
- Extended Affix Grammars, applied to describe the grammars of natural language such as English and Spanish)
- Q-systems, also applied to natural language processing
- the CDL series of languages, applied as compiler construction languages for programming languages
Applications outside of ALGOL 68
Anthony Fisher has written a parser for a large class of W-grammars.
Dick Grune created a C program that would generate all possible productions of a 2-level grammar.
The applications of EAGs mentioned above can effectively be regarded as applications of W-grammars, since EAGs are so close to W-grammars.
W-grammars have also been proposed for the description of complex human actions in ergonomics.
- A W-Grammar Description for Ada – "W-grammar description of Ada is still useful to computer scientists who need more than a simple understanding of the syntax and rudimentary description of the semantics"