ParseLR - The GeneralisedParser class

Differences between LR(1) and GLR parsers

The parser tables used by LR(1) and GLR parsers are the same, except for two small differences. First, when constructing a set of parser tables for an LR(1) parser, the parser generator tries to identify places where shift-reduce and reduce-reduce conflicts occur, and will fail to produce a set of parser tables (and hence a parser) if these conflicts are encountered. This is not the case when constructing a GLR parser, as the GLR parsing algorithm merely clones the parser stacks when a conflict is encountered.

Second, there is a property in the IParser interface (implemented by both the Parser class and the GeneralisedParser class) called ParserType. This readonly string property is set to "LR1" for a regular lookahead parser, and to "GLR" for a generalised parser.

Writing offline GLR parsers

Using ParseLR to create source code for an offline GLR parser is virtually the same as for an LR(1) parser, which is described elsewhere. The primary difference is that the application-specific parser class should have GeneralisedParser as its base class rather than Parser. Since these two base classes both implement the IParser interface, and the methods and properties of that interface are all you need to use from your parser, there is very little you have to do when writing your parser class.

The only thing you will need to be aware of is that you may have to write merge action functions to resolve those places in the grammar where two separate valid parses merge on a reduction to the same non-terminal. You can find out how to write these merge functions here.

Writing inline GLR parsers

Writing inline GLR parsers similarly is very much like writing inline LR(1) parser classes. The main difference is that you must derive your application-specific parser class from the GeneralisedParser base class rather than the Parser base class. In addition, when you use the parser factory's ParserFactory<MyParser>.InitializeFromGrammar method, you must make sure the generalisedParser parameter is set to true. This will automatically ensure that the right kind of parser is created and used at runtime to handle the input token or event streams, and will turn off shift-reduce and reduce-reduce conflict rejection when building the parser tables. The ParserFactory<MyParser>.CreateInstance() method is then used to create each instance of a generalised parser.

Running a GLR parser

When a GLR parser is run on a sequence of input tokens or events, its behaviour is slightly different to the behaviour of an LR(1) parser. As it is designed to capture every valid parse of an ambiguous grammar, at the end of parsing you will be left with an array of parse trees, each element of which represents a valid interpretation of the input token sequence. These parse trees are found in a property of the GeneralisedParser class whose declaration looks like: IToken[] ParserResults; where each IToken is actually an instance of the NonterminalToken class, and is the root of a different valid parse tree. To understand how this tree is constructed beneath a NonterminalToken object, visit this page.

A second behavioural difference relates to the action code executed when each grammar rule is recognised and reduced. Action code is no longer executed as soon as a rule is reduced, because there may be other parse trees being composed at the same time, for which different rules might be recognised. Instead, the blocks of action code are executed in exactly the same order as they would have been in an LR(1) parser, but only when you retrieve the value of the ParserResults[N].Value property after the parse is over.

Lastly, note that a GLR parser is often used with merge action functions to resolve those places in the grammar where two separate valid parses merge on a reduction to the same non-terminal. The intention of these functions is to help the parser when there is ambiguity in the input grammar, by telling the parser which of the two interpretations to drop and which to keep. Hence, in many cases, the ParserResults array property in the GLRParser class should end up with a length of one, and the only valid resulting parse tree should be the one in ParserResults[0].