Input grammar syntax

This section describes the syntax of the input grammar files, which are converted into a parser finite state machine that recognises input token sequences conforming to the grammar. As the grammar syntax is fairly extensive, it is described over several sections.

The basic model is that you write a formal grammar specification and save it in a file with a '.g' extension. The parselr command-line program is then run with your grammar file as an argument, which generates a new file containing C# source code. That C# source file, along with a separately written C# source file containing an input tokeniser, is added to your project in Visual Studio or whichever other C# development environment you are using. References to the Parsing.dll and ParserGenerator.dll libraries are added to the project, and the project is compiled.

The process of constructing an input tokeniser is described elsewhere. Here we focus on the contents of the input grammar file.

Structure of input grammar file

The input grammar file contains between two and four discrete sections. These are, in order:

The options section. This contains a list of parser options that are used either by the parser itself to modify its behaviour, or as flags to pass on to the compiled output code. This section can be omitted if you are happy to accept the defaults, or if you are generating an in-line parser at runtime.
The events or tokens section. This section is mandatory and lists the different input token types that the input tokeniser may return. For example, if you are writing a parser that reads a C# program, your input tokens might include one representing the keyword public in the input stream, and another representing the operator '+=' in the input stream.
The guards or conditions section. This section is optional. Regular grammars do not impose guard conditions on tokens, so this section is often omitted.
The grammar section. This section is mandatory. It contains all the grammar rules, and is usually by far the most complicated section of the grammar to write. Its full syntax is given elsewhere in this documentation.

Every section begins with a keyword, as given in the list of section descriptions above. Some section types have two keywords that are aliases for each other, namely events or tokens, and guards or conditions. The grammar keyword takes a single argument in parentheses: the name of the top-level rule in the grammar, which must have been parsed for the grammar to be recognised as complete. All sections have bodies enclosed in curly braces.

Example


options
{
    ... options go here ...
}

events
{
    ... events or tokens go here ...
}

guards
{
    ... guard condition function names here ...
}

grammar(rootSymbol)
{
    ... grammar description goes here ...
}
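
The layout rules described above (section order, keyword aliases, the two mandatory sections, and the root rule argument on grammar) can be sketched as a small stand-alone checker. This is only an illustration written in Python and is not part of parselr: the names split_sections and check_layout are hypothetical, and the brace matching is deliberately naive, assuming that section bodies contain no unbalanced braces inside strings or comments.

```python
import re

# Section keywords as listed in the text; aliases share one canonical name.
ALIASES = {
    "options": "options",
    "events": "events",
    "tokens": "events",
    "guards": "guards",
    "conditions": "guards",
    "grammar": "grammar",
}
ORDER = ["options", "events", "guards", "grammar"]
MANDATORY = ("events", "grammar")

# A section keyword, an optional (argument), then the opening brace.
HEADER = re.compile(
    r"\b(options|events|tokens|guards|conditions|grammar)\b"
    r"\s*(?:\(\s*(\w+)\s*\))?\s*\{"
)


def split_sections(text):
    """Return (canonical_name, argument, body) tuples in file order."""
    sections = []
    pos = 0
    while True:
        match = HEADER.search(text, pos)
        if match is None:
            return sections
        # Walk forward to the brace that closes this section body.
        depth, i = 1, match.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            raise ValueError("unclosed brace in section: " + match.group(1))
        body = text[match.end():i - 1].strip()
        sections.append((ALIASES[match.group(1)], match.group(2), body))
        pos = i  # resume after the closing brace, skipping the body


def check_layout(text):
    """Return a list of layout problems; an empty list means none found."""
    problems = []
    sections = split_sections(text)
    names = [name for name, _, _ in sections]
    for name in MANDATORY:
        if name not in names:
            problems.append("missing mandatory section: " + name)
    if len(set(names)) != len(names):
        problems.append("a section appears more than once")
    if names != sorted(names, key=ORDER.index):
        problems.append("sections are out of order; expected " + ", ".join(ORDER))
    for name, argument, _ in sections:
        if name == "grammar" and argument is None:
            problems.append("grammar section needs a root rule name in parentheses")
        if name != "grammar" and argument is not None:
            problems.append(name + " section does not take an argument")
    return problems


if __name__ == "__main__":
    # Hypothetical minimal file: only the two mandatory sections.
    sample = "tokens { ... } grammar(rootSymbol) { ... }"
    for problem in check_layout(sample) or ["layout looks plausible"]:
        print(problem)
```

A checker of this kind only confirms the outer skeleton of the file. It says nothing about whether the contents of each section are valid, which is the job of parselr itself.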