Input grammar syntax

This section describes the syntax of the input grammar files that are converted into a parser finite state machine for recognising input token sequences that conform to the grammar. Because the grammar syntax is fairly extensive, it is described over several sections.

The basic model is that you write a formal grammar specification and save it in a file with a '.g' extension. You then run the parselr command-line program with your grammar file as an argument, which generates a new file containing C# source code. That C# source file, along with a separately written C# source file containing an input tokeniser, is added to your project in Visual Studio or whichever C# development environment you are using. References to the Parsing.dll and ParserGenerator.dll libraries are added to the project, and the project is then compiled.

The process of constructing an input tokeniser is described elsewhere. Here we focus on the contents of the input grammar file.

Structure of input grammar file

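The input grammar file contains between two and four discrete sections. These are, in order: the options section, the events (or tokens) section, the guards (or conditions) section, and the grammar section.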

Every section begins with a keyword, as given in the list of sections above. Two section types have a pair of keywords that are aliases for each other: events or tokens, and guards or conditions. The grammar keyword takes an argument in parentheses: the name of the top-level rule that must have been parsed for the grammar to be recognised as complete. The body of every section is enclosed in curly braces.

Example


options
{
    ... options go here ...
}

events
{
    ... events or tokens go here ...
}

guards
{
    ... guard condition function names here ...
}

grammar(rootSymbol)
{
    ... grammar description goes here ...
}
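
The layout rules above are mechanical enough to check with a short script. The following Python sketch is not part of parselr, and the names list_sections and check_layout are hypothetical. It assumes that rule names are identifier-like, that braces inside section bodies are balanced, and that the file contains no comments or string literals holding braces, since that syntax is not covered here. It only inspects the outer layout of the file and knows nothing about what goes inside each section.

```python
import re

# Position of each section keyword in the required order.
# Aliases (events/tokens, guards/conditions) share a position.
SECTION_ORDER = {
    "options": 0,
    "events": 1, "tokens": 1,
    "guards": 2, "conditions": 2,
    "grammar": 3,
}

# Matches either a single brace, or a section keyword with an optional
# "(argument)" that is followed (after optional whitespace) by "{".
TOKEN = re.compile(
    r"[{}]"
    r"|\b(options|events|tokens|guards|conditions|grammar)\b"
    r"(?:\s*\(\s*(\w+)\s*\))?(?=\s*\{)"
)


def list_sections(text):
    """Return (keyword, argument) pairs for each top-level section header."""
    sections = []
    depth = 0  # current brace nesting depth
    for match in TOKEN.finditer(text):
        if match.group(0) == "{":
            depth += 1
        elif match.group(0) == "}":
            depth -= 1
        elif depth == 0:
            # A keyword outside any braces starts a new section.
            sections.append((match.group(1), match.group(2)))
    return sections


def check_layout(text):
    """Return a list of layout problems; an empty list means none were found."""
    sections = list_sections(text)
    problems = []
    if not 2 <= len(sections) <= 4:
        problems.append("expected between two and four sections")
    slots = [SECTION_ORDER[name] for name, _ in sections]
    if slots != sorted(set(slots)):
        problems.append("sections are out of order or repeated")
    for name, argument in sections:
        if name == "grammar" and argument is None:
            problems.append("grammar section has no root rule name")
    return problems


# A skeleton file in the style of the example above (bodies are placeholders).
SAMPLE = """
options
{
    ... options go here ...
}

events
{
    ... events or tokens go here ...
}

grammar(rootSymbol)
{
    ... grammar description goes here ...
}
"""

print(list_sections(SAMPLE))
print(check_layout(SAMPLE))
```

Because the sketch does not know which of the four sections are mandatory, it checks only the section count, the ordering, and the presence of the root rule name on the grammar keyword.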