In this section we shall describe the syntax for the input grammar files that are to be converted into a parser finite state machine for recognising input token sequences that conform to the grammar. As the grammar syntax is fairly extensive, it is described over several sections.
The basic model is that you write a formal grammar specification and save it in a file whose name has a '.g' extension. You then run the parselr command-line program with your grammar file as an argument, which generates a new file containing C# source code. That C# source file, along with a separately written C# source file containing an input tokeniser, is added to the project in Visual Studio or whichever other C# development environment you are using. References to the Parsing.dll and ParserGenerator.dll libraries are then added to the project, and the project is compiled.
The process of constructing an input tokeniser is described elsewhere. Here we focus on the contents of the input grammar file.
The input grammar file contains between two and four discrete sections. These are, in order:
1. The 'options' section. This contains a list of parser options that are either used by the parser itself to modify its behaviour, or passed on as flags to the compiled output code. This section can be omitted if you are happy to accept a set of defaults, or if you are generating an in-line parser at runtime.
2. The 'events' or 'tokens' section. This is a mandatory section of the input grammar and contains the list of input token types that may be returned by the input tokeniser. For example, if you have written a parser that reads a C# program, your input tokens might include a token representing the keyword 'public' in the input stream, or a token representing the operator '+=' in the input stream.
3. The 'guards' or 'conditions' section. This section is optional. Regular grammars do not impose guard conditions on tokens, so it is often omitted.
4. The 'grammar' section. This section contains all the grammar rules, and is usually by far the most complicated section of the grammar to write. The full syntax for writing it is given elsewhere in this set of documentation. This section is mandatory.
All sections begin with a keyword as given in the list of section descriptions above. Two section types have a pair of keywords that are aliases for each other: 'events' and 'tokens', and 'guards' and 'conditions'. The 'grammar' keyword takes an argument in parentheses, which is the name of the top-level rule in the grammar that must have been parsed for the grammar to be recognised as complete. All sections have bodies enclosed in curly braces, as in the following skeleton:
options
{
... options go here ...
}
events
{
... events or tokens go here ...
}
guards
{
... guard condition function names here ...
}
grammar(rootSymbol)
{
... grammar description goes here ...
}