The tokens or events section (the two keywords are synonyms) contains the definitive list of input tokens that the parser recognises. This is the alphabet of tokens that the input tokeniser object must provide. The arrival of any value not in this list causes the parser to report an error. Although the two keywords have exactly the same effect, in practice you might use the keyword tokens when constructing a traditional parser that reads input tokens from a tokeniser, and the keyword events when using the grammar to describe a state machine. In the latter case the tokeniser is really the class that provides the stream of input events to which the state machine responds.
Each token has a name and a value, but the value can be omitted if you are happy for the parser generator to assign one automatically. A value you provide yourself must be an integer between 0 and 16383 inclusive. Auto-generated token values lie outside this range, so defined and auto-generated token values can be mixed within the same grammar.
Once the parser has been created from your grammar, your tokeniser can look up the value that has been assigned to each token name. Parser generation creates a dictionary-like property named Tokens in the parser factory. If you want to look up the value that was allocated to the token named SQUIGGLE, and your parser class is named MyParser, the following code will obtain the token value:
int squiggleTokenValue = ParserFactory<MyParser>.Tokens["SQUIGGLE"];
Note that looking up token values by name in this way is inefficient if it is repeated while scanning input. It is usually better to look the values up once and cache them in named readonly integers, or to assign values to the tokens in the grammar that match those used by the tokeniser.
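The caching approach could be sketched roughly as follows. This is a minimal illustration, not the library's own mechanism: the TokenCache class, its field names, and the stand-in table in Main are hypothetical, and the lookup is passed in as a function so that the sketch runs without the parser library. With a generated parser, the function would presumably wrap the Tokens property described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: resolves token names once and keeps the results
// in readonly fields so the scanning loop never repeats a lookup by name.
public sealed class TokenCache
{
    public readonly int Integer;
    public readonly int Plus;

    // 'lookup' is any function mapping a token name to its value. With the
    // generated parser this would presumably be:
    //   name => ParserFactory<MyParser>.Tokens[name]
    public TokenCache(Func<string, int> lookup)
    {
        Integer = lookup("INTEGER");
        Plus = lookup("PLUS");
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in table with made-up values, used only so that the
        // sketch runs without the parser library.
        var standIn = new Dictionary<string, int>
        {
            ["INTEGER"] = 1,
            ["PLUS"] = 20000
        };

        var tokens = new TokenCache(name => standIn[name]);
        Console.WriteLine(tokens.Integer);
    }
}
```

A tokeniser would construct one such cache at start-up and then compare against tokens.Integer, tokens.Plus and so on inside its scanning loop.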
Entries in the tokens section of the grammar are enclosed in curly braces and separated by commas. There is no comma between the final entry and the closing curly brace. Each entry consists of the token name, optionally followed by an equals sign and an integer value.
tokens
{
INTEGER = 1, // Has a user-specified token value
IDENTIFIER = 2, // Also has a user-specified value
PLUS, // Will be allocated a value above 16383
MINUS,
TIMES,
DIVIDE,
LPAREN,
RPAREN
}
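If you prefer to give tokens explicit values in the grammar, the tokeniser can hold matching constants and avoid name lookups altogether. The following is only a sketch under that assumption: the TokenValues class and its member names are hypothetical, and the constants simply mirror the two user-specified entries in a grammar like the one shown here.

```csharp
using System;

// Hypothetical constants kept in step with the grammar's tokens section.
public static class TokenValues
{
    public const int Integer = 1;     // INTEGER = 1 in the grammar
    public const int Identifier = 2;  // IDENTIFIER = 2 in the grammar

    // Range the text gives for user-specified values: 0 to 16383 inclusive.
    public const int MaxUserValue = 16383;

    // True if 'value' may be written explicitly in the tokens section.
    public static bool IsUserAssignable(int value) =>
        value >= 0 && value <= MaxUserValue;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TokenValues.IsUserAssignable(TokenValues.Identifier));
        Console.WriteLine(TokenValues.IsUserAssignable(16384));
    }
}
```

Tokens left without an explicit value, such as PLUS or MINUS in the grammar, receive auto-generated values, so the tokeniser would still need to look those up by name.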
Note that the parser generator uses some reserved token names internally, which you should not use. These are currently: EOF, SOF, ERR, and _Start.