By default, parsers generated using ParseLR absorb input tokens or events as a stream of
objects, each exposing the IToken interface. Each input token exposes its token or event
type through the Type property, held as an integer. To translate between integer values
and meaningful names for each token type, a parser generated by ParseLR exposes a
TwoWayMap&lt;string, int&gt; called Tokens. The parser factory for the parser class also exposes
the same map as a static property, making it easy to look up a token value from its name
when a reference to the parser object is not available.
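As an illustration, the snippet below sketches how such a lookup might be used. The factory
class name and the indexer-style access to the TwoWayMap are assumptions made for the
example, not the exact generated API.
// A minimal sketch, assuming a generated parser factory class named
// MyParserFactory and indexer-style lookup on the TwoWayMap. The class
// name and lookup syntax are illustrative assumptions.
int plusType = MyParserFactory.Tokens["PLUS"];

// Given an IToken named nextToken returned by the tokeniser, its integer
// Type can now be compared against the named value
bool isPlus = nextToken.Type == plusType;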
Much of the power of a parser comes from processing the additional data that accompanies
each input token. The type of that data may differ from one token type to another, so the
IToken interface exposes a Value property of type object, allowing it to carry any data
type. It is left to the programmer to choose the right cast for a given token type when
writing the action code to be executed in the grammar. This usually results in casts
appearing before occurrences of the $N parameters.
This style of coding is not type-safe, as it relies on the developer knowing what type to
cast each action parameter to when writing action code. It also clutters the code, since
operator precedence demands extra parentheses when accessing members of an object that has
just been cast to its real type.
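For example, a rule in an untyped grammar has to be written with casts along the lines of
the sketch below, in which the non-terminal and terminal names are purely illustrative:
identifierLength :
    IDENTIFIER
    {
        // Without strong typing, $0 has type 'object', so it must be cast,
        // and member access on the cast value needs extra parentheses
        $$ = ((string)$0).Length;
    }
    ;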
It is possible to add strong typing to the input token values in a grammar, so that their
$N parameters no longer require casts but already have the correct type. It is also
possible to strongly type the non-terminals that appear on the left of grammar rules. This
means that assignments to the $$ value parameter in action code, representing the new
value given to the non-terminal token, are also type checked. Similarly, when a
non-terminal appears on the right side of a grammar rule, its value as a $N parameter in
the action code is strongly typed.
In the tokens or events section of the grammar, we give names and optional integer type
values for each of the terminal tokens that might be returned from an input tokeniser. It
is also possible to provide a data type for the Value property of the IToken objects that
carry that particular token type. This is achieved using the same syntax that C# uses for
a generic type: the data type for the object stored in the Value property of the IToken
appears after the name of the terminal token, between less-than and greater-than symbols.
An example appears below:
tokens
{
    INTEGER&lt;int&gt; = 1,   // User-specified token type value of 1, & type for the token data
    PLUS,               // Will be allocated a type value beyond 16384
    MINUS,              // Absence of data type makes token data type default to 'object'
    TIMES,
    DIVIDE,
    IDENTIFIER&lt;string&gt;, // User-specified data type, but token type beyond 16384
    LPAREN,
    RPAREN
}
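With INTEGER declared as carrying an int, action code that uses it no longer needs a cast.
The rule below is a sketch using a hypothetical non-terminal name:
sum :
    INTEGER PLUS INTEGER
    {
        // Because INTEGER was declared as INTEGER<int> above, $0 and $2
        // already have type int here; no casts or extra parentheses needed
        $$ = $0 + $2;
    }
    ;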
To stipulate the data type for the Value property of a non-terminal IToken, we place the
data type between less-than and greater-than symbols immediately after the non-terminal's
name in its first rule definition, just before the colon. This ensures that the
corresponding $N parameter in any action code automatically has the correct type whenever
the non-terminal appears in a rule that is being reduced. It also ensures that the type of
any expression assigned to $$ in action code is checked against the type of the
non-terminal on the left of that rule. An example follows:
adjectives &lt;List&lt;string&gt;&gt; :
    adjectives ADJECTIVE
    {
        // Assume that the terminal token ADJECTIVE was also strongly
        // typed as a string in the tokens section of the grammar
        $0.Add($1); // Append a string onto a List<string>
        $$ = $0;    // Ensure the list of strings is passed to the
                    // left hand side non-terminal on rule reduction
    }
    |
    {
        $$ = new List&lt;string&gt;();
    }
    ;
Multiplicity symbols placed after a token adjust the strong typing to match the selected
multiplicity. If a terminal or non-terminal token has type X and is followed by a * or a +,
the type of that element when used as a $N parameter will be IList&lt;X&gt;. Similarly, if the
multiplicity symbol is the optional symbol ?, the resulting type of the $N parameter will
be IOptional&lt;X&gt;, an interface containing a boolean HasValue property and a Value property
of type X. Consider the following sample code and its comments indicating the types of the
various rule elements:
...
tokens
{
    ...
    ATERMINAL&lt;TSomeType&gt;,
    ...
}

grammar(rootNonterminal)
{
    ...
    nonTerminal1&lt;TNonterm1&gt; : ... rules for nonTerminal1 ... ;
    ...
    nonTerminal2 :
        nonTerminal1* ATERMINAL+ nonTerminal1?
        {
            if($0.Count > 0)                  // $0 has type IList<TNonterm1>
                foreach(TSomeType tst in $1)  // $1 has type IList<TSomeType>
                    tst.SomeMemberFunction();
            if($2.HasValue)                   // $2 has type IOptional<TNonterm1>
                $2.Value.SomeOtherMemberFunction();
        }
        ;
}
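For reference, the IOptional&lt;X&gt; shape described above corresponds to an interface along the
lines of the sketch below. Only the members mentioned in this section are shown; the actual
definition in the parser runtime may include more.
// Sketch of the IOptional<X> shape described above: a flag indicating whether
// the optional element was present, plus the element's strongly typed value.
// The real interface may declare additional members.
public interface IOptional<X>
{
    bool HasValue { get; }  // True when the optional element appeared in the input
    X Value { get; }        // The element's value, valid only when HasValue is true
}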
Interestingly, no changes to the input tokeniser are needed to support the strong typing.
It still implements the standard IToken interface on each of the tokens it returns to the
parser engine, so the Value property is still filled in with data that is treated as if it
were of type object. It is the autogenerated parser code created from the grammar that
applies the strong typing to the $N and $$ parameters of the action code blocks. For more
details on how to write input tokenisers, see
Writing an input tokeniser.
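As a concrete illustration, a token class returned by such a tokeniser might look like the
sketch below, assuming IToken exposes only the Type and Value properties discussed in this
section; the real interface may define further members.
// A minimal sketch of a tokeniser's token class, assuming IToken exposes only
// the Type and Value members described above; the real interface may be larger.
public class SimpleToken : IToken
{
    public int Type { get; set; }      // Token type, matching a value in the Tokens map
    public object Value { get; set; }  // Token data, still carried as an object here
}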