Simply by writing a grammar and processing it into a parser, we obtain a program that tracks a sequence of input tokens and checks that they conform to the sequencing rules we have defined. As soon as it encounters an input token that does not match any rule at that point in the grammar, the parsing engine flags an error and either terminates the parse or attempts to discard rules and tokens until it can resynchronise with the input sequence. Whether and how it resynchronises depends on how you configure the grammar.
Typically we want to do more than just 'follow the input tokens' to check that they obey the grammar rules: we want to take specific actions when we recognise particular constructs in the input sequence. The usual way to do this is to jump out and execute an action function whenever a rule has been completely recognised and is therefore being reduced.
The grammar description syntax supports this. After any rule we can supply a block of C# source code to be executed when that rule reduces (that is, when its complete list of right-hand-side tokens has been recognised). This action code is placed in curly braces immediately beneath its rule in the grammar.
Within the block of code beneath a rule, we can use dollar identifiers to refer to the values associated with each token in the reducing grammar rule. The following positional rules apply. The value associated with the first token on the right-hand side of the rule is given the name $0, while the (N+1)th token is given the name $N. The special identifier $$ refers to the value to be returned for the reduced non-terminal token. Hence it is usual to assign a value to $$ before the end of the action code block.
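To make the positional naming concrete, here is a minimal C# sketch of one way a reduction could map $0 to $N and $$ onto a stack of token values. It is an illustration only, not the code the parser generator actually emits: the classes ValueStackSketch and ValueStackDemo, and the rule sum: NUMBER PLUS NUMBER, are hypothetical. When a rule with N+1 right-hand tokens reduces, the top N+1 values are handed to the action as an array whose element i plays the role of $i, and the action's return value plays the role of $$.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the generator's real output: shows how $0..$N
// and $$ could map onto a stack of token values when a rule reduces.
public class ValueStackSketch
{
    // One value per token currently held on the parse stack.
    private readonly List<object> values = new List<object>();

    // Number of values currently on the stack.
    public int Depth => values.Count;

    // Shifting a token pushes its value.
    public void Shift(object tokenValue) => values.Add(tokenValue);

    // Reduces a rule with rhsLength right-hand-side tokens. The action gets
    // the RHS values in order: rhs[0] is $0, rhs[N] is $N. Whatever the
    // action returns plays the role of $$.
    public object Reduce(int rhsLength, Func<object[], object> action)
    {
        int start = values.Count - rhsLength;
        object[] rhs = values.GetRange(start, rhsLength).ToArray();
        values.RemoveRange(start, rhsLength);

        object result = action(rhs);  // $$
        values.Add(result);           // value of the reduced non-terminal
        return result;
    }
}

public static class ValueStackDemo
{
    // Mimics the hypothetical rule
    //     sum: NUMBER PLUS NUMBER
    // whose action is  $$ = (int)$0 + (int)$2;
    public static int AddTwoNumbers(int a, int b)
    {
        var stack = new ValueStackSketch();
        stack.Shift(a);    // $0
        stack.Shift("+");  // $1
        stack.Shift(b);    // $2
        return (int)stack.Reduce(3, rhs => (int)rhs[0] + (int)rhs[2]);
    }
}
```

Because the result replaces the right-hand-side values on the stack, it becomes the value that a parent rule later sees through its own $N identifiers.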
The code below is an example of action handler code that might be used in a sentence analysis program. Notice the use of numbered dollar arguments to access the values returned for each token in the rule being reduced, and also notice the use of $$ to set the value of the rule's left-hand-side token (adjectives), ready for when that token appears on the right-hand side of a parent rule that is itself reduced later.
adjectives:
    adjectives ADJECTIVE
    {
        // Append adjective to list of all adjectives
        if (AdjectiveList == null)
            AdjectiveList = new List<string>();
        AdjectiveList.Add($1.ToString());
        $$ = AdjectiveList.Count;
        // Track how many adjectives have been seen
        AdjectivesCount++;
    }
|
;
This style of writing action code inline in the grammar is used for both inline and offline parsers. An arguable disadvantage of this style is that the automatic syntax checking and IntelliSense provided by Visual Studio and other development tools cannot interpret the code in a grammar input file. Hence it is good practice to place as little code as possible within an action code block, and have it call methods in the parser class that you have written. These methods do the real work of handling the consequences of a rule reduction. Note that the code you write in action functions is placed in a class derived from your parser class. Hence it can only access public, protected or internal members of your parser class; private members are not accessible from inline action code.
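A minimal sketch of this split, assuming a hypothetical parser class named SentenceParser with a protected helper RecordAdjective: the action block in the grammar shrinks to a single call, and the method in your own C# source does the work, where the compiler and IntelliSense can check it. SentenceParserActions merely stands in for the derived class that the generator produces; the real generated class will look different.

```csharp
using System.Collections.Generic;

// Hypothetical parser class that you write yourself. The names
// SentenceParser and RecordAdjective are illustrative only.
public class SentenceParser
{
    public List<string> AdjectiveList { get; } = new List<string>();

    // protected: reachable from the derived class holding the action code.
    // Had this been private, the generated action code could not call it.
    protected int RecordAdjective(object adjectiveValue)
    {
        AdjectiveList.Add(adjectiveValue.ToString());
        return AdjectiveList.Count;
    }
}

// Stand-in for the class generated from the grammar's action blocks.
// In the grammar, the action for  adjectives: adjectives ADJECTIVE
// would shrink to:  { $$ = RecordAdjective($1); }
public class SentenceParserActions : SentenceParser
{
    // rhs[1] plays the role of $1 (the ADJECTIVE token's value).
    public object ReduceAdjectivesRule(object[] rhs) => RecordAdjective(rhs[1]);
}
```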
Sometimes in our action functions we would like to get access to the Position property of each IToken that appears in the current rule being reduced. The $N parameters only give us access to the Value property.
The Parser base class of the class containing your action functions contains a property called TokenPosition that is an array of strings. On each reduction, this array is filled with the value of the Position property of each of the right-hand-side tokens of the rule being reduced. Hence the index of each token position string is the same as the index number of the corresponding $N parameter. Note that on rule reduction, the Position property of the non-terminal token that the rule is reduced to is set to the Position of the first token on the right-hand side of the rule. If the right-hand side is empty, the position is set to null. Note also that the TokenPosition property is only implemented in the LR(1) parser. Generalised (GLR) parsers have a mechanism that builds a tree of recognised IToken objects to represent the parser output, so the Position properties are directly available there.
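The bookkeeping described above can be modelled in a few lines. This sketch uses a hypothetical SketchToken class rather than the library's IToken, and a position format such as "1:5" chosen only for illustration. TokenPositions builds the array that TokenPosition would hold during a reduction, and ReduceTo creates the left-hand-side token, copying its Position from the first right-hand-side token or setting it to null for an empty rule.

```csharp
using System.Linq;

// Hypothetical token type standing in for IToken; only the two
// properties discussed in the text are modelled.
public class SketchToken
{
    public object Value { get; set; }
    public string Position { get; set; }
}

public static class PositionSketch
{
    // What TokenPosition would hold for one reduction: element i is the
    // Position of the token that $i refers to.
    public static string[] TokenPositions(SketchToken[] rhs) =>
        rhs.Select(t => t.Position).ToArray();

    // Builds the left-hand-side token. Its Position is copied from the
    // first right-hand-side token, or is null for an empty rule.
    public static SketchToken ReduceTo(SketchToken[] rhs, object lhsValue) =>
        new SketchToken
        {
            Value = lhsValue,
            Position = rhs.Length > 0 ? rhs[0].Position : null
        };
}
```

Copying the first token's position gives every non-terminal a sensible source location for error messages, while the null for empty rules signals that no input text was consumed.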
Here is some sample code demonstrating access to the position information in a rule reduction:
adjectives:
    adjectives ADJECTIVE
    {
        // Append adjective to list of all adjectives
        if (AdjectiveList == null)
            AdjectiveList = new List<string>();
        AdjectiveList.Add($1.ToString());
        Console.WriteLine("Position of ADJECTIVE ($1) is: {0}", TokenPosition[1]);
        $$ = AdjectiveList.Count;
        // Track how many adjectives have been seen
        AdjectivesCount++;
    }
|
;