Executing parser actions in grammar rules

Simply writing a grammar and processing it into a parser gives us a program that can track a sequence of input tokens and check that they follow the sequencing rules we have defined. As soon as an input token is encountered that does not match any rule at that point in the grammar, the parsing engine flags an error and either terminates the parse or attempts to discard rules and tokens until it has resynchronised with the input sequence. Which of these two error-handling behaviours you get depends on how you configure the grammar.

Typically we don't want to just 'follow the input tokens' to check they obey the grammar rules. We want to take strategic actions when we recognise particular constructs in the input sequence. The usual way in which this is done is to jump out and execute action functions whenever a rule has been completely recognised and is therefore being reduced.

The grammar description syntax allows for this. We can stipulate after any rule a block of C# source code to be executed when that rule reduces (recognises its complete list of right-hand tokens). This action code is placed in curly braces immediately beneath its rule in the grammar.

Within the block of code beneath a rule, we can use dollar identifiers to refer to the values associated with each token in the reducing grammar rule. The identifiers are positional and numbered from zero: the value associated with the first token on the right hand side of the rule is named $0, the second $1, and in general the (N+1)th token is named $N. The special identifier $$ refers to the value to be returned for the reduced non-terminal token. Hence it is usual to assign a value to $$ before the end of the action code block.

The code below is an example of action handler code that might be used in a sentence analysis program. Notice the use of numbered dollar arguments to access the values returned for each token in the rule being reduced; here $1 is the value of ADJECTIVE, the second token on the right hand side. Notice also the use of $$ to set the value of the rule's left hand side token (adjectives). That value is what a parent rule will see when adjectives appears on its own right hand side and that rule is reduced in turn.


    adjectives:
        adjectives ADJECTIVE
        {
            // Append adjective to list of all adjectives
            
            if (AdjectiveList == null)
                AdjectiveList = new List<string>();
            AdjectiveList.Add($1.ToString());
            $$ = AdjectiveList.Count;
            
            // Track how many adjectives have been seen
            
            AdjectivesCount++;
        }
    |
    ;

This inline style for writing action function code is used for both inline and offline parsers. An arguable disadvantage of the style above is that the automatic syntax checking and IntelliSense provided by Visual Studio and other compiler tools will not know how to interpret the code in a grammar input file. Hence it is good practice to place as little code as possible within an action code block, and have it call methods that you have written in the parser class. These methods do the real work of handling the consequences of a rule reduction. Note that the code you write in action functions will be placed in a class derived from your parser class. Hence it can only access public, protected or internal members of your parser class. Private members will not be accessible from inline action code.
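As a rough sketch of this practice, the work done in the action block above can move into an ordinary C# method, where the compiler and IntelliSense can see it. The names SentenceParser, GeneratedSentenceParser, AppendAdjective and ReduceAdjectives are hypothetical. The library's parser base class is deliberately left out so that the fragment compiles on its own. In a real project SentenceParser would derive from that base class, and the derived class would be produced from the grammar rather than written by hand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hand-written parser class. In a real project this would
// derive from the library's parser base class, which is omitted here.
public class SentenceParser
{
    // Private: not reachable from action code in the derived class.
    private List<string> adjectiveList;

    // Protected: readable from the derived class that holds the action code.
    protected int AdjectivesCount { get; private set; }

    // Does the real work of the reduction and returns the value for $$.
    protected object AppendAdjective(object adjective)
    {
        if (adjectiveList == null)
            adjectiveList = new List<string>();
        adjectiveList.Add(adjective.ToString());

        // Track how many adjectives have been seen
        AdjectivesCount++;
        return adjectiveList.Count;
    }
}

// Stand-in for the derived class in which action code is placed.
public class GeneratedSentenceParser : SentenceParser
{
    // Models the action block { $$ = AppendAdjective($1); }
    // values[N] plays the role of $N; the return value plays the role of $$.
    public object ReduceAdjectives(object[] values)
    {
        return AppendAdjective(values[1]);
    }
}
```

With a method like this in place, the action block in the grammar would shrink to a single line such as `$$ = AppendAdjective($1);`. AppendAdjective has to be protected (or public or internal) for that call to compile, whereas adjectiveList can stay private because only the parser class itself touches it.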

Sometimes in our action functions we would like to get access to the Position property of each IToken that appears in the current rule being reduced. The $N parameters only give us access to the Value property.

The Parser base class of the class containing your action function contains a property called TokenPosition that is an array of strings. Each string in the array is filled in with the value of the Position property of the corresponding right hand side token for the rule being reduced. Hence the index of each token position string is the same as the index number of the matching $N parameter. Note that on rule reduction, the Position property of the non-terminal token we are reducing the rule to is set to the position of the first token in the right hand side of the rule being reduced. If the right hand side is an empty rule, the position will be set to null. Note also that the TokenPosition property is only implemented in the LR(1) parser. Generalised parsers (GLR) have a mechanism that builds a tree of recognised IToken objects to represent the parser output. Hence the Position properties are directly available from those objects.
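The position rule for the reduced non-terminal can be pictured with the small model below. It only illustrates the behaviour described above and is not the library's code. PositionModel and ReducedPosition are hypothetical names.

```csharp
using System;

public static class PositionModel
{
    // rhsPositions plays the role of TokenPosition: one string per
    // right hand side token, indexed in the same way as the $N parameters.
    public static string ReducedPosition(string[] rhsPositions)
    {
        // Empty rule: the reduced non-terminal has no position
        if (rhsPositions == null || rhsPositions.Length == 0)
            return null;

        // Otherwise it takes the position of the first ($0) token
        return rhsPositions[0];
    }
}
```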

Here is some sample code demonstrating access to the position information in a rule reduction:


    adjectives:
        adjectives ADJECTIVE
        {
            // Append adjective to list of all adjectives
            
            if (AdjectiveList == null)
                AdjectiveList = new List<string>();
            AdjectiveList.Add($1.ToString());
            Console.WriteLine
                ("Position of ADJECTIVE is: {0}", TokenPosition[1]);
            $$ = AdjectiveList.Count;
            
            // Track how many adjectives have been seen
            
            AdjectivesCount++;
        }
    |
    ;