Writing GLR merge functions

The merge functions used by a GLR parser to decide which of a pair of parse stacks should survive, are written inline in the grammar input file itself, causing them to be placed in the autogenerated class that inherits from the application-specific parser class.

A merge function is written rather like an inline action code block after a grammar rule reducing to the left-hand side non-terminal token, except the merge rule contains the single keyword merge. Only one merge statement can be written for each token that appears to the left hand side of a rule. Also note that no vertical bar operator is needed to separate the merge rule from the preceding grammar rule.

There are two parameters to the merge code, referred to as $0 and $1 in the merge code block. Unlike the token values in action code blocks, which are of type object, the two arguments in a merge code block are of type NonTerminalToken, and each therefore contain the complete parse tree for previously parsed tokens that reduced to the non-terminal to the left of the grammar rule. The merge code must use the information held in these parse trees, together with other context available through the parser class to choose which reduction of the two, if any, should be dropped.

If one or other of the two parses should be considered to be the surviving parse, it should be assigned to the result variable, $$. If both parses should continue to survive, maybe for one of them to be eliminated on a later reduction in a different merge function, either make no assignment to $$, or assign the value null to it.

The example code below shows how the inline merge function code for the GLR calculator example could be written. The remainder of the grammar for this example was described in the introduction to GLR merging. Details of this full working example will be found here.

    expression:
        leafExpression
        {	$$ = $0;	}
    |   expression POWER expression
        {	$$ = Math.Pow((double)$0, (double)$2);	}
    |   expression TIMES expression
        {   $$ = (double)$0 * (double)$2;  }

    |   expression DIVIDE expression
        {	$$ = (double)$0 / (double)$2;  }
    |   expression PLUS expression
        {   $$ = (double)$0 + (double)$2;  }

    |   expression MINUS expression
        {   $$ = (double)$0 - (double)$2;  }

    merge
        {
            // Merge only needs to deal with situations
            // where we have expression OP expression
            // in which one of the children is also an
            // expression OP expression. In this case,
            // the job of the GLR merge is to choose
            // the interpretation that gives correct
            // operator precedence.

            if(IsExprOpExpr($0))
            {
                NonTerminalToken right = $0.Children[2] as NonTerminalToken;
                NonTerminalToken left = $0.Children[0] as NonTerminalToken;

                if(IsExprOpExpr(right))
                {
                    // Example where swap must take place:
                    // $0
                    //  |
                    // Expr TIMES Expr
                    //             |
                    //            Expr PLUS Expr
                    //
                    // Which should be replaced by the other parse of the
                    // same sequence of input tokens which will look like:
                    // $1
                    //  |
                    // Expr PLUS Expr
                    //  |
                    // Expr TIMES Expr

                    if(HigherPrecedence($0.Children[1], right.Children[1]))
                        $$ = $1;
                    else
                        $$ = $0;
                }
                else if(IsExprOpExpr(left))
                {
                    // Example where the following swap must take place:
                    // $0
                    //  |
                    // Expr TIMES Expr
                    //  |
                    // Expr PLUS Expr
                    //
                    // Which should be replaced by the other parse of the
                    // same sequence of input tokens which will look like:
                    // $1
                    //  |
                    // Expr PLUS Expr
                    //            |
                    //           Expr TIMES Expr

                    if(HigherPrecedence($0.Children[1], left.Children[1]))
                        $$ = $1;
                    else
                        $$ = $0;
                }
            }
        }
    ;

Note that there is one merge rule for the expression token. The merge function above chooses which parse will survive for a token sequence expr1 OP1 expr2 OP2 expr3 based on whether OP1 or OP2 has higher precedence. The HigherPrecedence and IsExprOpExpr functions are functions the developer has added to the parser class, and hence are directly accessible to the inline code in the grammar.

From the above example, it is clear that the arguments $0 and $1, together with the result variable $$ have an indexable property named Children. These items are of the type NonTerminalToken which is described next.