The merge functions used by a GLR parser to decide which of a pair of parse stacks should survive, are written inline in the grammar input file itself, causing them to be placed in the autogenerated class that inherits from the application-specific parser class.
A merge function is written rather like an inline action code block
after a grammar rule reducing to the left-hand side non-terminal token, except the merge
rule
contains the single keyword merge
. Only one merge statement can be
written for each token that appears to the left hand side of a rule. Also note
that no vertical bar operator is needed to separate the merge rule from the
preceding grammar rule.
There are two parameters to the merge code, referred to as $0
and $1
in the merge code block. Unlike the token values in action code blocks,
which are of type object
, the two arguments in a merge
code block are of type
NonTerminalToken
, and each therefore contain the complete parse tree for previously
parsed tokens that reduced to the non-terminal to the left of the grammar rule.
The merge code must use the information held in these parse trees, together with
other context available through the parser class to choose which reduction of
the two, if any, should be dropped.
If one or other of the two parses should be considered to be the surviving
parse, it should be assigned to the result variable, $$
. If both
parses should continue to survive, maybe for one of them to be eliminated on a
later reduction in a different merge function, either make no assignment to
$$
, or assign the value null
to it.
The example code below shows how the inline merge function code for the GLR calculator example could be written. The remainder of the grammar for this example was described in the introduction to GLR merging. Details of this full working example will be found here.
expression:
leafExpression
{ $$ = $0; }
| expression POWER expression
{ $$ = Math.Pow((double)$0, (double)$2); }
| expression TIMES expression
{ $$ = (double)$0 * (double)$2; }
| expression DIVIDE expression
{ $$ = (double)$0 / (double)$2; }
| expression PLUS expression
{ $$ = (double)$0 + (double)$2; }
| expression MINUS expression
{ $$ = (double)$0 - (double)$2; }
merge
{
// Merge only needs to deal with situations
// where we have expression OP expression
// in which one of the children is also an
// expression OP expression. In this case,
// the job of the GLR merge is to choose
// the interpretation that gives correct
// operator precedence.
if(IsExprOpExpr($0))
{
NonTerminalToken right = $0.Children[2] as NonTerminalToken;
NonTerminalToken left = $0.Children[0] as NonTerminalToken;
if(IsExprOpExpr(right))
{
// Example where swap must take place:
// $0
// |
// Expr TIMES Expr
// |
// Expr PLUS Expr
//
// Which should be replaced by the other parse of the
// same sequence of input tokens which will look like:
// $1
// |
// Expr PLUS Expr
// |
// Expr TIMES Expr
if(HigherPrecedence($0.Children[1], right.Children[1]))
$$ = $1;
else
$$ = $0;
}
else if(IsExprOpExpr(left))
{
// Example where the following swap must take place:
// $0
// |
// Expr TIMES Expr
// |
// Expr PLUS Expr
//
// Which should be replaced by the other parse of the
// same sequence of input tokens which will look like:
// $1
// |
// Expr PLUS Expr
// |
// Expr TIMES Expr
if(HigherPrecedence($0.Children[1], left.Children[1]))
$$ = $1;
else
$$ = $0;
}
}
}
;
Note that there is one merge
rule for the expression
token.
The merge function above chooses which parse will survive for a token sequence
expr1 OP1 expr2 OP2 expr3
based on whether OP1
or
OP2
has higher precedence. The HigherPrecedence
and
IsExprOpExpr
functions are functions the developer has added to the
parser class, and hence are directly accessible to the inline code in the
grammar.
From the above example, it is clear that the arguments $0
and
$1
, together with the result variable $$
have an
indexable property named Children
. These items are of the type
NonTerminalToken
which is described next.