Multiplicity on Grammar Tokens

Introduction

Rules in a grammar consist of a token to the left of a colon, followed by a list of tokens with optional guards applied to them on the right. Sometimes we wish to express that the same token can occur optionally, or a number of times. Typically in parser input syntax this has been done by writing several rules, some of them recursive. For example, if we wanted to write a rule in which the token optToken was optional, we might end up writing a set of rules as follows:

rule: precedingTokens optionalToken followingTokens ;

optionalToken :
    optToken
|
;

The action functions or inline action code blocks for the three rules would need to be structured so that the top rule received back, say, a null object from the third (empty) rule, and valid data from the second rule.

Similarly, if we wanted to implement a list of zero to many instances of a token, we would write a pair of child rules using recursion to implement the list:

rule: precedingTokens optionalTokenList followingTokens ;

optionalTokenList :
    optionalTokenList optToken
|
;

The recursive rule allows reductions to accept a list of zero or more consecutive optToken elements. The action functions or inline action code would typically arrange that $$ contains a list of the resulting values from each of the optToken elements. The list would then arrive in the parent rule as a list whose length and content could be interrogated in the parent rule's action code.

A similar construct can be used for a list of one to many instances of an element:

rule: precedingTokens tokenList followingTokens ;

tokenList :
    tokenList optToken
|	optToken
;

ParseLR grammars have an abbreviated construct that allows optional, optional lists, and non-empty lists of tokens to be directly represented in a grammar rule. Internally, the parser generator creates instances of the rules as shown above.

Multiplicity on Tokens

To represent differing multiplicity on tokens in a grammar rule, the tokens are followed by a multiplicity symbol. The symbol used is consistent with regular expressions, and with other parsers that have similar capability:

SYMBOL USED MULTIPLICITY
'?' Optional token (zero or one occurrences)
'*' Optional list of tokens (zero through many occurrences)
'+' Non-empty list of tokens (one through many occurrences)

For example, if we wanted to write a rule in which the first right hand side token was optional, the second was an optional list, and the third a non-empty list, we might write the rule in the form:

rule: first? second* third+ ;

Multiplicity Actions

Whenever a token is followed by a multiplicity symbol, the token's $N value as used in the action code on rule reduction is of type List<object>. The elements in this list are the values from each of the instances of the multiple tokens within the sequence. In particular, empty lists still return a List<object>, but with zero elements in it. Similarly the single instance in a '?' multiplicity is still returned as a single element within a list of objects. Your action code should be written accordingly.

An example of a rule accessing the individual elements in a list is shown below:

rule: 
    child*
    {
        List<object> children = $0 as List<object>;
        foreach(object o in children)
        	DoSomethingWithString((string)o);
    }
;

child:
    someTokens
    {
    	$$ = SomethingThatGetsAStringFrom($0);
    }
;

The extraction and casting to actual type of each of the arguments in a List<object> is tedious and error prone. There is therefore a generic method provided in the Parsing.Parser and in the Parsing.GeneralisedParser base classes with signature public IList<T> AsList<T>(object arg); that takes one of the token values as the parameter to its constructor, and interprets it as an IList<T> for you. Since the action code is either written inline in the grammar, or within the source code for a partial parser class, the Parsing namespace is typically already included by a using statement at the top of the source file. An example of its use is shown below:

rule: 
    child*
    {
        IList<string> children = AsList<string>($0);
        foreach(string s in children)
        	DoSomethingWithString(s);
    }
;

child:
    someTokens
    {
    	$$ = SomethingThatGetsAStringFrom($0);
    }
;

Similarly for optional tokens (those using the '?' as their multiplicity indicator) it is not intuitive to treat these as a collection using the AsList<T> helper method. There is an interface in the Parsing namespace called IOptional<T> that contains the boolean property HasValue and the generic property T Value. This allows the writer of action code in grammars to use a helper function AsOptional<T>($N) to return an instance of this interface. An example of its use appears below:

rule: 
    child?
    {
        IOptional<string> child = AsOptional<string>($0);
        if(child.HasValue)
        	DoSomethingWithString(child.Value);
    }
;

child:
    someTokens
    {
    	$$ = SomethingThatGetsAStringFrom($0);
    }
;

Multiplicity and Guard Conditions

A distinguishing feature of ParseLR is that each token, terminal or non-terminal, can have a guard condition associated with it when it appears to the right hand side of a grammar rule. In order for the token to be shifted, or for a rule to be reduced, the input token must match and the guard condition evaluated at that point in time must be true. Guard conditions can be used in conjunction with multiplicity symbols too.

What is interesting is that the multiplicity can be associated with one of two guard conditions. In a rule element with multiplicity applied, such as token* for example, a guard condition might need to be evaluated for each successive instance of token. A guard condition might also be needed that applies to the whole list once all its components have been recognised, and could even be the guard condition that determines when the list is finished. To capture this, guard conditions may be placed before and/or after the multiplicity symbol. Consider the following example:

rule: 
    rule : earlierTokens token[GuardOnEach]*[GuardOnWholeList] laterTokens ;

In this rule, GuardOnEach is tested for each instance of token for it to be recognised. GuardOnWholeList must be true for the end of the list. In practice, the parser generator expands this rule internally as follows:

rule: 
    rule : earlierTokens zeroToMany_token_GuardOnEach[GuardOnWholeList] laterTokens ;
    zeroToMany_token_GuardOnEach : zeroToMany_token_GuardOnEach token[GuardOnEach]
    |
    ;

Similar expansions handle the multiplicities '+' and '?'.