Adding guards to tokens

In traditional parsers, terminal and non-terminal tokens were the only type of element used for describing grammar rules. However, parser generators create complex finite state machines that implement/enforce the grammar rules when parsing a stream of terminal input tokens. We can also use the grammar description file therefore to describe a state machine directly rather than use it to describe grammars for languages.

When doing this, we tend to use stylised rules that obey a particular format. The non-terminal token to the left side of a rule represents a state our state machine will jump to. The first non-terminal on the right side of the rule is the state we might transit from. The second token on the right side of the rule is a terminal token representing the input event that will cause the transition between these state.

In conventional state machines, transitions between states are caused by events that are valid input events for the starting state. However, a common extension is for the transition to only be fired if the event occurs, and an associated guard condition is true at the time the event fires. The parser generator used in the off-line parser (parselr.exe) and the inline parser support the specification of a boolean guard expression attached to any token, be it terminal or non-terminal.

Guard expression syntax

A guard expression is always enclosed in square brackets, and consists of guard function names and the boolean operators for not (!), and (&) and or (|). Parentheses can be included in the boolean expressions to make ultra clear the expected precedence of the boolean operations. Hence the following grammar rule, representing a state transition accompanied by a guard expression, has a sensible meaning:


Attached : Unattached MEETSPARTNER[(!Married) & ((Tall & Dark & Handsome) | Rich)];

As standard precedence is enforced, with the not (!) operator being higher precedence than and (&), and both being higher precedence than or (|), the following guard has the same meaning, albeit slightly less readable:


Attached : Unattached MEETSPARTNER[!Married & (Tall & Dark & Handsome | Rich)];

Warning: a close inspection of what this guard condition implies may slightly offend your scruples! Note also that using an LR(1) grammar in this way to write a simple finite state machine is not particularly efficient. The ParseLR toolkit has an alternative, much more efficient technique for creating simple finite state machines that is described elsewhere.

Guards on non-terminals

It is also possible to place guard expressions on non-terminal symbols in grammar rules. The meaning of this is that the guard condition is applied to the last terminal symbol in any descendant rules from the non-terminal with the guard after it. Hence in the following set of rules where lower-case tokens represent non-terminals, and upper-case tokens represent terminals:


w : x y[GUARD];
y : T
|   V
|   // Empty rule
;
x : R S;

the guard maps back onto the last terminal tokens of the rules for x and y as rewritten below. Note that the empty rule for y also allows the guard to map back further onto preceding tokens for y:


w :  x y1
|    x1 y0
;
y1 : T[GUARD]
|    V[GUARD]
;
y0 :   // Empty rule
;
x1 : R S[GUARD]
;

The meaning for guards on non-terminals is as follows. The reduction to a non-terminal that has a guard on it will only take place if all descendent rules from that non-terminal have the guard condition evaluating to true immediately before the reduction takes place. If that guard condition is not true, the reduction will not be applied. This means that either a parsing error will occur because the input token and guard stream does not match the grammar, or if another rule with different guard conditions does match the input stream, that would be reduced successfully instead.

In particular, note that applying a guard to the first token in a grammar rule, where that token is non-terminal and has the empty rule among its own rules, will cause the guard to be propagated back to any parent rule in which the grammar rule's left-hand token appears. For example, the following grammar:


a : T b;
b : c[GUARD];
c : U V
|   // Empty rule
;

would be rewritten internally by the parser so that the parent rule for non-terminal 'a' would be affected in the following way:


a : T b1
|   T[GUARD] b0;
b1 : c1;
b0 : c0;
c1 : U V[GUARD];
c0 : // Empty rule
;

An interesting (though not necessarily useful) consequence of the propagation back of guard conditions is that under some circumstances it is possible for the first thing that happens in an entire parse of a grammar to be a test of a guard condition. This can occur if the top-level grammar non-terminal has as a descendant of its first token an empty rule, and the same descendant somewhere in the grammar has a guard condition applied to it. The parsers created by this parser generator are able to cope with this should it occur.

Parsing rules with guards

There may be places in the grammar where the parser is faced with two alternative parsing paths that differ only in the guard condition applied to the next token. More specifically the two paths branch on recognition of the same next token, but that token appears in the current state with two different guard conditions, or one with a guard and the other without.

When this is encountered, the parser generator evaluates the guard conditions on both tokens, and applies the following rules to determine which to test for first when deciding which rule to apply: