The grammar section

The grammar section is the main section of the grammar description file. It lists the rules that define valid sequences of input events and guards, and it defines the executable actions to be invoked at the points where a recognisable pattern of events or tokens has been parsed. From these rules, the parser generator creates a finite state machine that recognises every valid sequence of input events or tokens and reports an error as soon as the input stops conforming. While parsing an input token sequence, the parser also calls the action functions at the appropriate points, so that behaviour is executed as each token pattern is recognised.
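
To make the idea of a recognising state machine concrete, here is a minimal hand-written sketch in Python. It is not the output of the parser generator, and its names (TRANSITIONS, recognise, count_adjective, ParseError) are invented for illustration. It borrows the token names THE, A, ADJECTIVE and NOUN from the example grammar in this section, ignores guards entirely, and accepts an optional THE or A, any number of ADJECTIVE tokens, and then one NOUN. It shows two behaviours described above: an action is called when part of a pattern is recognised, and an error is raised at the first token that does not conform.

```python
# Illustration only: a hand-coded, guard-free state machine for a
# simplified noun phrase. A generated parser will be structured differently.

class ParseError(Exception):
    """Raised at the first token that cannot continue a valid sequence."""


def count_adjective(ctx):
    # Hypothetical action: runs each time an ADJECTIVE is accepted.
    ctx["adjectives"] += 1


# TRANSITIONS[state][token] -> (next_state, action or None)
TRANSITIONS = {
    "start": {
        "THE": ("after_det", None),
        "A": ("after_det", None),
        "ADJECTIVE": ("after_det", count_adjective),
        "NOUN": ("done", None),
    },
    "after_det": {
        "ADJECTIVE": ("after_det", count_adjective),
        "NOUN": ("done", None),
    },
    "done": {},  # nothing may follow a complete noun phrase
}


def recognise(tokens):
    """Return the action context if tokens form one noun phrase."""
    ctx = {"adjectives": 0}
    state = "start"
    for position, token in enumerate(tokens):
        try:
            state, action = TRANSITIONS[state][token]
        except KeyError:
            # Fail immediately, without reading the rest of the input.
            raise ParseError(
                f"unexpected {token} at position {position}"
            ) from None
        if action is not None:
            action(ctx)
    if state != "done":
        raise ParseError("input ended before a complete noun phrase")
    return ctx
```

Calling recognise(["THE", "ADJECTIVE", "NOUN"]) runs the action once and returns the context, whereas recognise(["THE", "VERB", "NOUN"]) raises ParseError on the second token without looking at the third.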

The example below is a complete, simple grammar which, when converted by the parser generator, creates a state machine that recognises simple English sentences. Notice that the grammar section begins with the keyword grammar, followed in parentheses by the name of the top-level symbol in the grammar, in this case 'para'. The input token sequence has been completely recognised when exactly one 'para' has been parsed.

Following the grammar keyword and its top-level symbol argument, the whole body of the grammar description, which is a list of grammar rules, is enclosed in curly braces.


// Define namespace, parser class, using statements, and
// any assembly references in the options section

options
{
    ...
}

// The list of input token types that come out of the tokeniser

events
{
    UNKNOWN = 5,
    NOUN = 10,
    VERB = 15,
    ADJECTIVE = 20,
    ADVERB = 25,
    THE,
    A,
    PERIOD
}

// Some guard functions that can be called during the parse

guards
{
    PluralNoun,
    SingularVerb,
    PluralVerb,
    Past
}

// The grammar itself. The top-level element, which must be
// recognised for the parse to complete successfully, is the
// non-terminal symbol 'para'

grammar(para)
{
    para: 
        sentence
    |   para sentence
    ;

    sentence:
        presentSentence 
        { BumpPresentTense(); }
    |   pastSentence 
        { BumpPastTense(); }
    ;

    presentSentence:
        nounPhrase[!PluralNoun] verb[!Past & SingularVerb] nounPhrase PERIOD
        { BumpSingular(); }
    |   nounPhrase[PluralNoun] verb[!Past & PluralVerb] nounPhrase PERIOD
        { BumpPlural(); }
    ;

    pastSentence:
        nounPhrase[!PluralNoun] verb[Past & SingularVerb] nounPhrase PERIOD
        { BumpSingular(); }
    |   nounPhrase[PluralNoun] verb[Past & PluralVerb] nounPhrase PERIOD
        { BumpPlural(); }
    ;

    nounPhrase:
        THE adjectives NOUN
    |   A adjectives NOUN
    |   adjectives NOUN[PluralNoun]
    ;

    adjectives:
        adjectives ADJECTIVE
        { AppendAdjective($1); BumpAdjectives(); }
    |
    ;

    verb:
        adverbs VERB
    ;

    adverbs:
        ADVERB
    |
    ;
}
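
The guard conditions in presentSentence and pastSentence can be read as boolean expressions, with ! as negation and & as conjunction. The Python sketch below models only that logic. It is an illustration under assumptions, not a description of how the generated parser evaluates guards: the function name choose_sentence_actions is invented, each guard is reduced to a plain boolean taken at the moment it would be tested, and the actions are assumed to run innermost rule first, once each rule has been fully recognised.

```python
# Illustration only: which alternative's guards are satisfied, and which
# actions from the example grammar would then fire. Guard values are
# passed in as booleans; the real parser calls guard functions instead.

def choose_sentence_actions(plural_noun, past, singular_verb, plural_verb):
    """Return the action names that would fire, or None if no
    alternative's guard conditions hold."""
    alternatives = [
        # (subject guard, verb guard, rule action, sentence action)
        # presentSentence alternatives
        (not plural_noun, (not past) and singular_verb,
         "BumpSingular", "BumpPresentTense"),
        (plural_noun, (not past) and plural_verb,
         "BumpPlural", "BumpPresentTense"),
        # pastSentence alternatives
        (not plural_noun, past and singular_verb,
         "BumpSingular", "BumpPastTense"),
        (plural_noun, past and plural_verb,
         "BumpPlural", "BumpPastTense"),
    ]
    for subject_ok, verb_ok, rule_action, sentence_action in alternatives:
        if subject_ok and verb_ok:
            # Assumed order: the inner rule's action, then the action
            # attached to the enclosing 'sentence' alternative.
            return [rule_action, sentence_action]
    return None
```

With a singular subject and a singular present-tense verb, the first alternative matches and the sketch returns BumpSingular followed by BumpPresentTense. A plural subject with a singular verb satisfies none of the four alternatives, which corresponds to a sentence the grammar rejects.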