Writing an inline parser class

This section describes how to structure your source code and reference the required libraries so that you can create an in-line parser, that is, a parser generated in the same program that runs it. The grammar description is supplied either as a character string in source code or as an input stream, so it could be read from a file or from a database.

Project references

Any in-line parser project must reference two DLLs: Parsing.DLL, which contains the classes in the Parsing namespace, and ParserGenerator.DLL, which contains the classes in the ParserGenerator namespace. Adding using statements for these namespaces to the top of any source file that refers to their members will simplify your source code.

Creating the parser class

First, your source code needs an application-specific parser class that you have written. The parsing engine itself already exists as Parsing.Parser in Parsing.DLL, so all you need to ensure is that your application-specific parser class inherits from Parsing.Parser.

The source file containing your application-specific parser class might be structured as follows:


using Parsing;

// ... other using statements ...

namespace MyApplication
{
    public class MyParser : Parser
    {
        // ... Application specific class members here ...
    }
}

Your parser is a single object that acts as the interface to any data structures built or manipulated while the parser is executing. You would typically place those data members, and the methods that access them, within the application-specific parser class.

Action functions

With both in-line and offline parsers, you may have chosen to write the action code that runs on rule reduction in the source code of the parser itself. The names of these action functions will have been listed in the actions section of the grammar description, and they all share the same standard signature. For example, an action called BumpPlural would be implemented inside your application-specific parser class as follows:


using Parsing;

// ... other using statements ...

namespace MyApplication
{
    public class MyParser : Parser
    {
        // Application specific class members
        
        public int PluralCount { get; set; }
        
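        // Action function: called when the production it is attached to is reduced
        public void BumpPlural(object[] args)
        {
            // someErrorOccurred is a placeholder for your own error test
            if (someErrorOccurred)
            {
                // Report the fault to the parser through slot zero
                args[0] = new ParserError
                {
                    Message = "Nature of error"
                };
            }
            else
            {
                PluralCount++;
            }
        }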
    }
}

Note the signature of the action method. Its return type is void, because the parsing engine makes no attempt to pick up a return value. The object array passed as its single argument is filled in with a selection of the values of the tokens on the right-hand side (RHS) of the production being reduced when the action is called. The array always contains at least one element. Element zero is used to return a value to the parser; that value becomes the value of the single left-hand side (LHS) token that replaces the RHS tokens being removed from the top of the parser stack.

If you construct a Parsing.ParserError object and place it in args[0] before returning from an action function, this tells the parser that your action function identified an error in the parse. Fill in the Message property of the ParserError object with an explanation of the fault, because the parser uses this property to report the nature of the fault on its error output channels.

On receipt of a ParserError object in slot zero of the argument array, the parser either aborts the current parse or, if error recovery has been enabled, attempts to pop the parser state stack and shift input tokens until it reaches a suitable resumption point. A description of how this error recovery mechanism works is given elsewhere.
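
The two uses of slot zero described above can be illustrated with a small stand-alone sketch. It does not use Parsing.DLL: the ParserError class here is a stand-in with only a Message property, the AddValues action is hypothetical, and the Main method merely imitates an engine calling the action. The layout of the array beyond element zero (here, args[1] and args[2] holding two RHS token values) is an assumption made for illustration only.

```csharp
using System;

namespace ActionSketch
{
    // Stand-in for Parsing.ParserError so the sketch compiles without Parsing.DLL
    public class ParserError
    {
        public string Message { get; set; }
    }

    public class SketchActions
    {
        // Same shape as an action function: void return, one object[] argument.
        // ASSUMPTION: args[1] and args[2] carry the values of two RHS tokens.
        public void AddValues(object[] args)
        {
            if (args.Length < 3 || !(args[1] is int) || !(args[2] is int))
            {
                // Signal a fault by placing an error object in slot zero
                args[0] = new ParserError { Message = "Expected two integer operands" };
                return;
            }

            // Return the value for the LHS token through slot zero
            args[0] = (int)args[1] + (int)args[2];
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var actions = new SketchActions();

            // Imitate the engine: slot zero is reserved for the result
            object[] good = { null, 2, 3 };
            actions.AddValues(good);
            Console.WriteLine(good[0]);   // prints 5

            object[] bad = { null, "two", 3 };
            actions.AddValues(bad);
            var error = bad[0] as ParserError;
            Console.WriteLine(error != null ? "Error: " + error.Message : "No error");
        }
    }
}
```

In a real parser class the action would be a member of the class derived from Parsing.Parser, and the inspection of slot zero would be done by the parsing engine rather than by your own code.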

Guard functions

If your grammar requires guard functions to be evaluated for some of the tokens as they are parsed, you might have decided to write these guard functions in your application-specific parser class. If so, their names will also have been listed in the guards section of the grammar.

Guard functions take a single object argument that contains the value of the most recently received input token. This is usually the terminal token the guard expression is positioned next to in the grammar. However, it is possible for a guard expression to be placed alongside a non-terminal token, in which case the object argument is the value of the terminal token the guard is being tested against. For more details of how guards on non-terminals are mapped back to their respective terminal tokens, see the detailed article describing this mapping.

Guard functions should return a boolean result. This should be set to true if the token should be accepted at this point in the parse, and false if it is not appropriate at this point. An example of a guard function named PluralNoun is given in the parser class below:


using Parsing;

namespace MyApplication
{
    public class MyParser : Parser
    {
        // Application specific class members
        
        public int PluralCount { get; set; }
        
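        public void BumpPlural(object[] args)
        {
            // someErrorOccurred is a placeholder for your own error test
            if (someErrorOccurred)
            {
                args[0] = new ParserError
                {
                    Message = "Nature of error"
                };
            }
            else
            {
                PluralCount++;
            }
        }
        
        // Guard function: accepts the token only if its text ends in "s"
        public bool PluralNoun(object arg)
        {
            // Reject a missing token value rather than throwing on null
            return arg != null && arg.ToString().EndsWith("s");
        }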
    }
}

As a guide to writing boolean guard functions, try to arrange that each function evaluates a single truth value about the state of the parse or its data. Avoid combining several evaluable boolean items with boolean operators within a single function. The reason for this is that the grammar description allows compound boolean expressions to be constructed using the boolean guard functions and the operators 'and', 'or' and 'not'. By keeping the guard functions as primitive as possible, you can reuse them in different combinations for different guard expressions at different points in the grammar.
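
As a rough illustration of this guideline, the stand-alone sketch below keeps each guard to a single test and combines the guards only at the point of use. The Capitalised guard is hypothetical, and the combinations are written with the C# operators &&, || and ! purely to mirror what 'and', 'or' and 'not' would express in a guard expression; the syntax used in the grammar description itself is not shown here.

```csharp
using System;

namespace GuardSketch
{
    public class SketchGuards
    {
        // Primitive guard: one truth value about the token text
        public bool PluralNoun(object arg)
        {
            return arg != null && arg.ToString().EndsWith("s", StringComparison.Ordinal);
        }

        // Hypothetical second primitive guard, added for illustration
        public bool Capitalised(object arg)
        {
            string text = arg == null ? "" : arg.ToString();
            return text.Length > 0 && char.IsUpper(text[0]);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var g = new SketchGuards();
            object token = "Cats";

            // Equivalent of: PluralNoun and Capitalised
            Console.WriteLine(g.PluralNoun(token) && g.Capitalised(token));    // True

            // Equivalent of: PluralNoun and not Capitalised
            Console.WriteLine(g.PluralNoun(token) && !g.Capitalised(token));   // False

            // Equivalent of: PluralNoun or Capitalised
            Console.WriteLine(g.PluralNoun("dog") || g.Capitalised("dog"));    // False
        }
    }
}
```

Had the two tests been fused into one function such as a "capitalised plural noun" guard, the second and third combinations could not have been expressed without writing further functions.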

Constructors

Your in-line parser class should either declare no constructor or declare a default (parameterless) constructor. If you need to initialise members with values, do so through a separate initialisation method or through property assignments after the parser instance has been created. This is because the ParserFactory<T>.InitializeFromGrammar and ParserFactory<T>.CreateInstance methods, which are described elsewhere, expect the application-specific parser class to have a default constructor. An example of suitable creation and initialisation code is given below.


    ParserFactory<MyParser>.InitializeFromGrammar( ... args ... );
    MyParser p = ParserFactory<MyParser>.CreateInstance();
    p.SomeMemberProperty = someValue;
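
The separate initialisation method mentioned above might look something like the following stand-alone sketch. Everything in it is hypothetical: MyParserSketch stands in for a class that would really inherit from Parsing.Parser, and SketchFactory<T> is not the library's ParserFactory<T> but a minimal generic class with a new() constraint, used only to show why a type created this way needs a parameterless constructor. How the real factory creates instances is not stated here.

```csharp
using System;

namespace ConstructionSketch
{
    // Stand-in for the application-specific parser class; the real one
    // would inherit from Parsing.Parser
    public class MyParserSketch
    {
        public int PluralCount { get; set; }
        public string Source { get; private set; } = "";

        // No constructor is declared, so the compiler supplies a default one

        // Hypothetical initialisation method used in place of constructor arguments
        public void Initialise(string source, int startingCount)
        {
            Source = source;
            PluralCount = startingCount;
        }
    }

    // Hypothetical factory: the new() constraint only admits types that
    // have a public parameterless constructor
    public static class SketchFactory<T> where T : new()
    {
        public static T CreateInstance()
        {
            return new T();
        }
    }

    public static class Program
    {
        public static void Main()
        {
            MyParserSketch p = SketchFactory<MyParserSketch>.CreateInstance();
            p.Initialise("inventory.txt", 0);
            Console.WriteLine(p.Source + " " + p.PluralCount);   // prints inventory.txt 0
        }
    }
}
```

If MyParserSketch declared only a constructor that takes arguments, the call to SketchFactory<MyParserSketch>.CreateInstance would fail to compile, which is the kind of restriction the advice in this section is designed to avoid.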