The ParseLR Compiler-Compiler and Finite State Machine Suite

Introduction

The ParseLR toolkit is a parser generator and framework for use with the C# language on the .NET platform. It is designed to create LR(1) parsers, GLR parsers, or directly implemented finite state machine (FSM) engines. It can be used to convert a formal grammar description into C# source code that can parse input data or input events conforming to that grammar. Alternatively it can be used as an inline parser with dynamically changing input grammars, to generate and regenerate parsers for a changing grammar at run-time. Another feature that differentiates this parser from the many other parser generators available is that it allows guard conditions to be associated with each input token. This enables the tool to be used for finite state machine construction. By using guard conditions in an LR(1) or GLR parser, it is possible to describe conceptually complicated finite state machines in a succinct readable grammar, and have the state machine be constructed automatically by the parser generator.

Applications for this toolkit are broad, and include: language compilers; structured document scanners and interpreters; syntax highlighting and verification; workflow engines built from scenario descriptions; or control system state machines described by a formal grammar.

This page and the links from it provide documentation for the ParseLR compiler-compiler library and toolkit. Alternatively, for details on how to use the toolkit to develop finite state machines, go to these pages.

The toolkit can also be used to create generalised LR parsers (GLR parsers). A GLR parser is one that doesn't fail to produce a parser when a shift-reduce or reduce-reduce conflict is encountered. Instead the parser splits into two, and continues as if both parses are viable candidates. At some later input token, one of the parsers determines that the input token is syntactically incorrect, and that causes the incorrect parse to be abandoned. When the entire parse is complete, it is hoped that one or more valid parses survive, each of which is a valid recognition of the input token stream.

To understand how to use ParseLR to create and run GLR parsers, read all the documentation for LR(1) parsing, then visit the GLR pages to see how the tool is extended to support this.

ParseLR is a parser generator designed specifically for the C# programming language, for use on the various .NET platform implementations. It provides a grammar parser that parses LR(1) grammars, but that also supports conditional guards on input tokens. The parser that is generated is a modified LR(1) parser that uses table reduction similar to that suggested by David Pager[1], giving parse tables sized somewhere between those of an LR(1) parser and an LALR(1) parser, without losing any of the ability to parse full canonical LR(1) grammars. The guard extensions allow for one or more boolean functions provided by the parser user to be called when input tokens are consumed, the grammar rule only being recognised if the token matches and the guard condition is true.

The syntax for writing grammars in ParseLR is very similar to the traditional C language family compiler-compilers, such as Yacc or Bison, but with extensions to support the guards on tokens.

The parser generator can be used to create either inline grammar parsers, or offline parsers. An inline parser is one that reads its grammar from data after the application has launched, has the grammar converted into an executable parser while running, and then proceeds to use that parser to read input tokens by adding it to the current application as a dynamic assembly. By contrast an offline parser has its source code generated by the parser generator program. This new automatically generated source code is added to a source code project and compiled into that application before the parser can be run. An advantage of the offline parser is that there is no startup cost in compiling the grammar description into executable code.

Documentation

The description of how to write a grammar for consumption by the parser generator  spans a number of pages. These are best viewed in order as follows: