Free Compiler Construction Tools

Lexical analyser (lexer) and parser generators, programming language creation kits



If you are thinking of creating your own programming language, writing a compiler or interpreter, adding a scripting facility to your application, or even creating a documentation parsing facility, the tools on this page are designed to (hopefully) ease your task. These compiler construction kits, parser generators, lexical analyzer (lexer) generators and code optimizer generators let you define your language and then generate the source code for your software from that definition.

If you want a (printed) book on compiler construction, you might want to check out the famous Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman. The "Dragon book", as it is affectionately called by some, is regarded by many as the standard book on writing compilers.

If you want free programming language grammars for a particular language (eg, C, C++, Ada, COBOL, etc) to ease the task of constructing a compiler for that language, check out the Free Programming Language Grammars for Compiler Design page.


Free Compiler Construction Kits

Jay

Jay generates an LALR(1) parser for C# and Java when presented with an appropriate grammar. It is essentially the Berkeley yacc (see the byacc entry elsewhere on this page) retargeted for these programming languages. Binaries are available for use with Mac OS X as well as any Java virtual machine. The source files are also available.

JFlex

JFlex is a lexical analyzer generator for Java with Unicode support. It takes the regular expressions and actions you write and produces a lexer based on a deterministic finite automaton (DFA). The generated lexers need JDK 7 or above, and JFlex itself requires JDK 1.8 or above. It is open source.
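
To give a rough idea of what a DFA-based lexer does at run time, here is a minimal hand-written sketch in Python. It is not JFlex output and does not reflect how JFlex lays out its tables. The two rules (IDENT = [a-z]+ and INT = [0-9]+) and all the names (TRANSITIONS, ACCEPTING, tokenize) are made up for illustration.

```python
# Hand-written stand-in for the kind of transition table a lexer generator
# might derive from the rules  IDENT = [a-z]+  and  INT = [0-9]+ .
# All names here are hypothetical and chosen only for this sketch.

def char_class(ch):
    """Map a character to the input class used to index the table."""
    if "a" <= ch <= "z":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"

# state -> {input class -> next state}; a missing entry means "no transition".
TRANSITIONS = {
    "start": {"letter": "ident", "digit": "int"},
    "ident": {"letter": "ident"},
    "int": {"digit": "int"},
}

# Accepting states and the token kind each one reports.
ACCEPTING = {"ident": "IDENT", "int": "INT"}

def tokenize(text):
    """Split text into (kind, lexeme) pairs, taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state = "start"
        end = pos
        # Follow transitions until the automaton gets stuck.
        while end < len(text):
            nxt = TRANSITIONS[state].get(char_class(text[end]))
            if nxt is None:
                break
            state = nxt
            end += 1
        if state not in ACCEPTING:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((ACCEPTING[state], text[pos:end]))
        pos = end
    return tokens

print(tokenize("abc 42 x7"))
# [('IDENT', 'abc'), ('INT', '42'), ('IDENT', 'x'), ('INT', '7')]
```

In a real generator the table is computed from your regular expressions, and the "actions" you write run each time a token is accepted.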

JLex

JLex is a lexical analyzer generator for Java. Its input file is similar to that accepted by lex. It accepts a Unicode specification file, and you can configure it to generate a scanner that handles Unicode characters. It is open source.

Bison (parser generator)

Bison generates a parser when presented with an LALR(1) context-free grammar that is yacc compatible. The generated parser is in C. It includes extensions to yacc that make it easier to use if you want multiple parsers in your program. Bison works on Windows, MSDOS, Linux and numerous other operating systems. The link points to the source code. Although the program itself is under GPL, the generated parser (using the bison.simple skeleton) can be distributed without restriction. You can find a Windows port (one of many around) at WinFlexBison.

Flex (Lex drop-in replacement)

Flex generates a lexical analyser in C or C++ given an input program. It is designed so that it can be used together with yacc and its clones (like byacc and bison, also listed on this page). It is highly compatible with the Unix lex program. The original version of Flex, on which the above is based, can be found at ftp://ftp.ee.lbl.gov. If you are looking for a Windows port, one possibility is WinFlexBison.

RE/flex

RE/flex is a lexical analyzer generator for C++ that is compatible with flex (also listed on this page). It has integrated support for Unicode character sets (UTF-8, UTF-16, UTF-32), generates thread-safe scanners by default, can optionally use Boost Regex as its regex engine, and supports lazy quantifiers, word boundary anchors and similar extensions in regular expressions. When not invoked with flex compatibility, it will not use macros and globals. It generates C++ scanner classes derived from a template. The source code is released under the BSD 3-Clause licence, and can be compiled under Windows, Mac OS X and Linux, although a Windows executable is also provided in the distribution package.

Waxeye Parser Generator

Waxeye is a parser generator that takes a Parsing Expression Grammar (PEG) as input. It supports C, Java, JavaScript, Python, Ruby and Scheme. It also supports modular grammars (where your grammar is split across multiple files) and grammar testing (to check that every component of your language is parsed the way you want it to). The program is released under the MIT licence.

peg/leg

The peg program creates a recursive descent parser in C from a Parsing Expression Grammar (PEG). The alternative, leg, uses a slightly different syntax to make it more convenient for those who are familiar with lex and yacc. Both support unlimited backtracking, ordered choice as a means for disambiguation, and can combine lexical analysis and parsing into a single activity. The program is released under the MIT licence, and the parsers created are unencumbered by any licence.
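
If "ordered choice" and "backtracking" are unfamiliar, the following small Python sketch may help. It is not peg or leg syntax, and it is not what those tools generate. It is just a handful of hypothetical helper functions (literal, sequence, choice) that mimic how a PEG tries alternatives strictly from left to right and restarts each one from the same input position.

```python
# Minimal PEG-style combinators. Each parser takes (text, pos) and returns
# the new position on success or None on failure. All names are hypothetical.

def literal(s):
    """Match the exact string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def sequence(*parsers):
    """Match every parser in turn; fail as a whole if any of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            # Every alternative restarts from the same pos (backtracking).
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

# keyword <- "for" "each" / "for"
keyword = choice(sequence(literal("for"), literal("each")), literal("for"))

print(keyword("foreach", 0))  # 7
print(keyword("forever", 0))  # 3 (first alternative failed, second matched)
print(keyword("while", 0))    # None

# With the alternatives swapped, "for" always wins and "each" is never tried.
swapped = choice(literal("for"), sequence(literal("for"), literal("each")))
print(swapped("foreach", 0))  # 3
```

The last example shows why the order of alternatives matters in a PEG: there is never an ambiguity to report, because the grammar author resolves it by ordering.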

re2c

This program, re2c, generates C/C++ lexical analyzers. It takes regular expressions that you write and produces a deterministic finite automaton (DFA), which, when run, will process input according to your rules and execute the matching code (which you write). Unlike lex and flex (see elsewhere on this page), you do not call the yylex() function to start the lexer, but have access to a variety of lower-level functions which give you greater flexibility in processing your input. According to their website, this lexer generator is used in programs like PHP and SpamAssassin. The source code is in the public domain.

AdaGOOP

AdaGOOP, which stands for Ada Generator of Object Oriented Parsers, creates a parser that generates an object oriented parse tree, and a traversal of the tree using the visitor pattern. It relies on the SCATC versions of aflex and ayacc which you can also get from their site. The source code is provided, and there are no restrictions on its use.
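
For those who have not met the visitor pattern before, here is a minimal sketch of a tree traversal written that way. It is in Python rather than Ada, and it is not AdaGOOP's generated code. The node classes (Num, Add) and the visitors (Evaluator, Printer) are invented for this example.

```python
# Visitor pattern over a tiny parse tree. All class names are hypothetical.

class Num:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        # Each node type dispatches to the matching visitor method.
        return visitor.visit_num(self)

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def accept(self, visitor):
        return visitor.visit_add(self)

class Evaluator:
    """Visitor that computes the value of the tree."""
    def visit_num(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)

class Printer:
    """Visitor that renders the tree as a fully parenthesised string."""
    def visit_num(self, node):
        return str(node.value)

    def visit_add(self, node):
        return f"({node.left.accept(self)} + {node.right.accept(self)})"

tree = Add(Num(1), Add(Num(2), Num(3)))
print(tree.accept(Evaluator()))  # 6
print(tree.accept(Printer()))    # (1 + (2 + 3))
```

The point of the pattern is that new operations on the tree (a type checker, a code generator) can be added as new visitor classes without touching the node classes.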

Quex - A Mode Oriented Directly Coded Lexical Analyzer Generator

Quex, or Queχ (depending on which part of the site you read), produces a directly coded lexical analyzer engine with pre- and post-conditions rather than the table-driven engines created by the Lex/Flex family (see elsewhere on this page). Features include inheritable "lexer modes" that provide transition control and indentation events, a general purpose token class, a token queue that allows tokens to be communicated without returning from the lexical analyzer function, line and column numbering, generation of transition graphs, etc. You will need to install Python before using this lexical analyser generator. It generates C++ code. It is released under the GNU LGPL with additional restrictions; see the documentation for details. Windows, Mac OS X, Solaris and Linux are supported.
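
"Directly coded" means that the states of the automaton are expressed as ordinary control flow (branches and loops) instead of being looked up in a transition table. The sketch below illustrates the idea in Python for two made-up rules, IDENT = [a-z]+ and INT = [0-9]+. It is only an illustration under those assumptions: Quex itself emits C++, and its generated code will look nothing like this.

```python
# Directly coded scanner sketch: each automaton state is a piece of control
# flow rather than a row in a table. The function name scan is hypothetical.

def scan(text):
    """Return a list of (kind, lexeme) pairs for the given text."""
    tokens = []
    pos = 0
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch in " \t\r\n":
            # Skip whitespace between tokens.
            pos += 1
        elif "a" <= ch <= "z":
            # State "inside identifier": stay here while letters keep coming.
            start = pos
            while pos < n and "a" <= text[pos] <= "z":
                pos += 1
            tokens.append(("IDENT", text[start:pos]))
        elif "0" <= ch <= "9":
            # State "inside integer": stay here while digits keep coming.
            start = pos
            while pos < n and "0" <= text[pos] <= "9":
                pos += 1
            tokens.append(("INT", text[start:pos]))
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {pos}")
    return tokens

print(scan("count 10 step 2"))
# [('IDENT', 'count'), ('INT', '10'), ('IDENT', 'step'), ('INT', '2')]
```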

Gardens Point LEX

[Note: I'm not sure if the above link points to the official version of Gardens Point LEX, since there seem to be at least a couple of repositories around.] The Gardens Point Scanner Generator, GPLEX, accepts a lex-like input specification to create a C# lexical scanner. The scanner produced is thread-safe and all scanner state is maintained within the scanner instance. Note that the input program does not support the complete POSIX lex specifications. The scanner uses the generic types defined in C# 2.0.

Gardens Point Parser Generator

The Gardens Point Parser Generator, GPPG, accepts a yacc-like program to produce a thread-safe bottom-up C# parser. The parser uses the generic types defined in C# 2.0.

Bisonc++

Supplied with an LALR(1) context-free grammar, bisonc++ generates a C++ parser class. As its name suggests, the parser generator was originally derived from the Bison parser generator (see elsewhere on this page), and grammars used for the latter software can supposedly be adapted to bisonc++ with little or no change.

Grammatica

Grammatica is a parser generator for C# and Java. It uses LL(k) grammars with an unlimited number of look-ahead tokens. It purportedly creates commented and readable source code, has automatic error recovery and detailed error messages. The generator creates the parser at runtime, thus also allowing you to test and debug the parser before you even write your source code. The program is released under the GNU General Public License with an exception to facilitate its use by commercial software.

Accent Compiler Compiler

A compiler-compiler that avoids the problems of the LALR parsers (eg, when faced with shift/reduce and reduce/reduce conflicts) and LL parsers (with their restrictions due to left-recursive rules). You specify your input grammar in the Extended-Backus-Naur-Form, in which you are allowed to indicate repetition, choices and optional parts. You can insert semantic actions anywhere, and ambiguous grammars are allowed. All these features make Accent grammars easier to write than (eg) Yacc grammars. The website warns however that the generated code requires significantly more system resources than code generated by Yacc. Accent is distributed under GNU GPL. I'm not sure about the generated C code.

PRECCX (Prettier Compiler-Compiler Extended)

PRECCX, or PREttier Compiler-Compiler eXtended, is "an infinite-lookahead compiler-compiler for context dependent grammars" which generates C code. You specify an input grammar in an extended BNF notation where inherited and synthetic attributes are allowed. The parser is essentially LL(infinity) with optimisations. You can get versions for MSDOS, Linux and other Unices (including Sun, HP, etc). Source code is available and you can apparently compile it on other platforms with an ANSI C compiler if needed.

Byacc/J (Parser Generator)

This is a version of Berkeley yacc modified so that it can generate Java source code. You simply supply a "-J" option on the command line and it'll produce the Java code instead of the usual C output. You can either get the free source code and compile it yourself, or download any of the precompiled binaries for Solaris, SGI/IRIX, Windows, and Linux. Like the byacc original (see elsewhere on this page), your output is free of any restrictions, and you can freely use it for any purpose you wish.

COCO/R (Lexer and Parser Generators)

This tool generates recursive descent LL(1) parsers and their associated lexical scanners from attributed grammars. It comes with source code, and there are versions to generate Oberon, C#, F#, VB.Net, C, C++, Java, Swift, Pascal, Delphi, Ada, Python, etc. Platforms supported appear to vary (Unix systems, Apple Macintosh, Atari, MSDOS, etc) depending on the language you want generated.
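
In case the term is new to you, a recursive descent LL(1) parser has one function per grammar rule and decides what to do next by looking at a single upcoming token. The hand-written Python sketch below shows the shape of such a parser for a small arithmetic grammar. It is not COCO/R output, and the grammar, the class name Parser and the function evaluate are all assumptions made for the example.

```python
import re

# Grammar assumed for this sketch (one method per rule):
#   expr   = term   { ("+" | "-") term }
#   term   = factor { "*" factor }
#   factor = NUMBER | "(" expr ")"

class Parser:
    def __init__(self, text):
        # Tokens are runs of digits or any single non-space character.
        self.tokens = re.findall(r"[0-9]+|\S", text)
        self.pos = 0

    def peek(self):
        """Return the one lookahead token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):
        tok = self.peek()
        if tok is not None and "0" <= tok[0] <= "9":
            self.pos += 1
            return int(tok)
        if tok == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text):
    """Parse and evaluate text, rejecting any trailing input."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value

print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

Here the arithmetic is done directly inside the rule methods, which loosely corresponds to attaching semantic actions to grammar rules in an attributed grammar.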

Eli

A programming environment that allows you to generate complete language implementations from application-oriented specifications. The user describes the problem that needs to be solved and Eli uses the tools and components required for that problem. It handles structural analysis and the analysis of names, types and values, stores translation structures and produces the target text. It generates C code. The program is available in source form and has been tested under Linux, IRIX, HP-UX, OSF, and SunOS. Eli itself is distributed under the GNU GPL but the generated code is your property to do with as you please.

ALE

This freeware system, written in Prolog, and requiring SICStus Prolog 3.7, SWI Prolog or Quintus Prolog (no longer maintained?) to run, handles phrase structure parsing, semantic-head-driven generation and constraint logic programming and includes a source level debugger.

Gentle Compiler Construction System

This compiler construction tool purports to provide a uniform framework for language recognition, definition of abstract syntax trees, construction of tree walkers based on pattern recognition, smart traversal, simple unparsing for source to source translation and optimal code selection for microprocessors. Note however that if you use it to create an application, the licensing terms require that your applications be licensed under the GNU GPL. This probably restricts your use of it in a commercial program, unless you are prepared to pay for a special license or you plan to make the sources for your program available anyway.

Bison for Eiffel (Parser generator)

This version of Bison produces Eiffel source code. Like Bison, it is released under the GNU GPL. I am uncertain whether the generated parser can be distributed freely (the current versions of Bison allow this if you do not modify the output) without restrictions.

ANTLR (Recursive Descent Parser Generator)

ANTLR generates a recursive descent parser in C, C++ or Java from predicated-LL(k>1) grammars. It is able to build ASTs automatically. If you are using C, you may have to get the PCCTS 1.XX series (the precursor to ANTLR), also available at the site. The latest version may be used for C++ and Java.

Byacc: original version and current version

[Note: the link for the original version uses the FTP protocol. If your browser does not support this, you will probably need to use an FTP client to go to that address.]
Berkeley YACC ("Yet Another Compiler Compiler") is a public domain parser generator that is the precursor of GNU Bison. The "original version" link points to the original version by Robert Corbett, released to the public domain (that is, not copyrighted). The "current version" link provides the version that is maintained by Thomas E. Dickey. Both versions are in source code form, which should compile with many compilers (including GNU's gcc). It generates C code.

BtYacc (generates parsers)

To quote from the documentation, BtYacc, or BackTracking Yacc, "is a modified version of Berkeley Yacc that supports automatic backtracking and semantic disambiguation to parse ambiguous grammars, as well as syntactic sugar for inherited attributes". The program comes with sources which are in the public domain. Although the author only mentions compilation of the program on Unix and Win32 systems, it is likely that the program can be compiled and run on DOS systems using an MSDOS port of the GNU C compiler like DJGPP, since the GNU compiler was used on the other systems. For more information about DJGPP, see the Free C and C++ compilers page.

Java Compiler Compiler (JavaCC)

This Java parser generator is written in Java and produces pure Java code. It even comes with grammars for Java 1.0.2, 1.1 as well as HTML. It generates recursive descent parsers (top-down) and allows you to specify both lexical and grammar specifications in your input grammar. In terms of syntactic and semantic lookahead, it generates an LL(1) parser with specific portions LL(k) to resolve things like shift-shift conflicts. The input grammar is in extended BNF notation. It comes with JJTree, a tree building preprocessor; a documentation generator; support for Unicode (and hence internationalization), and many examples. There are numerous other features, including debugging capabilities, error reporting, etc.

Programming Language Creator

According to the documentation, the Programming Language Creator is designed to enable you "to easily create new programming languages, or create interpreted versions of any compiled language" without the need for you to wrestle with yacc and lex. If you want your application to have a scripting language, you might want to look at this to see if it meets your requirements. The binaries, available free, are for Windows, and the source code is available for a fee.

SableCC

This is an object-oriented framework that generates DFA based lexers, LALR(1) parsers, strictly typed syntax trees, and tree walker classes from an extended BNF grammar (in other words, it's a compiler generator). The program was written in Java itself, runs on any Java 1.1 (or later) system and generates Java sources.

LEMON Parser Generator

This LALR(1) parser generator claims to generate faster parsers than Yacc or Bison. The generated parsers are also re-entrant and thread-safe. The program is written in C, and only the source code is provided, so you will need a C compiler to compile it before you can use it.

YaYacc (Generates Parsers)

[Update: this software is no longer available.] YaYacc, or Yet Another Yacc, generates C++ parsers using an LALR(1) algorithm. YaYacc itself runs on FreeBSD, but the resulting parser is not tied to any particular platform (it depends on your code, of course).

Jaccie (Java-based Compiler Compiler) and SIC (Smalltalk-based Interactive Compiler Compiler)

[Update: this software is no longer available. For the record, it used to be found at http://www2.cs.unibw.de/Tools/Syntax/english/index.html] Jaccie includes a scanner generator and a variety of parser generators that can handle LL(1), SLR(1) and LALR(1) grammars. It has a debugging mode where you can operate it non-deterministically. It is based on the earlier SIC, which uses the Smalltalk programming language for evaluation rules.

TP Lex/Yacc (Lexical Analyzer and Parser Generators)

[Update: this software is no longer available.] This is a version of Lex and Yacc designed for Borland Delphi, Borland Turbo Pascal and the Free Pascal Compiler (you can find legally free versions of all the above listed on the Free Delphi Compilers and Pascal Compilers page). Like its lex and yacc predecessors, this version generates lexers and parsers, although in its case, the generated code is in the Pascal language.
