This page introduces my implementations of the two parser generation tools:
ylex is my implementation of the well-known lexical analyzer generator lex. If you're not familiar with lex, you can find a brief introduction here (lex/flex).
How it works
- ylex parses lexical rules defined with regular expressions in the .l file.
- Convert regular expressions to NFA (non-deterministic finite automata) using Thompson's method.
- Convert NFA into DFA (deterministic finite automata) using subset method.
- Finally generate a table-driven lexical analyzer in C++ language, the related codes are grouped into a class called Lex.
ylex implements most features of lex, however, there are some differences:
- ylex supports exclusive start state but not regular start state.
- ylex doesn't support the line-beginning, line-end symbol('^' and '$') and the look ahead operator '/'.
- ylex allows you to include literal code blocks in the lex file like the original lex, the three code blocks are placed in the beginning and the end of '_lex.cpp' file, and near the beginning of the function yylex().
- ylex uses less functions to work, some functions, like yyinput(), yyoutput(), yyless() etc. , and some macros are not supported in current version.
- yywrap() is not used in this version. ylex simply reads the lex file once into a large buffer for processing.
How to use
Currently the ylex can only run on Windows platform. ylex includes an executable file ylex.exe and two template files LEX_H_TEMPLATE and LEX_CPP_TEMPLATE which provide the templates for generating lexical analyazer.
To run ylex, type "ylex filename" in command window. Two files would be generated, namely _lex.h and _lex.cpp. An example is included in the bottom of this page.
yjacc, or Yet Just Another Compiler Compiler (sorry, all good names are already taken), is my implementation of the well-known parser generator yacc. If you're not familiar with yacc, you can find a brief introduction here (yacc/bison).
How it works
yjacc is used to generate a parser by specifying the structure of a language. It usually works together with ylex, because the input for a parser is a sequence of tokens instead of individual characters, and ylex feeds yjacc by the function yylex(). yjacc adopts the LALR(1) parsing technique, thus it generates a LALR(1) table-driven parser. Below is a brief summary of how it works:
- yjacc parses all the grammar rules defined by Context Free Grammar in the .y file.
- Build the LR(0) items collection with the help of FIRST() and NULLABLE() functions.
- Determine the spontanously generated lookaheads and the propagated lookheads.
- Propagate lookaheads until nothing changes, then we get the LALR(1) items collection, which is actully the LR(0) items collection attached with lookaheads.
- Finally generate a table-driven parser in C++ language, the related codes are group into a class called Parser.
Again, here are the differences that you should know:
- The major difference between yjacc and yacc is the declaration of %union. yjacc supports attribute grammar and syntax-directed translation, every terminal or non-terminal symbol has its own attribute. The %union declaration declares all possible value types that a symbol can have. However, some types in C language are really difficult to recognize, such as struct SomeType (*abc);. So, in yjacc, each declaration in %union only specifies the value name but no value type. To make it clear, see the following example.
In yacc, define the YYSTYPE as
In yjacc, define the YYSTYPE as
struct SomeType (*xval);
and user need to manually create a file YYSTYPE.h to make a complete declaration:
struct SomeType (*xval);
- The literal code blocks are placed into head of the _parser.cpp, the beginning of the function yyparse(), and _yaccmain.cpp(if the c code section is provided).
How to use
Currently yjacc can only run on Windows platform. yjacc includes an executable file yjacc.exe, two template files PARSER_H_TEMPLATE and PARSER_CPP_TEMPLATE which provide the templates for generating parser, and Limits.h defining important macros for the parser.
To run yjacc, type "yjacc filename" in command window. Three or four files would be generated, namely _parser.h, _parser.cpp, _tokens.h(declare all the terminal symbols) and _yaccmain.cpp(if it's necessary). An example is included in next section.
The source code for ylex and yjacc is available on my github page. Another project lygen is used to generate the ylex and yjacc programs which can parse .l and .y files, you will know what I mean when you see the code. The code was written 3 years ago, actually I don't recommend you to use it, but you may find it helpful if you are also going to implement one.
This example is taken from the book Lex & yacc by John R. Levine et al. It defines lexical and grammar rules for a simple calculator that can declare variables and do arithmetic operation between variables and numbers.
lex file,grammar file
After running ylex ch3-04.l and yjacc ch3-04.y, several files are generated:
_lex.h, _lex.cpp, _tokens.h, _parser.h, _parser.cpp, _yaccmain.cpp
Compile all the above files together with YYSTYPE.h and Limits.h in a C++ compiler. Here is a snapshot of the execution result.
Last updated 8/17/2014