Tools for Lexical Analysis


5th lecture, May 12, 2017

Language Theory and Compilers

Martin J. Dürst


© 2005-17 Martin J. Dürst, Aoyama Gakuin University

Today's Schedule


Last Week's Homework 1


Last Week's Homework 2


Last Week's Homework 3


Last Week's Homework 4

Bring your notebook PC (with flex, bison, gcc, make, diff, and m4 installed and usable)


Leftovers from Last Week


Summary up to Now

These all have the same expressive power: each describes or recognizes exactly the regular languages, and each can be converted into the others.


Compilation Stages

  1. Lexical analysis
  2. Parsing (syntax analysis)
  3. Semantic analysis
  4. Optimization (or 5)
  5. Code generation (or 4)


Compiler Structure


Implementing Lexical Analysis



Overview of flex


How to Use Cygwin



Cygwin and Harddisks



flex Usage Steps

  1. Create an input file for flex (a (f)lex file), with the extension .l (example: test.l)
  2. Use flex to convert test.l to a C program:
    $ flex test.l
    (the output file is named lex.yy.c)
  3. Compile lex.yy.c with a C compiler (maybe together with other files)
  4. Execute the compiled program


Two Ways to Use flex

  1. Independent file processing (use regular expressions to recognize or change parts of a file):

    Call the yylex() function once from the main function

  2. Calling the lexical analyzer from the parser:

    The parser calls yylex() repeatedly; each call hands back one token through a return statement in a rule's action

In today's exercises and homework, we will use 1.

In the second half of this course, we will use 2. together with bison.
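The difference between the two ways can be sketched in plain C, without flex. The block below is only an illustration: next_token() is a hypothetical hand-written stand-in for yylex() as used in way 2, and the token codes are assumptions made up for this sketch. As with yylex(), a return value of 0 signals the end of the input.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; 0 means end of input, as with yylex(). */
enum { TOK_END = 0, TOK_NUMBER = 1, TOK_WORD = 2, TOK_OTHER = 3 };

/* Hand-written stand-in for yylex() in way 2: scans one token starting at
   *pos, advances *pos past it, and returns the token's code. */
int next_token(const char *input, size_t *pos)
{
    /* Skip whitespace between tokens. */
    while (isspace((unsigned char)input[*pos]))
        ++*pos;

    unsigned char c = (unsigned char)input[*pos];
    if (c == '\0')
        return TOK_END;
    if (isdigit(c)) {
        while (isdigit((unsigned char)input[*pos]))
            ++*pos;
        return TOK_NUMBER;
    }
    if (isalpha(c)) {
        while (isalnum((unsigned char)input[*pos]))
            ++*pos;
        return TOK_WORD;
    }
    ++*pos;                 /* any other single character */
    return TOK_OTHER;
}

/* Stand-in for the parser's side: keeps asking for the next token until
   the end of the input; here the tokens are only counted. */
int count_tokens(const char *input)
{
    size_t pos = 0;
    int n = 0;
    while (next_token(input, &pos) != TOK_END)
        ++n;
    return n;
}
```

In way 1, by contrast, a single call does all the work and the actions never return before the input is exhausted.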


Example of flex Input Format

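        int num_lines = 0, num_chars = 0;

%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;

%%
int main(void)
{
    yylex();    /* scans all of standard input, running the actions above */
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}

/* Returning 1 tells the scanner that there is no further input. */
int yywrap(void) { return 1; }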
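To see what the two rules of this example amount to, here is a hand-written C sketch of the same counting. It is not what flex generates; count_text() and struct counts are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Mirrors the two rules of the example: a newline increments both
   counters, any other character increments num_chars only. */
struct counts count_text(const char *text)
{
    struct counts c = { 0, 0 };
    for (size_t i = 0; text[i] != '\0'; ++i) {
        if (text[i] == '\n')
            ++c.num_lines;  /* rule for \n */
        ++c.num_chars;      /* both rules count the character */
    }
    return c;
}
```

With flex, the loop and the choice of which rule applies are generated automatically; only the patterns and the actions have to be written.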


flex Exercise 1

Process and execute the flex program on the previous slide

  1. Create a file test.l and copy the contents of the previous slide to the file
  2. Create the file lex.yy.c with
    $ flex test.l
  3. Create the executable file a.exe with
    $ gcc lex.yy.c
  4. Execute the program with some input from standard input
    $ ./a <file


Skeleton of flex Input Format

declarations,... (C programming language)
declarations,... (C programming language)
%%
regexp    statement (C programming language)
regexp    statement (C programming language)
%%
functions,... (C programming language)
functions,... (C programming language)


Structure of flex Input Format

Mixture of flex-specific instructions and C program fragments

Three main parts, separated by two %%:

  1. Preparation/setup part:
    C #includes, definition and initialization of global variables, definition of regular expression components
  2. Regular expressions to be recognized (lexical rules), and the program fragments executed on recognition
  3. Rest of C program (functions,...)

Newlines and indentation can be significant!


How to Study flex


Basic Behavior of flex


How flex Processes its Input


flex Exercise 2

The table below shows how to escape various characters in XML.
Create a flex program for the reverse conversion (and, optionally, one for this conversion as well).

Raw text    XML escape
'           &apos;
"           &quot;
&           &amp;
<           &lt;
>           &gt;
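The mapping in the table can be written down directly in C. The sketch below covers only the forward direction (raw text to XML escapes) and does not use flex; the reverse conversion that the exercise asks for is not shown. The names escape_for() and xml_escape() are hypothetical and exist only in this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns the XML escape for c according to the table, or NULL if c is
   copied unchanged. */
static const char *escape_for(char c)
{
    switch (c) {
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    default:   return NULL;
    }
}

/* Hypothetical helper: writes the escaped form of in to out, which has
   room for out_size bytes including the terminating '\0'.
   Returns 0 on success and -1 if out is too small. */
int xml_escape(const char *in, char *out, size_t out_size)
{
    size_t used = 0;
    for (; *in != '\0'; ++in) {
        const char *rep = escape_for(*in);
        size_t len = rep ? strlen(rep) : 1;
        if (used + len + 1 > out_size)
            return -1;
        if (rep)
            memcpy(out + used, rep, len);
        else
            out[used] = *in;
        used += len;
    }
    if (used + 1 > out_size)
        return -1;
    out[used] = '\0';
    return 0;
}
```

In a flex program, each row of the table would typically become one rule, and all other characters would be copied by the default behavior.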


flex Exercise 3: Detect Numbers

Create a flex program that outputs its input unchanged, except that numbers are enclosed between >>> and <<<

Example input:


Example output:


Hint: The string matched by a regular expression is available in the variable yytext
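The intended behavior can be sketched in plain C as follows. This is not a flex solution. It assumes that a "number" is a maximal run of decimal digits, which the exercise does not state explicitly, and mark_numbers() and append() are hypothetical names. The run of digits found in the inner loop plays the role that yytext plays in a flex action.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Appends n bytes of s to out at *used; returns -1 if they do not fit
   together with a terminating '\0'. */
static int append(char *out, size_t out_size, size_t *used,
                  const char *s, size_t n)
{
    if (*used + n + 1 > out_size)
        return -1;
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/* Copies in to out, enclosing every run of digits in >>> and <<<.
   Returns 0 on success and -1 if out (out_size bytes) is too small. */
int mark_numbers(const char *in, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    while (*in != '\0') {
        if (isdigit((unsigned char)*in)) {
            /* Longest match: take the whole run of digits. */
            size_t n = 0;
            while (isdigit((unsigned char)in[n]))
                ++n;
            if (append(out, out_size, &used, ">>>", 3) != 0 ||
                append(out, out_size, &used, in, n) != 0 ||
                append(out, out_size, &used, "<<<", 3) != 0)
                return -1;
            in += n;
        } else {
            /* Everything else is copied unchanged. */
            if (append(out, out_size, &used, in, 1) != 0)
                return -1;
            ++in;
        }
    }
    return 0;
}
```

For the hypothetical input "abc 123 x4", this sketch produces "abc >>>123<<< x>>>4<<<".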


flex Exercise 4 (Homework): General Rules

Deadline: May 25, 2017 (Thursday), 19:00

Where to submit: Box in front of room O-529 (building O, 5th floor)

(start early, so that you can ask questions on May 19)


Collaboration: The same rules as for Computer Practice I (計算機実習 I) apply


flex Exercise 4 (Homework): Lexical Analysis for Ruby

Using flex, create a program for lexical analysis of Ruby programs. Output each of the tokens below on a line of its own.

kind                      explanation                                    example      output (example)
instance variable         starts with @                                  @abc         Instance variable: @abc
class variable            starts with @@                                 @@def        Class variable: @@def
method name               ends with single !, ?, or =                    nil?         Method: nil?
method or local variable  starts with lower-case letter                  my_var       Method or variable: my_var
global variable           starts with $                                  $global      Global variable: $global
constant                  starts with upper-case letter                  String       Constant: String
integer (decimal)         starts with 0d, 0D, or any digit except 0      1_234        Decimal Integer: 1234
integer (octal)           starts with 0                                  0765         Octal Integer: 501
integer (hexadecimal)     starts with 0x or 0X                           0xAB_D3      Hex Integer: 43987
integer (binary)          starts with 0b or 0B                           0b1010_1100  Binary Integer: 172
comment                   from # to end of line                          # hello      Comment: hello
[error]                   all other single characters except whitespace  \            Error: '\'

Advanced problem: Deal with other Ruby lexical tokens (e.g. operators, strings, regular expressions, floating point numbers,...).
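The integer rows of the table require printing the value of the token, not its text (1_234 becomes 1234, 0765 becomes 501). One possible way to compute that value in C is sketched below. The names ruby_int_value() and digit_value() are hypothetical. The sketch assumes the text has already been matched by one of the integer patterns, follows only the prefixes listed in the table, and does not check for overflow.

```c
/* Value of one digit character, or -1 if c is not a hexadecimal digit. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: converts the text of an integer token into its
   value. Returns -1 if the text contains a character that is not a digit
   in the detected base. */
long ruby_int_value(const char *text)
{
    int base = 10;

    /* A leading 0 followed by more characters selects the base. */
    if (text[0] == '0' && text[1] != '\0') {
        switch (text[1]) {
        case 'x': case 'X': base = 16; text += 2; break;
        case 'b': case 'B': base = 2;  text += 2; break;
        case 'd': case 'D': base = 10; text += 2; break;
        default:            base = 8;  text += 1; break;
        }
    }

    long value = 0;
    for (; *text != '\0'; ++text) {
        if (*text == '_')
            continue;       /* separators carry no value */
        int d = digit_value(*text);
        if (d < 0 || d >= base)
            return -1;
        value = value * base + d;
    }
    return value;
}
```

In a flex action, such a helper could be called with yytext and its result printed with printf.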


Frequent Problems with flex


Hints for Homework



There will be a minitest (ca. 30 minutes) next week. Please prepare well.



lexical analyzer
lexical analyzer generator
字句解析器生成系 (生成器)
parser generator
構文解析器生成系 (生成器)
integer literal
character literal