Tools for Lexical Analysis

(字句解析ツール)

5th lecture, May 19, 2023

Language Theory and Compilers

https://www.sw.it.aoyama.ac.jp/2023/Compiler/lecture5.html

Martin J. Dürst

AGU

© 2005-23 Martin J. Dürst 青山学院大学

Today's Schedule

 

Last Week's Homework

  1. Construct the state transition diagram for the NFA corresponding to the following grammar
    S → xB | yA | yC, A → xC | z | yS, B → zA | zC | xB | y, C → yA | aD | z
  2. Convert the automaton defined by the following transition table to a right linear grammar


    State    a    b
    →T       G    M
    *G       K    H
    H        L    G
    K        H    -
    *L       M    T
    M        L    G

     
    Result:

    T → aG | bM | a
    G → aK | bH
    H → aL | bG | a | b
    K → aH
    L → aM | bT
    M → aL | bG | a | b

  3. Construct the state transition diagram for the regular expression pr|x*t
    Write down two versions:
    1. The result of the full procedure (with all ε transitions)
    2. A version that is as simple as possible
  4. Bring your notebook PC with you next lecture (May 19).
    Make sure you can use flex, bison, gcc, make, diff, and m4 (no need to submit)
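
The transition table in item 2 can also be checked by running it directly. The following is a minimal sketch in C, assuming that → marks the start state, * marks accepting states, and - means that no transition exists. The names `dfa_accepts`, `next_state`, and `DEAD` are hypothetical and not part of the homework.

```c
#include <stddef.h>

/* States from the transition table in item 2.
   DEAD (hypothetical name) stands for the missing transition "-". */
enum state { T, G, H, K, L, M, DEAD };

/* Rows are states, columns are the input symbols a and b. */
static const enum state next_state[6][2] = {
    /* T */ { G, M },
    /* G */ { K, H },
    /* H */ { L, G },
    /* K */ { H, DEAD },
    /* L */ { M, T },
    /* M */ { L, G },
};

/* Returns 1 if the automaton accepts the input, 0 otherwise. */
int dfa_accepts(const char *input)
{
    enum state s = T;                        /* start state */
    for (size_t i = 0; input[i] != '\0'; i++) {
        int column;
        if (input[i] == 'a')
            column = 0;
        else if (input[i] == 'b')
            column = 1;
        else
            return 0;                        /* symbol outside the alphabet */
        s = next_state[s][column];
        if (s == DEAD)
            return 0;                        /* no transition defined */
    }
    return s == G || s == L;                 /* accepting states */
}
```

A word is accepted by this function exactly when the table leads from T to G or L, which is one way to cross-check words derived from the grammar in the result.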

 

Leftovers from Last Lecture

Practical Regular Expressions

 

Summary for Regular Languages

We learned about:

These all:

 

Compilation Stages

  1. Lexical analysis
  2. Parsing (syntax analysis)
  3. Semantic analysis
  4. Optimization (or 5)
  5. Code generation (or 4)

 

Compiler Structure

 

Implementing Lexical Analysis

 

Choices for Implementing Lexical Analysis

 

Example of flex Input

    int num_lines = 0, num_chars = 0;
%%
\n  ++num_lines; ++num_chars;
.   ++num_chars;
%%
int main(void) {
    yylex();
    printf("%d lines, %d characters\n", num_lines, num_chars);
}
int yywrap(void) { return 1; }
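
For comparison, here is a hand-written sketch in plain C of what the two rules above compute. It assumes that the input is already in memory as a string, whereas the flex-generated scanner reads from stdin. The names `count_text` and `struct counts` are hypothetical.

```c
#include <stddef.h>

/* Result pair; corresponds to num_lines and num_chars in the flex file. */
struct counts { int lines; int chars; };

/* Counts newlines and characters in a NUL-terminated string. */
struct counts count_text(const char *text)
{
    struct counts c = { 0, 0 };
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == '\n')
            c.lines++;      /* rule "\n": ++num_lines */
        c.chars++;          /* both rules: ++num_chars */
    }
    return c;
}
```

The flex version needs no explicit loop or if statement: the generated yylex() function tries the patterns at each input position and runs the action of the rule that matches.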

 

flex Exercise 1

Process the flex program on the previous slide using Cygwin:

  1. Download and save the file test.l
  2. Create lex.yy.c with
    $ flex test.l
  3. Create the executable a.exe with
    $ gcc lex.yy.c
  4. Execute the program with input from stdin
    $ ./a <file

 

Overview of flex

 

How to Use Cygwin

 

Cygwin and Harddisks

 

flex Usage Steps

  1. Create an input file for flex (a (f)lex file),
    with the extension .l (example: test.l)
  2. Use flex to convert test.l to a C program:
    $ flex test.l
    (the output file is named lex.yy.c)
  3. Compile lex.yy.c with a C compiler, e.g. gcc
    (maybe together with other files)
  4. Execute the compiled program

 

Two Methods to Use flex

  1. Independent file processing (use regular expressions to recognize/change parts of a file):

    Call the yylex() function
    once from the main function

  2. Combination with parser:

    Repeatedly call yylex() from the parser,
    and return a token with return

In today's exercises and homework, we use method 1.

In later lectures, we will use method 2 together with bison.
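
The calling pattern of method 2 can be sketched without flex or bison. In the C fragment below, `next_token` is a hypothetical hand-written stand-in for a generated yylex(), and `count_tokens` plays the role of the parser. The token codes and all names are assumptions made for illustration. As with yylex(), a return value of 0 signals the end of the input.

```c
#include <ctype.h>

/* Hypothetical token codes; 0 means end of input, as with yylex(). */
enum token { TOKEN_END = 0, TOKEN_NUMBER, TOKEN_WORD, TOKEN_OTHER };

/* Current read position (a real flex scanner keeps this internally). */
static const char *cursor = "";

void set_input(const char *text) { cursor = text; }

/* Stand-in for yylex(): returns one token per call. */
int next_token(void)
{
    while (*cursor == ' ')
        cursor++;                             /* skip blanks */
    if (*cursor == '\0')
        return TOKEN_END;
    if (isdigit((unsigned char)*cursor)) {
        while (isdigit((unsigned char)*cursor))
            cursor++;                         /* consume the whole number */
        return TOKEN_NUMBER;
    }
    if (isalpha((unsigned char)*cursor)) {
        while (isalpha((unsigned char)*cursor))
            cursor++;                         /* consume the whole word */
        return TOKEN_WORD;
    }
    cursor++;                                 /* any other single character */
    return TOKEN_OTHER;
}

/* Caller in the role of the parser: asks for tokens until the end. */
int count_tokens(const char *text)
{
    int n = 0;
    set_input(text);
    while (next_token() != TOKEN_END)
        n++;
    return n;
}
```

In method 1, by contrast, a single call to yylex() processes the whole input, because the actions do not return.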

 

Example of flex Input

    int num_lines = 0, num_chars = 0;
%%
\n  ++num_lines; ++num_chars;
.   ++num_chars;
%%
int main(void) {
    yylex();
    printf("%d lines, %d characters\n", num_lines, num_chars);
}
int yywrap(void) { return 1; }

 

Skeleton of flex Input Format

declarations,... (C language)
declarations,... (C language)
%%
regexp statement (C language)
regexp statement (C language)
%%
functions,... (C language)
functions,... (C language)

 

Structure of flex Input Format

Mixture of flex-specific instructions and C program fragments

Three main parts, separated by two %%:

  1. Preparation/setup part:
  2. Flex rules:
  3. Rest of C program (functions,...)

Newlines and indentation can be significant!

 

How to Study flex

 

How the Program Created by flex Works

 

How flex Works

 

flex Exercise 2

The table below shows how to escape various characters in XML.
Create a program in flex for this conversion and for the reverse conversion.

Raw text    XML escape
'           &apos;
"           &quot;
&           &amp;
<           &lt;
>           &gt;
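
To check the output of the flex program, the forward conversion in the table can be written by hand in plain C. This is only a sketch for comparison, not the flex solution the exercise asks for. It assumes the text is in memory and the caller supplies an output buffer. The names `xml_escape` and `xml_escape_for` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the escape for c from the table, or NULL if c is copied as is. */
static const char *xml_escape_for(char c)
{
    switch (c) {
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    default:   return NULL;
    }
}

/* Writes the escaped copy of src to dst (capacity dst_size bytes).
   Returns 0 on success, -1 if dst is too small. */
int xml_escape(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;
    if (dst_size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        const char *esc = xml_escape_for(*src);
        size_t len = esc ? strlen(esc) : 1;
        if (used + len + 1 > dst_size)
            return -1;                    /* keep room for the final '\0' */
        if (esc)
            memcpy(dst + used, esc, len);
        else
            dst[used] = *src;
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```

In flex, each row of the table becomes one rule, and no buffer handling is needed.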

 

flex Exercise 3: Detect Numbers

Create a program with flex to output the input without changes, except that numbers are enclosed with >*> and <*<

Example input: abc123def345gh

Example output: abc>*>123<*<def>*>345<*<gh

Hint: The string matched by a regular expression is available in the variable yytext.
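
The expected behavior can be pinned down with a hand-written sketch in plain C, which is useful for checking the output of the flex program. It assumes that "number" means a maximal run of decimal digits, that the text is in memory, and that the caller supplies an output buffer. The names `mark_numbers` and `append` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Appends n bytes of text to dst if they fit together with a final '\0'. */
static int append(char *dst, size_t dst_size, size_t *used,
                  const char *text, size_t n)
{
    if (*used + n + 1 > dst_size)
        return -1;
    memcpy(dst + *used, text, n);
    *used += n;
    return 0;
}

/* Copies src to dst, enclosing each digit run in >*> and <*<.
   Returns 0 on success, -1 if dst is too small. */
int mark_numbers(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;
    if (dst_size == 0)
        return -1;
    while (*src != '\0') {
        size_t n = 0;
        while (isdigit((unsigned char)src[n]))
            n++;                              /* longest run of digits */
        if (n > 0) {
            if (append(dst, dst_size, &used, ">*>", 3) != 0 ||
                append(dst, dst_size, &used, src, n) != 0 ||
                append(dst, dst_size, &used, "<*<", 3) != 0)
                return -1;
            src += n;
        } else {
            if (append(dst, dst_size, &used, src, 1) != 0)
                return -1;
            src++;
        }
    }
    dst[used] = '\0';
    return 0;
}
```

The inner loop that collects the digits corresponds to what flex does when it matches a pattern. In the flex program, the collected digits are then available in yytext.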

 

flex Exercise 4 (Homework): General Rules

Deadline: June 1, 2023 (Thursday), 22:00

Where to submit: Moodle

Important: This homework requires significantly more time than the other homework assignments.
Start early, so that you can ask questions on May 26 (Friday) and in Moodle (Q&A Forum).

Submission: flex input file (.l file), with your name (kanji and kana) and student number as a comment at the top
(make sure the comment is in UTF-8 and that processing still works after adding it; use only /* */ comments, not //)

Collaboration: The same rules as for Projects in Information Technology II apply!

 

flex Exercise 4 (Homework): Lexical Analysis for C

Using flex, create a program for lexical analysis of C programs. Output one token per line.

Process the following tokens:

 

flex Exercise 4 (Homework): Example Input and Output

Simple example input:

if (xyz*3 > 15) abc = 'c';

Example output:

keyword: if
parenthesis: (
identifier: xyz
operator: *
integer constant: 3
operator: >
integer constant: 15
parenthesis: )
identifier: abc
operator: =
character constant: 'c'
semicolon: ;

 

Frequent Problems with flex

 

Hints for Homework

 

Announcement: Minitest

There will be a minitest (30 minutes) next week

Please prepare well!

 

Glossary

automate
自動化する
parser
構文解析器
lexical analyzer
字句解析器
lexical analyzer generator
字句解析器生成系 (生成器)
parser generator
構文解析器生成系 (生成器)
extension
拡張子
skeleton
骨格
definition
定義
initialization
初期化
integer literal
整数定数
character literal
文字定数