(Lexical Analysis Tools)
https://www.sw.it.aoyama.ac.jp/2023/Compiler/lecture5.html
© 2005-23 Martin J. Dürst 青山学院大学
flex example
flex
flex exercises
Convert the automaton defined by the following transition table to a right linear grammar
State | a | b |
→T | G | M |
*G | K | H |
H | L | G |
K | H | - |
*L | M | T |
M | L | G |
Result:
T → aG | bM | a
G → aK | bH
H → aL | bG | a | b
K → aH
L → aM | bT
M → aL | bG | a | b
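The conversion above can be checked mechanically. The following is only a minimal sketch in C, assuming that → marks the start state, * marks accepting states, and - means "no transition". The names `grammar_rule`, `STATES`, `ON_A`, `ON_B` and `ACCEPTING` are made up for this illustration, and the code prints `->` instead of →.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Transition table of the exercise, one character per state.
   Assumption: G and L (marked with *) are the accepting states. */
static const char STATES[]    = "TGHKLM";
static const char ON_A[]      = "GKLHML";  /* next state on input a */
static const char ON_B[]      = "MHG-TG";  /* next state on input b, '-' = none */
static const char ACCEPTING[] = "GL";

/* Append text to rule without overflowing a buffer of the given size. */
static void append(char *rule, size_t size, const char *text)
{
    strncat(rule, text, size - strlen(rule) - 1);
}

/* Write the right linear grammar rule for one state into rule. */
void grammar_rule(char state, char *rule, size_t size)
{
    const char *pos = strchr(STATES, state);
    assert(state != '\0' && pos != NULL && size > 8);
    size_t i = (size_t)(pos - STATES);
    const char symbol[2] = { 'a', 'b' };
    const char target[2] = { ON_A[i], ON_B[i] };
    char piece[8];
    int count = 0;

    snprintf(rule, size, "%c ->", state);
    /* Pass 1: a transition state --x--> N gives the production xN. */
    for (int k = 0; k < 2; k++) {
        if (target[k] == '-')
            continue;
        snprintf(piece, sizeof piece, "%s%c%c",
                 count++ ? " | " : " ", symbol[k], target[k]);
        append(rule, size, piece);
    }
    /* Pass 2: if N is accepting, the bare terminal x is added as well. */
    for (int k = 0; k < 2; k++) {
        if (target[k] == '-' || strchr(ACCEPTING, target[k]) == NULL)
            continue;
        snprintf(piece, sizeof piece, "%s%c",
                 count++ ? " | " : " ", symbol[k]);
        append(rule, size, piece);
    }
}
```

Calling `grammar_rule` once for each of T, G, H, K, L and M reproduces the six rules of the result, in the same order of alternatives.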
pr|x*t
flex, bison, gcc, make, diff, and m4 (no need to submit)
Practical Regular Expressions
We learned about:
These all:
getNextToken()
flex Input
    int num_lines = 0, num_chars = 0;
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf("%d lines, %d characters\n", num_lines, num_chars);
}
int yywrap() { return 1; }
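To see what the two rules of this flex input compute, it can help to write the same counting by hand. The following is a rough sketch in plain C, not what flex generates; `struct counts` and `count_input` are hypothetical names, and the input is taken from a string instead of stdin to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type, mirroring num_lines and num_chars. */
struct counts {
    int lines;
    int chars;
};

/* Count lines and characters the way the two flex rules do. */
struct counts count_input(const char *text)
{
    struct counts c = { 0, 0 };
    assert(text != NULL);
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {   /* rule "\n": ++num_lines; ++num_chars; */
            c.lines++;
            c.chars++;
        } else {            /* rule ".": any character except newline */
            c.chars++;
        }
    }
    return c;
}
```

For the text `"ab\ncd\n"`, this yields 2 lines and 6 characters, because each newline is counted both as a line and as a character.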
flex Exercise 1
Process the flex program on the previous slide using cygwin
Create the file test.l
Create lex.yy.c with flex test.l
Create a.exe with gcc lex.yy.c
Run the program with input from stdin: ./a <file
flex, lex: lexical analyzer generator
bison: parser generator
ls: list the files in a directory
mkdir: create a new directory
cd: change (working) directory
pwd: print (current) working directory
gcc: compile a C program
./a: execute a compiled program (a.exe)
C:\cygwin
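The cygwin root directory / corresponds to C:\cygwin
The home directory of user1 is C:\cygwin\home\user1, which is /home/user1 inside cygwin
pwd shows the current directory
cd /cygdrive/c changes to the C: drive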
flex Usage Steps
1. Create an input file for flex (a (f)lex file), with the extension .l (example: test.l)
2. Use flex to convert test.l to a C program: flex test.l (the output file is lex.yy.c)
3. Compile lex.yy.c with a C compiler, e.g. gcc
flex
1. Call the yylex() function once from the main function
2. Repeatedly call yylex() from the parser, and return a token with return
In today's exercises and homework, we use method 1.
In later lectures, we will use method 2 together with bison.
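The difference between the two methods is mostly about who drives the loop. As a rough illustration of method 2, the sketch below uses a hand-written stand-in for yylex(): each call consumes one token and hands its code back with return, and a stand-in for the parser keeps calling it until the end of the input. The names `next_token`, `count_tokens` and the `TOKEN_...` codes are hypothetical; a scanner generated by flex reads from an input stream rather than from a string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; 0 signals end of input, as with yylex(). */
enum { TOKEN_END = 0, TOKEN_NUMBER = 1, TOKEN_OTHER = 2 };

/* Stand-in for yylex() in method 2: consume one token from *input,
   advance *input past it, and return the token code. */
int next_token(const char **input)
{
    const char *p;
    assert(input != NULL && *input != NULL);
    p = *input;
    while (*p == ' ')                 /* skip blanks between tokens */
        p++;
    if (*p == '\0') {
        *input = p;
        return TOKEN_END;
    }
    if (isdigit((unsigned char)*p)) { /* longest run of digits */
        while (isdigit((unsigned char)*p))
            p++;
        *input = p;
        return TOKEN_NUMBER;
    }
    *input = p + 1;                   /* any other single character */
    return TOKEN_OTHER;
}

/* Stand-in for the parser: call the scanner repeatedly until the end. */
int count_tokens(const char *text)
{
    int n = 0;
    while (next_token(&text) != TOKEN_END)
        n++;
    return n;
}
```

In method 1, by contrast, a single call does all the work and the actions attached to the rules produce the output directly, so no token codes are needed.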
flex Input Format
declarations,... (C programming language)
declarations,... (C programming language)
%%
regexp statement (C programming language)
regexp statement (C programming language)
%%
functions,... (C programming language)
functions,... (C programming language)
flex Input Format
Mixture of flex-specific instructions and C program fragments
Three main parts, separated by two %%:
1. Declarations, including #includes and #defines
2. Rules, each a regexp followed by a C statement
3. Functions
Newlines and indent can be significant!
flex
Caution: Manual, not novel
Caution: Old, but good enough for homework
flex
Output file (lex.yy.c)
flex for different inputs
flex source code (flex itself uses flex)
flex -v (verbose)
How flex Works
.l file
Exercise 2The table below shows how to
escape various characters in XML
Create a program in flex
for this conversion and for the reverse conversion
Raw text | XML escapes |
---|---|
' | &apos; |
" | &quot; |
& | &amp; |
< | &lt; |
> | &gt; |
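Before writing the flex rules, it may help to see the mapping spelled out once in plain C. This is only a sketch of the forward direction (raw text to the XML escapes `&apos;`, `&quot;`, `&amp;`, `&lt;` and `&gt;`), not a flex solution; `xml_escape` is a hypothetical helper name, and output that does not fit into the buffer is silently cut off.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: write the XML-escaped form of text into out,
   which has room for size bytes including the terminating NUL. */
void xml_escape(const char *text, char *out, size_t size)
{
    assert(text != NULL && out != NULL && size > 0);
    out[0] = '\0';
    for (; *text != '\0'; text++) {
        char single[2] = { *text, '\0' };
        const char *piece = single;   /* default: copy the character */
        switch (*text) {
        case '\'': piece = "&apos;"; break;
        case '"':  piece = "&quot;"; break;
        case '&':  piece = "&amp;";  break;
        case '<':  piece = "&lt;";   break;
        case '>':  piece = "&gt;";   break;
        }
        strncat(out, piece, size - strlen(out) - 1);
    }
}
```

In flex, each `case` would roughly correspond to one rule whose pattern is the raw character and whose action prints the escape; the reverse conversion swaps patterns and outputs.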
flex Exercise 3: Detect Numbers
Create a program with flex to output the input without changes, except that numbers are enclosed with >*> and <*<
Example input: abc123def345gh
Example output:
abc>*>123<*<def>*>345<*<gh
Hint: The string recognized by a regular expression is available with the variable yytext
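The expected behavior can be pinned down with a small reference function in plain C. This is a sketch under the assumption that a "number" is a maximal run of decimal digits; `mark_numbers` is a hypothetical name, and the function is not a flex solution but something to compare the output of a flex program against.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copy text to out, enclosing each maximal run
   of digits in >*> and <*<. out has room for size bytes. */
void mark_numbers(const char *text, char *out, size_t size)
{
    assert(text != NULL && out != NULL && size > 0);
    out[0] = '\0';
    while (*text != '\0') {
        /* Length of the longest run of digits starting here. */
        size_t len = 0;
        while (isdigit((unsigned char)text[len]))
            len++;
        if (len > 0) {
            strncat(out, ">*>", size - strlen(out) - 1);
            size_t room = size - strlen(out) - 1;
            strncat(out, text, len < room ? len : room);
            strncat(out, "<*<", size - strlen(out) - 1);
            text += len;
        } else {
            /* Not a digit: copy one character unchanged. */
            size_t room = size - strlen(out) - 1;
            strncat(out, text, room < 1 ? room : 1);
            text++;
        }
    }
}
```

With the example input `abc123def345gh`, this produces `abc>*>123<*<def>*>345<*<gh`. In a flex program, the run of digits is what the variable yytext would hold when a digit pattern matches.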
flex Exercise 4 (Homework)
Deadline: June 1, 2023 (Thursday), 22:00
Where to submit: Moodle
Important: This homework requires significantly more time than other homework assignments.
Start early, so that you can ask questions on May 26 (Friday) and in Moodle (Q&A Forum)
Submission: flex input file (.l file), with name (kanji and kana) and student number as a comment at the top
(make sure the comment is in UTF-8, and that processing still works after adding the comment (use only /* */, not //))
Collaboration: The same rules as for Projects in Information Technology II apply!
flex Exercise 4 (Homework)
Using flex, create a program for lexical analysis of C programs. Output one token per line.
Process the following tokens:
flex Exercise 4 (Homework)
Simple example input:
if (xyz*3 > 15) abc = 'c';
Example output:
keyword: if
parenthesis: (
identifier: xyz
operator: *
integer constant: 3
operator: >
integer constant: 15
parenthesis: )
identifier: abc
operator: =
character constant: 'c'
semicolon: ;
flex
.l file, flex may still run without errors
Solution: Always start with flex:
> flex file.l && gcc lex.yy.c && ./a <input.txt
In the first part of the .l file (before the first %%), C program fragments have to be indented by at least one space
int yywrap () { return 1; }
yytext: putchar(yytext[0]); will output the first character of the matched text
Characters with a special meaning in regular expressions have to be escaped with \ or quoted within ""
There will be a minitest (30 minutes) next week
Please prepare well!