Use of Tools for Parsing
(Details of Tools for Parsing)
10th lecture, June 21, 2019
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture10.html
Martin J. Dürst
© 2005-19 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
 About last lecture's homework
 Summary of last lecture
 How to write a grammar:
 Priorities
 Associativity
 Repetition (lists)
 Other constructs
 How bison works:
 Different orders of derivation
 LALR parsing
 How to read the .output file
 How to debug bison grammars
Last Lecture's Homework
(Removed due to circumstances)
Summary of Last Lecture
 Creating a program with bison and flex needs many steps, so using make is important
 The input format for bison is very similar to the input format for flex, but there are also some differences
 bison uses attribute grammars to calculate the result of parsing
 The attributes are referenced as $$ (left hand side) and $1, $2, ... (right hand side) in the C program fragments
 Priority and associativity of operators are expressed in the rewriting rules of the grammar
Leftovers from Last Lecture
How to Express Priorities
 Use a separate nonterminal symbol for each priority level
 Write the grammar starting with the lowest priority (outside)
 On the right hand side of the rewriting rule for a given priority's nonterminal, use nonterminals of the same or one level higher priority
 How to select names for nonterminals:
 Use mathematical terms (用語): term (項), factor (因子)
 Use the type of operator, or one representative operator (shift_expression, mulExpression, ...)
Grammar Patterns: Priority
(priority is small_exp > middle_exp > big_exp; assuming left associative)
big_exp: big_exp operator middle_exp
       | middle_exp
       ;
middle_exp: middle_exp operator small_exp
          | small_exp
          ;
How to Express Associativity
 Left associative:
 Use the same nonterminal on the left hand side of the rewriting rule and on the right hand side to the left of the operator
 Use the one level higher priority nonterminal to the right of the operator
 Unless the operator is mandatory, also create a rewriting rule with just the one level higher nonterminal as the right hand side
 Right associative:
 Exchange the nonterminals to the right and the left of the operator
Grammar Patterns: Associativity
Left associative:
big_exp: big_exp left_assoc_op small_exp
       | small_exp
       ;
Right associative:
big_exp: small_exp right_assoc_op big_exp
       | small_exp
       ;
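The effect of the two patterns can be illustrated in plain C, outside of bison. The following is only a sketch: the function names are made up for this illustration, and subtraction is used as the operator because its result depends on the grouping. The left-associative version groups the way the left-recursive rule does, the right-associative version the way the right-recursive rule does.

```c
#include <assert.h>
#include <stddef.h>

/* Left associative grouping: ((a0 - a1) - a2) - ...
   This is the grouping produced by  big_exp: big_exp op small_exp  */
long subtract_left_assoc(const long *operands, size_t n)
{
    assert(n > 0);
    long result = operands[0];
    for (size_t i = 1; i < n; i++)
        result = result - operands[i];
    return result;
}

/* Right associative grouping: a0 - (a1 - (a2 - ...))
   This is the grouping produced by  big_exp: small_exp op big_exp  */
long subtract_right_assoc(const long *operands, size_t n)
{
    assert(n > 0);
    long result = operands[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        result = operands[i] - result;
    return result;
}
```

For the operands 5, 3, 7 the first function computes (5 - 3) - 7 and the second computes 5 - (3 - 7), so the choice of pattern changes the result.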
How to Express Repetition (Lists)
 Example: A list of statements (statementList or statements)
 Two rewriting rules are needed
 Base rule:
 Repetition of 0 or more times: A rewriting rule with an empty right hand side
 Repetition of 1 or more times: A rewriting rule with a single element (e.g. statement) of the list on the right hand side
 Inductive rule:
 Right hand side uses both list and single element nonterminals
 If associativity is important, it determines the order of the two nonterminals
 If associativity is not important, there are two choices:
 List first: Left recursion; advantage: smaller stack
 Element first: Right recursion
Grammar Patterns: Repetition
Zero or more times:
items: items item
     | /* empty */
     ;
One or more times:
items: items item
     | item
     ;
Instead of "items item", "item items" is also possible, but then bison's stack grows with the length of the list, which may become a problem
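Why the stack matters can be seen by counting symbols. The sketch below is a simplified model, not bison's actual implementation: it counts only the grammar symbols on the stack of a shift/reduce parser for the "one or more times" pattern, and the function names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Largest number of grammar symbols on the stack while parsing n items
   with the left-recursive rules   items: items item | item ;          */
size_t max_depth_left_recursion(size_t n)
{
    assert(n >= 1);
    size_t depth = 0, max_depth = 0;
    for (size_t i = 0; i < n; i++) {
        depth++;                  /* shift item */
        if (depth > max_depth)
            max_depth = depth;
        if (i > 0)
            depth--;              /* reduce "items item" to items: 2 symbols become 1 */
        /* for i == 0: reduce "item" to items, depth stays at 1 */
    }
    return max_depth;
}

/* The same count for the right-recursive rules   items: item items | item ; */
size_t max_depth_right_recursion(size_t n)
{
    assert(n >= 1);
    size_t depth = 0, max_depth = 0;
    for (size_t i = 0; i < n; i++) {
        depth++;                  /* shift item; no reduction is possible yet */
        if (depth > max_depth)
            max_depth = depth;
    }
    /* At the end of the input: reduce the last "item" to items (depth
       unchanged), then reduce "item items" to items n - 1 times. */
    for (size_t i = 1; i < n; i++)
        depth--;
    assert(depth == 1);           /* only the final items symbol remains */
    return max_depth;
}
```

In this model, left recursion never needs more than two symbols on the stack, while right recursion needs as many as there are items in the list.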
Grammar Patterns: Parentheses
small_exp: open_paren big_exp close_paren
         | literal
         ;
How to Express Other Constructs (e.g. if statement)
Write as is, carefully distinguishing alternatives and terminal/nonterminal symbols
if_statement : IF OPENPAREN cond CLOSEPAREN statement
             | IF OPENPAREN cond CLOSEPAREN statement ELSE statement
             ;
Order of Derivation: Leftmost and Rightmost Derivation
With leftmost derivations, the leftmost nonterminal in the syntax tree is always expanded first
With rightmost derivations, the rightmost nonterminal in the syntax tree is always expanded first
Simple example grammar:
E → E '-' T | T
T → integer
Example of input: 5 - 7 - 3
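As a worked illustration (not part of the original slides), the two derivation orders for this input can be written out step by step, assuming the operator in the example grammar is '-':

```latex
\begin{align*}
\text{Leftmost:}\quad
  & E \Rightarrow E - T \Rightarrow E - T - T \Rightarrow T - T - T
    \Rightarrow 5 - T - T \Rightarrow 5 - 7 - T \Rightarrow 5 - 7 - 3 \\
\text{Rightmost:}\quad
  & E \Rightarrow E - T \Rightarrow E - 3 \Rightarrow E - T - 3
    \Rightarrow E - 7 - 3 \Rightarrow T - 7 - 3 \Rightarrow 5 - 7 - 3
\end{align*}
```

Both derivations use the same rules and produce the same syntax tree; only the order of the steps differs. Reading the rightmost derivation from right to left gives the order in which a bottom-up parser performs its reductions.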
Derivation Choices
Different choices may generate:
 Different words: This is necessary to be able to process different inputs
 Different syntax trees (same word): Ambiguous grammar, try to avoid
 Different orders (leftmost/rightmost/... derivation; same syntax tree): Different parsing algorithm
Kinds of Analysis Methods
 LL: Read input from the left, use leftmost derivation (used in top-down parsing)
 LR: Read input from the left, use rightmost derivation in reverse order (used in bottom-up parsing)
 LL(1): LL, with one token lookahead
 LR(1): LR, with one token lookahead
 LALR: A simplified kind of LR(1), used widely in yacc and bison
The labels are also used for grammars:
grammar g is LL(1) ⇔ grammar g can be used with an LL(1) parser.
Understanding bison: The .output File
bison -v creates a file with extension .output, containing the following interesting details:
 [Problems: Unused terminal symbols, conflicts]
 Grammar: Numbered rewriting rules (rule number 0 is $accept: start_symbol $end)
 Terminals: Numbered terminal symbols (numbers are ASCII codes or ≥ 256)
 Nonterminals, with numbers of rules where they appear
 States, with the following information for each state:
 Rewriting rules (. shows current position)
 Terminal symbols (or $default) and the action (shift, reduce, goto) if this symbol is the next symbol in the input
 Nonterminal symbols: goal state of transition after reduction
Understanding bison: Debugging
#define YYDEBUG 1 switches on debugging (the trace is printed when the variable yydebug is nonzero at run time)
The output shows how bison works:
 A (pushdown) stack is used to store:
 States (of an automaton)
 Already read terminals and reduced nonterminals
 The automaton and the next input token decide the action to be taken
 There are three possible actions:
 shift: Read a token and put it on the stack (together with a state)
 reduce: Convert some tokens and/or nonterminals on the stack to a single nonterminal using a rewriting rule (a reduce action is always followed by a goto to another state)
 accept: Stop processing and accept the input
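The three actions can be tried out in a small hand-written sketch for the grammar E → E '-' T | T, T → integer (assuming '-' as the operator). This is a simplification, not what bison generates: it decides between shift and reduce by looking directly at the symbols on the stack instead of using a state automaton, and all names in it are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

enum symbol { SYM_INT, SYM_MINUS, SYM_T, SYM_E };   /* symbols on the stack */
struct entry { enum symbol sym; long value; };      /* symbol plus attribute */

/* Parse and evaluate input such as "5 - 7 - 3".
   Returns 1 and stores the value in *result on accept, 0 on a syntax error. */
int parse(const char *input, long *result)
{
    struct entry stack[4];     /* the deepest stack content is: E '-' T */
    int top = 0;
    const char *p = input;

    for (;;) {
        /* reduce: replace a right hand side on top of the stack */
        if (top >= 1 && stack[top - 1].sym == SYM_INT) {
            stack[top - 1].sym = SYM_T;                   /* T -> integer */
            continue;
        }
        if (top >= 3 && stack[top - 1].sym == SYM_T
                && stack[top - 2].sym == SYM_MINUS
                && stack[top - 3].sym == SYM_E) {
            stack[top - 3].value -= stack[top - 1].value; /* E -> E '-' T */
            top -= 2;
            continue;
        }
        if (top == 1 && stack[0].sym == SYM_T) {
            stack[0].sym = SYM_E;                         /* E -> T */
            continue;
        }
        while (*p == ' ')
            p++;
        if (*p == '\0') {                                 /* end of input */
            if (top == 1 && stack[0].sym == SYM_E) {
                *result = stack[0].value;                 /* accept */
                return 1;
            }
            return 0;
        }
        /* shift: read one token and push it */
        assert(top < 4);
        if (isdigit((unsigned char)*p)) {
            if (top != 0 && stack[top - 1].sym != SYM_MINUS)
                return 0;                                 /* integer not allowed here */
            long v = 0;
            while (isdigit((unsigned char)*p))
                v = v * 10 + (*p++ - '0');
            stack[top].sym = SYM_INT;
            stack[top].value = v;
            top++;
        } else if (*p == '-') {
            if (top != 1 || stack[0].sym != SYM_E)
                return 0;                                 /* '-' not allowed here */
            p++;
            stack[top].sym = SYM_MINUS;
            stack[top].value = 0;
            top++;
        } else {
            return 0;                                     /* unknown character */
        }
    }
}
```

Because the rule E → E '-' T is reduced as soon as its right hand side is complete, the input 5 - 7 - 3 is evaluated as (5 - 7) - 3, matching the left-associative grammar.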
Conflicts and Ambiguous Grammars
 When running bison, it may show some conflicts:
 shift/reduce conflicts: Both shift and reduce are possible; shift is always chosen
 reduce/reduce conflicts: There is more than one way to reduce
 bison just chooses one of the alternatives:
 If this is the right selection, we are fine (but we may want to fix the grammar anyway)
 If this is the wrong selection, we need to fix the grammar
 Grammar example:
E → E '-' E | integer
For 5 - 3 - 7, this grammar allows two interpretations: (5 - 3) - 7 and 5 - (3 - 7)
Another Example of Ambiguity
The grammar for if-else is a famous example of ambiguity:
if (...) if (...) ...; else ...;
can be parsed in two ways:
if (...) {
if (...) ...;
else ...;
}
or
if (...) {
if (...) ...;
}
else ...;
This creates a shift/reduce conflict.
The first way of parsing is correct (for C), and is chosen by bison because in a shift/reduce conflict, shift is selected.
Grammar of bison Rewriting Rules (META!)
rewritingRule → nonterminalSymbol ":" rightHandList ";"
rightHandList → rightHand | rightHand "|" rightHandList
rightHand → symbolList "{" CFragment "}"
symbolList → symbol | symbol symbolList
symbol → nonterminalSymbol | terminalSymbol
How to Combine flex and bison
 In the .y file, list all token types:
%token NUM PLUS ASTERISK ...
 In the .y file, define the type of the attributes:
#define YYSTYPE int
 In the .lex file, define one or more rules for each token type
 In the .y file, define the rewriting rules of the grammar
 In the .y file, write the program fragments to calculate attribute values
 Process (with flex/bison), compile, and test
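To make the interface between the two tools more concrete, the sketch below shows in plain C what the parser expects from the lexer: a function yylex that returns the next token type and leaves the token's attribute in yylval. This is a hand-written stand-in for illustration only; in the lecture's setup this function is generated by flex from the .lex file. The token numbers and the helper set_input are assumptions made for this example; the real numbers are assigned by bison.

```c
#include <ctype.h>

/* Token numbers: hypothetical values chosen for this sketch;
   bison assigns the real ones (256 or above) in the code it generates. */
enum { NUM = 258, PLUS = 259, ASTERISK = 260 };

int yylval;                        /* attribute of the current token (YYSTYPE is int) */
static const char *input = "";     /* text that still has to be scanned */

/* Hypothetical helper: choose the text to scan. */
void set_input(const char *text)
{
    input = text;
}

/* Return the type of the next token, or 0 at the end of the input. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;                               /* skip white space */
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUM;                            /* the value is in yylval */
    }
    switch (*input++) {
    case '+': return PLUS;
    case '*': return ASTERISK;
    default:  return (unsigned char)input[-1]; /* other characters: their own code */
    }
}
```

Each call returns one token, so for the text "12 + 3" the parser would see NUM (with yylval 12), PLUS, NUM (with yylval 3), and then 0.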
Advantages and Problems of Bottom-Up Parsing
 Advantages:
 No problems with left recursion
 Wider range of grammars
 Automatic creation of parser
 Problems:
 Very hard to create parser by hand
 Ambiguities need attention
Homework: A Calculator for Rational Numbers
Deadline: July 4, 2019 (Thursday in two weeks), 19:00
Prepare questions so that you can ask them in next week's lecture!
Where to submit: Box in front of room O529 (building O, 5th floor)
Format: Submit the files rationals.lex and rationals.y (in this order!), A4 using BOTH sides (↓↓, not ↓↑), portrait (not landscape), stapled in upper left if more than one page, NO cover page, NO wrapping lines, legible font size, non-proportional font, formatted (indents,...) for easy visibility, name (kanji and kana) and student number as a comment at the top right
Change the simple calculator of calc.y to a calculator that can handle
integers and rationals.
 Statements are separated with ;.
 Rationals are expressed as [numerator, denominator].
 Inside [], division is not allowed. Make sure this is checked by the grammar.
 Nesting of [] is not allowed. Make sure this is checked by the grammar.
 Calculations are exact, not using floating point numbers.
 Print out the result of each statement.
 Results are given as irreducible fractions, with a minus sign on the numerator if applicable.
 Example result: Result is 53/17
 Use the grammar to define priorities and associativities (do NOT use %left, %right, ...).
Bonus points (発展問題) for dealing with hours, months, years,...
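The requirement of irreducible fractions with the sign on the numerator can be met by a small normalization step. The following is one possible sketch, not a prescribed solution: the struct rational and the function names are assumptions made for this illustration, and it uses the Euclidean algorithm for the greatest common divisor.

```c
#include <assert.h>

/* Hypothetical representation of a rational number. */
struct rational { long num; long den; };

/* Greatest common divisor of |a| and |b| (Euclidean algorithm). */
static long gcd(long a, long b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce num/den to lowest terms, with a positive denominator,
   so that any minus sign ends up on the numerator. */
struct rational normalize(long num, long den)
{
    assert(den != 0);
    if (den < 0) {            /* move the sign to the numerator */
        num = -num;
        den = -den;
    }
    long g = gcd(num, den);   /* at least 1 because den is not 0 */
    struct rational r = { num / g, den / g };
    return r;
}
```

Calling such a function after every arithmetic operation keeps all intermediate results in a single canonical form, which also makes test outputs easy to compare.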
Hints for Homework
 Start from the calc files.
 Change the filenames (including in the makefile).
 Use YYSTYPE to define a type that can handle rationals and integers (both in .lex and .y)
 Define additional tokens in .y (%token rule)
 Write rules for additional tokens in .lex
 If you have a shift/reduce or reduce/reduce conflict:
 Check the .output file
 Check using different inputs
 Expand your grammar little by little, always carefully testing
 Build up your own test file, and expand it together with expansions to the grammar
 Create tests that check important aspects of the grammar (priorities, associativity, errors)
 Save test outputs and use them for automatic comparison
Glossary
 unary (operator): 単項 (演算子)
 leftmost derivation: 最左導出
 rightmost derivation: 最右導出
 reverse order: 逆順
 lookahead: 先読み
 non-proportional font: 等幅のフォント
 rational number: 有理数
 numerator: 分子
 denominator: 分母
 sign: 符号
 irreducible fraction: 既約分数