Use of Tools for Parsing

(Principles of yacc-family Tools)

Language Theory and Compilers

http://www.sw.it.aoyama.ac.jp/2016/Compiler/lecture10.html

Martin J. Dürst

© 2005-16 Martin J. Dürst, Aoyama Gakuin University

Today's Schedule

• Summary of last lecture
• How `bison` works:
• Different orders of derivation
• LALR parsing
• How to read the `.output` file
• How to debug `bison` grammars

Summary of Last Lecture

• Creating a program with `bison` and `flex` needs many steps, so using `make` is important
• The input format for `bison` is very similar to the input format for `flex`, but there are also some differences
• `bison` uses attribute grammars to calculate the result of parsing
• The attributes are referenced as `$$`, `$1`, `$2`,... in the C program fragments
• Priority and associativity of operators are expressed in the rewriting rules of the grammar

How to Express Priorities

• Use a separate nonterminal symbol for each priority level
• Write the grammar starting with the lowest priority (outside)
• On the right hand side of the rewriting rule for a given priority's nonterminal,
use nonterminals of the same or one level higher priority
• How to select names for nonterminals:
• Use mathematical terms: term, factor
• Use the type of operator, or one representative operator
(shift_expression, mulExpression,...)

How to Express Associativity

• Left associative:
• Use the same nonterminal on the left hand side of the rewriting rule and on the right hand side to the left of the operator
• Use the one level higher priority nonterminal to the right of the operator
• (unless this operator is required) Also create a rewriting rule with just the one level higher nonterminal as the right hand side
• Right associative:
• Exchange the nonterminals to the right and the left of the operator
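
The rules for priorities and associativity can be tried out without `bison`. The following is a minimal hand-written sketch in C, not `bison` output. It assumes a small grammar (shown in the comment) with one nonterminal per priority level; the function names `expression`, `term`, `factor` and `evaluate` are chosen for this illustration only. The loops play the role of the left-recursive rules, so operators of the same level are combined from left to right.

```c
#include <assert.h>
#include <ctype.h>

/* Grammar mirrored by the functions below (bison notation, actions omitted):
     expression : term   | expression '+' term | expression '-' term ;
     term       : factor | term '*' factor ;
     factor     : INTEGER | '(' expression ')' ;
   Lowest priority comes first; each level refers to the next higher one. */

static const char *p;              /* current position in the input */

static int expression(void);

static void skip(void) { while (*p == ' ') p++; }

/* Highest priority: an integer or a parenthesized expression. */
static int factor(void) {
    skip();
    if (*p == '(') {
        p++;
        int inner = expression();
        skip();
        assert(*p == ')');         /* sketch only: syntax errors just abort */
        p++;
        return inner;
    }
    assert(isdigit((unsigned char)*p));
    int v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

/* Middle priority: '*' is left associative (the loop folds from the left). */
static int term(void) {
    int v = factor();
    for (skip(); *p == '*'; skip()) { p++; v *= factor(); }
    return v;
}

/* Lowest priority: '+' and '-' are left associative. */
static int expression(void) {
    int v = term();
    for (skip(); *p == '+' || *p == '-'; skip()) {
        char op = *p++;
        int rhs = term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Evaluates a complete input string. */
int evaluate(const char *text) {
    p = text;
    int v = expression();
    skip();
    assert(*p == '\0');
    return v;
}
```

With these definitions, `evaluate("2 + 3 * 4")` gives 14 (multiplication binds tighter) and `evaluate("10 - 4 - 3")` gives 3 (subtraction is grouped from the left).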

How to Express Repetition (Lists)

• Example: A list of statements (`statementList` or `statements`)
• Two rewriting rules are needed
• Base rule:
• Repetition of 0 or more times: A rewriting rule with an empty right hand side
• Repetition of 1 or more times: A rewriting rule with a single element (e.g. `statement`) of the list on the right hand side
• Inductive rule:
• Right hand side uses both list and single element nonterminals
• If associativity is important, it determines the order of the two nonterminals
• If associativity is not important, there are two choices:
• List first: Left recursion; advantage: smaller stack
• Element first: Right recursion
• Examples:
• 0 or more times:
`things : {} | things thing {};`
• 1 or more times:
`things : thing {} | things thing {};`
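
Why left recursion needs a smaller stack can be seen by counting symbols. The following C sketch is a simplified model, not the actual `bison` parser: it only counts how many grammar symbols sit on the stack while a list of `n` elements is reduced. The function names are assumptions made for this illustration.

```c
#include <assert.h>

/* Left recursion:   things : thing | things thing ;
   Each element can be reduced as soon as it has been shifted. */
int max_stack_left_recursion(int n) {
    assert(n >= 1);
    int depth = 0, max = 0;
    for (int k = 0; k < n; k++) {
        depth++;                       /* shift one thing */
        if (depth > max) max = depth;
        if (k > 0) depth--;            /* reduce "things thing" to things */
        /* for k == 0, "thing" is reduced to things: depth stays the same */
    }
    return max;
}

/* Right recursion:  things : thing | thing things ;
   No reduction is possible before the last element has been read. */
int max_stack_right_recursion(int n) {
    assert(n >= 1);
    int depth = 0, max = 0;
    for (int k = 0; k < n; k++) {
        depth++;                       /* shift every thing first */
        if (depth > max) max = depth;
    }
    /* last thing becomes things (depth unchanged), then n-1 reductions
       of "thing things" to things shrink the stack again */
    for (int k = 1; k < n; k++) depth--;
    assert(depth == 1);
    return max;
}
```

For a list of 100 elements, the left-recursive model never holds more than 2 symbols, while the right-recursive model holds all 100 before the first reduction of the inductive rule.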

Order of Derivation: Leftmost and Rightmost Derivation

With leftmost derivation, the leftmost nonterminal in the syntax tree is always the one expanded next

With rightmost derivation, the rightmost nonterminal in the syntax tree is always the one expanded next

Simple example grammar:

`E → E '+' T | T`
`T → integer`

Example of input: `5 + 7 + 3`
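
The two orders of derivation for this input can be written out step by step. The C sketch below is an illustration only, specialized to the example grammar (E → E '+' T | T, T → integer); the names `Sym`, `derive` and `MAX_SYMS` are assumptions, not part of `bison`. It writes one sentential form per line into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 64

/* One symbol of a sentential form; kind is 'E', 'T', '+', or 'i' (integer). */
typedef struct { char kind; int value; } Sym;

/* Appends the sentential form, followed by a newline, to the string in out. */
static void append_form(const Sym *form, int len, char *out, size_t cap) {
    for (int k = 0; k < len; k++) {
        size_t used = strlen(out);
        if (form[k].kind == 'i')
            snprintf(out + used, cap - used, "%s%d", k ? " " : "", form[k].value);
        else
            snprintf(out + used, cap - used, "%s%c", k ? " " : "", form[k].kind);
    }
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "\n");
}

/* Derives nums[0] + nums[1] + ... from E, expanding the leftmost
   (leftmost != 0) or the rightmost (leftmost == 0) nonterminal each step. */
void derive(const int *nums, int n, int leftmost, char *out, size_t cap) {
    Sym form[MAX_SYMS] = { { 'E', 0 } };
    int len = 1;
    assert(n >= 1 && 2 * n - 1 <= MAX_SYMS && cap > 0);
    out[0] = '\0';
    append_form(form, len, out, cap);
    for (;;) {
        int i = -1;                    /* index of the nonterminal to expand */
        for (int k = 0; k < len; k++) {
            if (form[k].kind == 'E' || form[k].kind == 'T') {
                i = k;
                if (leftmost) break;   /* first hit; otherwise keep the last */
            }
        }
        if (i < 0) break;              /* only terminals left: done */
        if (form[i].kind == 'T') {
            /* T -> integer; every operand to the right stands for one number */
            int after = (len - 1 - i) / 2;
            form[i].kind = 'i';
            form[i].value = nums[n - 1 - after];
        } else if (len < 2 * n - 1) {
            /* E -> E '+' T; E is always the first symbol of the form */
            memmove(form + 3, form + 1, (size_t)(len - 1) * sizeof(Sym));
            form[1].kind = '+';
            form[2].kind = 'T';
            len += 2;
        } else {
            form[i].kind = 'T';        /* E -> T */
        }
        append_form(form, len, out, cap);
    }
}
```

For the numbers 5, 7, 3, the leftmost derivation comes out as `E`, `E + T`, `E + T + T`, `T + T + T`, `5 + T + T`, `5 + 7 + T`, `5 + 7 + 3`, and the rightmost derivation as `E`, `E + T`, `E + 3`, `E + T + 3`, `E + 7 + 3`, `T + 7 + 3`, `5 + 7 + 3`. Both describe the same syntax tree.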

Derivation Choices

Different choices may:

• Generate different words: This is necessary to be able to process different inputs
• Generate the same word, but with different syntax trees: Ambiguous grammar, needs to be avoided
• Generate the same syntax tree, but in different orders (leftmost/rightmost/... derivation): Different parsing algorithm

Kinds of Analysis Methods

• LL: Read input from the left, use leftmost derivation (used in top-down parsing)
• LR: Read input from the left, use rightmost derivation (in reverse order; used in bottom-up parsing)
• LL(1): LL, with one token lookahead
• LR(1): LR, with one token lookahead
• LALR: A kind of LR(1), used widely in `yacc` and `bison`

The labels are also used for grammars:
"This grammar is LL(1)" (meaning: this grammar can be used with an LL(1) parser)

How to Observe and Debug `bison`

• `bison -v` creates a file with many interesting details (calc.y → calc.output)
• `#define YYDEBUG 1` switches on debugging

Understanding `bison`: The `.output` File

`bison -v` creates a file with extension `.output`, containing the following interesting details:

• [Problems: Unused terminal symbols, conflicts]
• Grammar: Numbered rewriting rules (rule number 0 is `$accept: start symbol $end`)
• Terminals: Numbered terminal symbols (numbers are ASCII codes or >256)
• Nonterminals, with numbers of rules where they appear
• States, with the following information for each state:
• Rewriting rules (`.` shows current position)
• Terminal symbols (or `$default`) and the action (shift, reduce, goto) if this symbol is the next symbol in the input
• Nonterminal symbols: goal state of transition after reduction

Understanding `bison`: Debugging

`#define YYDEBUG 1` switches on debugging

The output shows how `bison` works:

• A (pushdown) stack is used to store:
• States (of an automaton)
• The automaton and the next input token decide the action to be taken
• There are three possible actions:
• shift: Read a token and put it on the stack (together with a state)
• reduce: Convert some tokens and/or nonterminals on the stack to a single nonterminal using a rewriting rule
(a reduce action is always followed by a goto to another state)
• accept: Stop processing and accept the input
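
The three actions can be watched in a very small hand-written parser. The C sketch below is a simplification and not what `bison` generates: it keeps grammar symbols instead of automaton states on the stack, and it only handles the grammar E → E '+' T | T, T → integer. The names `parse_sum` and `Entry` are assumptions for this illustration. Each action is recorded as one letter (`s` = shift, `r` = reduce, `a` = accept).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One stack entry: a grammar symbol ('i' = integer token, '+', 'T', 'E')
   together with its attribute value. */
typedef struct { char sym; int val; } Entry;

/* Returns 1 and stores the sum in *result if src is accepted, 0 otherwise.
   trace receives the sequence of actions as a string. */
int parse_sum(const char *src, int *result, char *trace, size_t cap) {
    Entry stack[4];                 /* this grammar never needs more than 3 */
    int top = 0;                    /* number of entries on the stack */
    size_t t = 0;
    int ok = 0;
    assert(src && result && trace && cap > 0);
    while (t + 1 < cap) {
        char prev = top > 0 ? stack[top - 1].sym : '\0';
        while (*src == ' ') src++;  /* the next token is the lookahead */
        if (prev == 'i') {
            stack[top - 1].sym = 'T';            /* reduce T -> integer */
            trace[t++] = 'r';
        } else if (prev == 'T' && top == 3) {
            stack[0].val += stack[2].val;        /* reduce E -> E '+' T */
            top = 1;
            trace[t++] = 'r';
        } else if (prev == 'T') {
            stack[top - 1].sym = 'E';            /* reduce E -> T */
            trace[t++] = 'r';
        } else if (isdigit((unsigned char)*src) && (prev == '\0' || prev == '+')) {
            int v = 0;                           /* shift an integer token */
            while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
            stack[top].sym = 'i';
            stack[top].val = v;
            top++;
            trace[t++] = 's';
        } else if (*src == '+' && prev == 'E') {
            stack[top].sym = '+';                /* shift '+' */
            stack[top].val = 0;
            top++;
            src++;
            trace[t++] = 's';
        } else {
            if (*src == '\0' && prev == 'E') {   /* accept */
                *result = stack[0].val;
                trace[t++] = 'a';
                ok = 1;
            }
            break;                               /* otherwise: syntax error */
        }
    }
    trace[t] = '\0';
    return ok;
}
```

For the input `5 + 7 + 3`, the recorded actions are `srrssrrssrra` and the result is 15. Read backwards, the reductions correspond to the steps of a rightmost derivation.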

Conflicts and Ambiguous Grammars

• When running `bison`, it may show some conflicts:
• shift/reduce conflicts: Both shift and reduce are possible
• reduce/reduce conflicts: There is more than one way to reduce
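• `bison` just chooses one of the alternatives (by default, shift for a shift/reduce conflict, and the rule listed first for a reduce/reduce conflict):
• If this is the right choice, we may be fine
(but we may want to fix the grammar anyway)
• If this is the wrong choice, we have to fix the grammar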
• Grammar example: `E → E '-' E | integer`
• For `5 - 3 - 7`, this grammar allows two interpretations: `(5-3) - 7` and `5 - (3-7)`
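
That the two interpretations really differ can be checked by calculation. The following C sketch evaluates a chain of subtractions both ways; the function names are assumptions for this illustration, and the grammars in the comments (with an extra nonterminal T → integer) are one possible way to remove the ambiguity.

```c
#include <assert.h>

/* Left-associative reading ((a0 - a1) - a2) - ...,
   as described by the unambiguous rules  E -> E '-' T | T  */
int subtract_left_assoc(const int *a, int n) {
    assert(n >= 1);
    int v = a[0];
    for (int k = 1; k < n; k++) v -= a[k];
    return v;
}

/* Right-associative reading a0 - (a1 - (a2 - ...)),
   as described by the unambiguous rules  E -> T '-' E | T  */
int subtract_right_assoc(const int *a, int n) {
    assert(n >= 1);
    int v = a[n - 1];
    for (int k = n - 2; k >= 0; k--) v = a[k] - v;
    return v;
}
```

For the numbers 5, 3, 7, the left-associative reading `(5-3) - 7` gives -5, while the right-associative reading `5 - (3-7)` gives 9, so the choice made for a conflict changes the result.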

Grammar of `bison` Rewriting Rules

rewritingRule → nonterminalSymbol "`:`" rightHandList "`;`"
rightHandList → rightHand | rightHand "`|`" rightHandList
rightHand → symbolList "`{`" CFragment "`}`"
symbolList → symbol | symbol symbolList
symbol → nonterminalSymbol | terminalSymbol

How to Combine `flex` and `bison`

• In the `.y` file, list all token types:
`%token NUM PLUS ASTERISK` ...
• In the `.y` file, define the type of the attributes
`#define YYSTYPE int`
• In the `.lex` file, define one or more rules for each token type
• In the `.y` file, define the rewriting rules of the grammar
• In the `.y` file, write the program fragments to calculate attribute values
• Process (with flex/bison), compile, and test

Advantages and Problems of Bottom-Up Parsing

• No fear of left recursion
• Wider range of grammars
• Automatic creation of parser
• Problems:
• Very hard to create parser by hand
• Ambiguity needs attention

Homework

Deadline: June 23, 2016 (Thursday in two weeks), 19:00

Prepare questions so that you can ask them in next week's lecture!

Expand the simple calculator of calc.y to a calculator for complex numbers. Imaginary numbers are expressed as `5i`, complex numbers as `[realPart, imaginaryPart]`. Design your grammar so that inside `[]`, real number calculations are allowed, but imaginary or complex numbers (e.g. `5i`) are disallowed. Example of input: test.in

Express priorities and associativity directly in the grammar (`%left`, `%right`,... are forbidden).

Where to submit: Box in front of room O-529 (building O, 5th floor)

Submit printouts of the files `complex.lex` and `complex.y`: A4, using BOTH sides (↓↓, not ↓↑); NO cover page; stapled in the top left corner if more than one page; non-proportional font, no wrapped lines; name (kanji and kana) and student number in a comment at the top right of the first page.

Hints for Homework

• Start from the calc files, but change the file names (including in the `makefile`)
• Change `YYSTYPE` so that it can represent a complex number (in both `.lex` and `.y`)
• Define additional tokens in `.y`
• Write rules for additional tokens in `.lex`
• Convert processing of numbers in `.lex` so that it works with floating point numbers
• If you have a shift/reduce or reduce/reduce conflict:
• Check the `.output` file
• Check using different inputs
• Expand your grammar little by little, always carefully testing
• Build up your own test file, and expand it together with expansions to the grammar
• Create tests that check important aspects of the grammar (priorities, associativity, errors)
• Save test outputs and use them for automatic comparison

Glossary

unary (operator)

leftmost derivation

rightmost derivation

reverse order