# Error Processing, Intermediate Representation, Semantic Analysis


## Language Theory and Compilers

http://www.sw.it.aoyama.ac.jp/2018/Compiler/lecture11.html

### Martin J. Dürst

© 2005-18 Martin J. Dürst, Aoyama Gakuin University

# Today's Schedule

• Summary and leftovers of last lecture
• Additional information and hints for homework
• Error processing
• Intermediate Representations
• Semantic analysis

# Summary of Last Lecture

• There are many different orders of derivation, in particular leftmost derivation and rightmost derivation
• Recursive descent parsing corresponds to leftmost derivation
• Bottom-up parsing (LALR parsing) uses the reverse order of rightmost derivation
• LALR parsing operations: shift/reduce/goto/accept
• shift pushes a token and a state onto the stack
• reduce converts a number of tokens and nonterminals at the top of the stack to a single nonterminal
• These operations can be checked in the `.output` file produced by `bison -v`
and when debugging by setting `#define YYDEBUG 1`
• For ambiguous grammars, `bison` reports shift/reduce and reduce/reduce conflicts
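As a reminder of what such a conflict looks like, the classic dangling-else grammar (a minimal sketch, not part of calc.y) is ambiguous; running `bison -v` on it reports one shift/reduce conflict, and the `.output` file shows the state in which shifting `ELSE` competes with reducing the shorter `if`:

```yacc
%token IF EXPR THEN ELSE OTHER
%%
stmt: IF EXPR THEN stmt
    | IF EXPR THEN stmt ELSE stmt
    | OTHER
    ;
```

By default, `bison` resolves this conflict in favor of shifting, which attaches each `ELSE` to the nearest unmatched `IF`.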

# Last Week's Homework

Complete calc.y so that with test.in as an input, it produces test.check

Caution: How to deal with unary `MINUS`

# Hints for Homework

Deadline: July 12, 2018 (Thursday in two weeks), 19:00

Expand the simple calculator of calc.y to a calculator for dates and time periods (numbers of days).

• (also check hints in last lecture)
• How to deal with multiple types (dates, number of days, integers)
1. Separate tokens (terminals) and non-terminals per type
→ Make `YYSTYPE` a `struct` that can represent dates, numbers of days, and integers (in both `.lex` and `.y`)
→ Eliminate impossible calculations with the grammar (ex: addition of two dates)
2. Treat everything as a single type
→ Make `YYSTYPE` a `struct` that can distinguish dates, numbers of days, and integers (in both `.lex` and `.y`)
→ Check whether a calculation is possible in C
• Start from calc, but make sure you change the file names (incl. `makefile`)
• A simple library for date calculations: date_arithmetic.c
• Example input: dates.in.txt; example output: dates.check.txt
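One way to sketch approach 2 is a tagged `struct` used as `YYSTYPE`, with a run-time check in each action. This is only an illustration under assumed names (`value`, `K_DATE`, `add` do not come from calc.y), not the required solution:

```c
/* Sketch of approach 2: one YYSTYPE for all values, tagged by kind.
   All names here are illustrative, not taken from the course files. */
enum kind { K_DATE, K_DAYS, K_INT };

typedef struct {
    enum kind kind;
    union {
        long date;  /* a date, e.g. stored as days since some epoch */
        long days;  /* a time period in days */
        long num;   /* a plain integer */
    } u;
} value;

/* The action for '+' checks at run time which combinations are
   meaningful: date + days gives a date, days + days gives days,
   int + int gives an int; anything else (e.g. date + date) fails. */
value add(value a, value b, int *ok) {
    value r;
    *ok = 1;
    if (a.kind == K_DATE && b.kind == K_DAYS) {
        r.kind = K_DATE; r.u.date = a.u.date + b.u.days;
    } else if (a.kind == K_DAYS && b.kind == K_DAYS) {
        r.kind = K_DAYS; r.u.days = a.u.days + b.u.days;
    } else if (a.kind == K_INT && b.kind == K_INT) {
        r.kind = K_INT; r.u.num = a.u.num + b.u.num;
    } else {
        *ok = 0;        /* impossible calculation, e.g. date + date */
        r = a;
    }
    return r;
}
```

With approach 1, the same restrictions would instead be encoded in the grammar, and `YYSTYPE` would typically be a `%union`.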

(no need to submit)

Use `calc.output` and `#define YYDEBUG 1` to understand how `bison` works

# Processing Syntax Errors

• Why is error processing difficult?
• Requirements for error processing
• Techniques for error processing

# Why is Error Processing Difficult?

• To make programs shorter, 'unnecessary' symbols should be avoided
• For each correct program, there are many different erroneous programs
• It is difficult for programs to distinguish between errors easily made by humans and errors rarely made by humans
• Parsing is based on language theory, but there is not much theory for error processing

# Requirements for Error Processing

• Output error messages that are easy to understand
• Find as many actual errors as possible
• Avoid secondary errors
• Do not slow down processing of correct programs
• Do not make the compiler much more complicated

# Techniques for Error Processing

• Throw away tokens until finding a token that matches the grammar (panic mode)
• Try to add or exchange a small number of tokens
• Add productions to the grammar that catch errors (error productions)
• Search for the correct program closest to the input

# Error Processing in `bison`

• Rules may contain a special `error` token
Example:
```yacc
statement: ... SEMICOLON { ... }
         | error SEMICOLON { yyerror("Statement error.\n"); yyerrok; }
```
(`yyerror` is called automatically, therefore this call may be unnecessary)
• If there is an error, `bison` ignores all the tokens and nonterminals before the closest `error` token
• The tokens in the rule after the `error` token are also ignored

# Compilation Stages

1. Lexical analysis
2. Parsing (syntax analysis)
3. Semantic analysis
4. Optimization (or 5)
5. Code generation (or 4)

# Intermediate Representation: Symbol Table

• Functionality provided:
• Search of symbols
• Registration and removal (for local variables) of symbols
• Management of data for each symbol
• Main points:
• Frequent use, large number of symbols
→ efficiency is important
• The same symbol may be used for different things in different contexts
→ distinction by scope and type is important

# Data Stored by Symbol Table

• Kind of symbol (variable, argument, function, type,...)
• Locations of declarations/definition/use
• Type for variables, functions,...
• Scope where a symbol is visible (e.g. global, file, function, compound statement,...)
• For variables,...: Size (amount of memory needed)
• For functions, variables,...: (relative) address
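A minimal sketch of such a table, under assumed names (`symbol`, `define`, `lookup` are illustrative): entries live on a stack, opening a scope raises the nesting level, and closing a scope pops the symbols declared in it, so inner declarations shadow outer ones. A real compiler would use a hash table for efficient search, as stressed above; linear search is used here only to keep the sketch short.

```c
#include <string.h>

/* Minimal symbol table sketch; names and layout are illustrative. */
enum sym_kind { S_VAR, S_ARG, S_FUNC, S_TYPE };

struct symbol {
    const char   *name;
    enum sym_kind kind;
    int           scope;   /* nesting level where declared */
    int           offset;  /* relative address */
};

#define MAX_SYMS 256
static struct symbol syms[MAX_SYMS];
static int n_syms = 0;
static int level  = 0;

void enter_scope(void) { level++; }

/* Closing a scope removes the symbols local to it. */
void leave_scope(void) {
    while (n_syms > 0 && syms[n_syms - 1].scope == level)
        n_syms--;
    level--;
}

void define(const char *name, enum sym_kind kind, int offset) {
    struct symbol s = { name, kind, level, offset };
    syms[n_syms++] = s;
}

/* Search from the most recent entry backwards, so the innermost
   declaration of a name is found first (shadowing). */
struct symbol *lookup(const char *name) {
    for (int i = n_syms - 1; i >= 0; i--)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    return 0;
}
```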

# Example of Scope

```c
extern int a;     // declaration only, global scope
static int b;     // file scope
int f (int a)     // f: file scope; a: function scope
{
    int a;        // function scope
    static int b; // function scope, but persists across function calls
    while (...) {
        int a;    // block scope
    }
}
```

# Intermediate Representation: Abstract Syntax Tree

How to construct an abstract syntax tree:
Create nodes of the syntax tree as attributes of the attributed grammar.

For example, rewrite this

`exp: exp '+' term { $$ = $1 + $3; }`

to this:

`exp: exp '+' term { $$ = newnode(PLUS, $1, $3); }`

(`YYSTYPE` has to be changed)

Most parts of an abstract syntax tree are binary (two branches), but for some constructs (e.g. arguments of a function), special treatment is necessary.
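One possible shape for the node type behind `newnode(PLUS, $1, $3)` is sketched below; the field names and the `NUM` leaf tag are assumptions, not part of calc.y. Binary operators get two children, and leaves carry a value:

```c
#include <stdlib.h>

/* Sketch of an AST node for a binary expression grammar;
   names are illustrative. */
enum op { NUM, PLUS, MINUS };

struct node {
    enum op op;
    long value;               /* used only when op == NUM */
    struct node *left, *right;
};

struct node *newnode(enum op op, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);  /* error check omitted */
    n->op = op; n->value = 0;
    n->left = l; n->right = r;
    return n;
}

struct node *newnum(long v) {
    struct node *n = newnode(NUM, 0, 0);
    n->value = v;
    return n;
}

/* A later stage walks the tree, e.g. a simple evaluator: */
long eval(const struct node *n) {
    switch (n->op) {
    case NUM:   return n->value;
    case PLUS:  return eval(n->left) + eval(n->right);
    case MINUS: return eval(n->left) - eval(n->right);
    }
    return 0;
}
```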

(For very simple programming languages (e.g. Pascal) and simple architectures (e.g. stack machine), it is possible to create code during parsing and to avoid the creation of an abstract syntax tree.)

# Semantic Analysis

• Mainly analysis and processing of type information:
• Check whether types match
• If necessary, add automatic type conversion (to syntax tree)
• For C and similar languages, relatively easy
• For object-oriented languages, has to consider inheritance,...
• Some languages (e.g. Haskell) use type inference
• Timing: When abstract syntax tree is constructed or just before or during code generation
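The type check for a single operator can be sketched as follows, in the style of C's usual arithmetic conversions. The names `T_INT`/`T_DOUBLE` and the conversion flags are assumptions for illustration; in a real compiler the flags would trigger insertion of a conversion node into the syntax tree:

```c
/* Sketch of type checking for a binary '+'. */
enum type { T_INT, T_DOUBLE, T_ERROR };

/* Result type of left + right; *convert_left / *convert_right are
   set when an automatic int -> double conversion would have to be
   added to the syntax tree on that side. */
enum type check_add(enum type l, enum type r,
                    int *convert_left, int *convert_right) {
    *convert_left = *convert_right = 0;
    if (l == T_ERROR || r == T_ERROR)
        return T_ERROR;            /* avoid secondary errors */
    if (l == r)
        return l;                  /* types match */
    if (l == T_INT && r == T_DOUBLE) { *convert_left  = 1; return T_DOUBLE; }
    if (l == T_DOUBLE && r == T_INT) { *convert_right = 1; return T_DOUBLE; }
    return T_ERROR;
}
```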

# Type Equivalence

There are different ways to define type equivalence:

• Same name, same type (simple, but inconvenient for user)
• Same components, same type (complicated)
• For object-oriented languages, many choices exist

Example for C: type-equivalence.c (does this program compile?)
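The file type-equivalence.c is not reproduced here, but a comparable example shows C's mixture of the two approaches: two separately declared `struct` types with identical components are still distinct types (name equivalence), while two `typedef` names for the same type are fully interchangeable:

```c
/* Not the course's type-equivalence.c; an illustration of C's rules. */
struct s1 { int x; int y; };
struct s2 { int x; int y; };   /* same components, different type */

typedef int len_t;
typedef int count_t;           /* two names for the same type */

count_t via_typedefs(len_t n)  /* compiles: both are just int */
{
    return n;
}

/* In contrast, this would not compile (incompatible struct types):
   struct s2 copy(struct s1 a) { struct s2 b = a; return b; }      */
```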

# Type Inference in Haskell

• Haskell: Functional programming language with strong theoretical background
• Characteristics:
• No assignment (only initialization), no loops (only recursion)
• Lazy evaluation
• Type inference
• Example of type inference:
• Type of function `f`: `(a, Char) -> (a, [Char])`
(function from a pair of `a` and `Char` to a pair of `a` and a list of `Char`s)
• Type of function `g`: `(Int, [b]) -> Int`
(function from a pair of `Int` and a list of `b`s to an `Int`)
• What is the type of the function `h(x) := g(f(x))`?

# Glossary

syntax error

secondary error

symbol table

scope

compound statement

inheritance

type inference

type equivalence

functional (programming) language

lazy evaluation