Principles of Bottom-Up Parsing

(上向き構文解析の原理)

9th lecture, June 3, 2016

Language Theory and Compilers

http://www.sw.it.aoyama.ac.jp/2016/Compiler/lecture9.html

Martin J. Dürst

AGU

© 2005-16 Martin J. Dürst 青山学院大学


Today's Schedule

Summary of Last Lecture

Problems with Top-Down Parsing

 

Additional Requirements for Grammars

Depending on the parsing method, additional requirements on the grammar become necessary:

⇒ A grammar that just produces/recognizes 'words' is not enough

 

Conclusion

→ Bottom-up parsing

 

Top-Down and Bottom-Up Parsing

Top-down parsing:
The parse tree is constructed starting at the top (start symbol)
Bottom-up parsing:
The parse tree is constructed from the bottom (terminal symbols)
During parsing, there may be multiple small parse trees
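
The bottom-up idea can be made concrete with a small shift-reduce recognizer. The following is only a sketch under simplifying assumptions: tokens are single characters ('n' for a number, '+' for plus), 'E' stands for the nonterminal exp, and the function names are made up for illustration. It reduces greedily whenever the top of the stack matches a right-hand side, which is enough for this tiny grammar but is not how an LR parser decides between shifting and reducing.

```c
#include <stdio.h>

#define MAX_STACK 64

/* Try to reduce the top of the stack with one of the two rules.
   Returns 1 if a reduction happened, 0 otherwise. */
static int reduce_once(char *stack, int *top)
{
    /* exp: NUM   ('n' stands for NUM, 'E' for exp) */
    if (*top >= 1 && stack[*top - 1] == 'n') {
        stack[*top - 1] = 'E';
        return 1;
    }
    /* exp: exp PLUS exp   ('+' stands for PLUS) */
    if (*top >= 3 && stack[*top - 3] == 'E' && stack[*top - 2] == '+'
                  && stack[*top - 1] == 'E') {
        *top -= 2;                /* three symbols become one */
        stack[*top - 1] = 'E';
        return 1;
    }
    return 0;
}

/* Returns 1 if tokens form a sentence of the grammar, 0 otherwise.
   If trace is nonzero, prints the stack after every shift and reduce. */
int shift_reduce_accepts(const char *tokens, int trace)
{
    char stack[MAX_STACK + 1];
    int top = 0;

    for (size_t i = 0; tokens[i] != '\0'; i++) {
        if (top == MAX_STACK)
            return 0;                     /* input too long for this sketch */
        stack[top++] = tokens[i];         /* shift */
        stack[top] = '\0';
        if (trace)
            printf("shift %c  -> %s\n", tokens[i], stack);
        while (reduce_once(stack, &top)) { /* reduce as long as possible */
            stack[top] = '\0';
            if (trace)
                printf("reduce   -> %s\n", stack);
        }
    }
    /* Accept only if everything was combined into a single exp. */
    return top == 1 && stack[0] == 'E';
}
```

Calling shift_reduce_accepts("n+n", 1) shows the stack passing through E+E before the final reduction: at that moment the parser holds two small parse trees that have not yet been joined.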

 

Top-Down and Bottom-Up Parsing

Parsing Direction     Top-Down            Bottom-Up
General method        Backtracking        Dynamic programming (CYK Algorithm)
Widely used method    Recursive descent   LR parsing
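
For contrast with the bottom-up methods, here is a minimal recursive-descent sketch for sums of numbers. It is an illustration only: the function names are hypothetical, and the grammar is assumed to be rewritten as exp: num ('+' num)*, because a left-recursive rule such as exp: exp PLUS exp cannot be turned directly into a recursive function.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: skip blanks and tabs. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* num: a floating-point literal, read with strtod.
   Returns 1 on success and advances *pp past the number. */
static int parse_num(const char **pp, double *value)
{
    const char *p = skip_blanks(*pp);
    char *end;

    if (!isdigit((unsigned char)*p))
        return 0;
    *value = strtod(p, &end);
    *pp = end;
    return 1;
}

/* exp: num ('+' num)*
   A function for "exp: exp PLUS exp" would call itself without
   consuming any input, so the repetition is written as a loop. */
int parse_exp(const char *input, double *result)
{
    const char *p = input;
    double right;

    if (!parse_num(&p, result))
        return 0;
    for (;;) {
        p = skip_blanks(p);
        if (*p != '+')
            break;
        p++;                              /* consume '+' */
        if (!parse_num(&p, &right))
            return 0;
        *result += right;
    }
    return *p == '\0';                    /* all input must be used up */
}
```

Here the parse starts from the start symbol (parse_exp) and works down to the tokens, which is the opposite direction from the stack-based approach used by LR parsers.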

 

bison Overview

 

Exercise: A simple Pocket Calculator

Files to start with: makefile, calc.y, calc.lex

 

Use of make

 

Example of bison Input Format

%{
#include <stdio.h>
#define YYSTYPE double
int yylex (void);
void yyerror (char const *);
%}
%token NUM PLUS

%%
statement: exp { printf ("Result is %g\n", $1); }
;

exp: exp PLUS exp { $$ = $1 + $3; }
| NUM { $$ = $1; }
;

%%
int main (void)
{ return yyparse (); }

void yyerror (char const *s)
{ fprintf (stderr, "%s\n", s); }
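
The example declares yylex but does not define it; in the exercise it comes from calc.lex. Purely to illustrate the interface that yyparse relies on, a hand-written scanner might look like the sketch below. The stand-in definitions of NUM, PLUS, and yylval, and the scan_pos variable, are assumptions made so that the fragment compiles on its own; they are not part of the exercise files.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins so that this sketch compiles on its own. Inside calc.y,
   bison provides NUM, PLUS, and yylval; these lines are not needed there. */
enum { NUM = 258, PLUS = 259 };
double yylval;

/* Hypothetical input source: the text that still has to be scanned. */
const char *scan_pos = "";

/* Returns the next token. For NUM, the value is passed in yylval. */
int yylex(void)
{
    char *end;

    while (*scan_pos == ' ' || *scan_pos == '\t' || *scan_pos == '\n')
        scan_pos++;                       /* skip whitespace */
    if (*scan_pos == '\0')
        return 0;                         /* 0 tells yyparse: end of input */
    if (*scan_pos == '+') {
        scan_pos++;
        return PLUS;
    }
    if (isdigit((unsigned char)*scan_pos)) {
        yylval = strtod(scan_pos, &end);  /* semantic value, i.e. $1 of NUM */
        scan_pos = end;
        return NUM;
    }
    /* Anything else is returned as its own character code; the grammar
       above has no rule for it, so the parser reports a syntax error. */
    return (unsigned char)*scan_pos++;
}
```

The point of the sketch is the division of labour: yylex returns the kind of token, and the token's value travels separately in yylval, whose type is set by YYSTYPE.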

 

Structure of bison Input Format

Declarations etc. for bison
%{ C declarations etc. %}
%%
rewriting rule { statements (C) }
rewriting rule { statements (C) }
rewriting rule { statements (C) }
%%
functions etc. (C)
functions etc. (C)

 

Structure of bison Input Format

Mixture of bison-specific directives and C program fragments

There are three main parts, separated by %%:

  1. Preparation/settings:
    Settings for using bison
    C #include statements, definition/initialization of global variables,...
    C parts have to be surrounded by %{ ... %}
  2. Rewriting rules and (in { ... }) program fragments that get executed when a rule is matched
    (by default, the nonterminal on the left-hand side of the first rule is the start symbol)
  3. Rest of C program (functions,...)

Newlines and indentation can be significant

 

Example of bison Rewriting Rule

exp: exp PLUS exp { $$ = $1 + $3; }
   | NUM { $$ = $1; }
;
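
One way to read $$, $1, and $3 is as slots on a stack of semantic values that the parser keeps alongside its symbol stack. The sketch below mimics that idea by hand; all names are hypothetical, there is no overflow or underflow checking, and a parser generated by bison manages its stacks differently in detail.

```c
/* Value stack of a hypothetical bottom-up parser; values are double,
   matching "#define YYSTYPE double". No bounds checks in this sketch. */
#define MAX_VALUES 64
static double values[MAX_VALUES];
static int depth = 0;

/* Shifting a token pushes its semantic value (for NUM: the number;
   for PLUS: an unused placeholder). */
void shift_value(double v)
{
    values[depth++] = v;
}

/* exp: NUM { $$ = $1; } */
void reduce_num(void)
{
    double d1 = values[depth - 1];        /* $1 */
    depth -= 1;                           /* pop the right-hand side */
    values[depth++] = d1;                 /* push $$ */
}

/* exp: exp PLUS exp { $$ = $1 + $3; } */
void reduce_exp_plus_exp(void)
{
    double d1 = values[depth - 3];        /* $1: value of the left exp */
    double d3 = values[depth - 1];        /* $3: value of the right exp */
    depth -= 3;                           /* pop exp, PLUS, exp */
    values[depth++] = d1 + d3;            /* push $$ */
}

double top_value(void)
{
    return values[depth - 1];
}
```

For the input 1 + 2, the sequence shift 1, reduce_num, shift PLUS, shift 2, reduce_num, reduce_exp_plus_exp leaves a single value, 3, on the stack; $2 (the slot for PLUS) exists but is never read.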

 

Attribute(d) Grammar

 

bison Manual

 

Hints for Developing bison Programs

 

Homework

Deadline: June 9, 2016 (Thursday), 19:00

Where to submit: Box in front of room O-529 (building O, 5th floor)

Complete calc.y so that, with test.in as input, it produces the output in test.check

Submit calc.y only, as a printout on a single A4 page (using both sides is okay; NO cover page; staple in the top left corner if more than one page is necessary). Lines must not wrap. Put your name (kanji and kana) and student number in a comment at the top right.

If the only differences are in newlines, make sure that all files use the Unix line-ending convention
(In Notepad2, choose File → Line Endings → Unix (LF))

 

Glossary

factor
因子
tabulator
タブ文字
attribute(d) grammar
属性文法