Principles of top-down parsing
(Implementation of Top-Down Parsing)
8th lecture, June 7, 2019 
Language Theory and Compilers 
http://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture8.html
Martin J. Dürst

© 2005-19 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
  - Leftovers, summary, and homework for last lecture
 
  - Top-down parsing
 
  - Recursive descent parsing
 
  - Implementation of recursive descent parsing
 
  - How to deal with various problems in a grammar:
    
      - Limitation of number of operations
 
      - Priority
 
      - Left/right associativity
 
      - Left recursion
 
    
   
 
Leftovers from Last Lecture
 
Summary of Last Lecture
  flex homework: Lexical analysis/regular expressions for C
  - Regular expression for C comments
 
  - Grammars for various languages
 
  - How to construct a grammar
 
  - Results of parsing: Parse tree and abstract syntax tree
 
 
Comment about Grammars Collected Last Week
  - Many students submitted descriptions of syntax, not grammars
 
  - These students received fewer points
 
 
Last Week's Homework 1
Removed due to circumstances
 
Last Week's Homework 2
Removed due to circumstances
 
Last Week's Homework 3
Removed due to circumstances
 
About Ambiguous Grammars
  - Grammars that may produce more than one parse tree for the same input are
    called ambiguous grammars
 
  - Sometimes, it may seem okay to allow an ambiguous grammar (e.g.
    mathematically, a+(b+c) = (a+b)+c). But this does not hold when
    overflow can occur, and it does not hold for non-associative operators
    (e.g. a-(b-c) ≠ (a-b)-c).
 
  - Some grammars can be changed to remove ambiguity
 
  - There is no general algorithm to remove ambiguity; there is also no
    algorithm to decide whether removal is possible for a given grammar
 
  - Ambiguous grammars are acceptable when the only goal is to define which
    inputs belong to a language, but they are not suited for parsing a
    programming language or a data format, where each input must have
    exactly one structure
 
  - Whether a grammar is ambiguous and whether its language requires a
    nondeterministic pushdown automaton for recognition are separate
    problems
    (example: palindrome) 
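
The effect of grouping mentioned above (overflow and subtraction) can be checked directly. The following is a minimal sketch in C, assuming IEEE 754 doubles. All function names are made up for this illustration, and the volatile temporaries are only there to force each intermediate result into a double.

```c
#include <float.h>

/* (a - b) - c: the grouping of a left-associative grammar. */
int subtractLeftFirst(int a, int b, int c) {
    return (a - b) - c;
}

/* a - (b - c): the grouping of a right-associative grammar. */
int subtractRightFirst(int a, int b, int c) {
    return a - (b - c);
}

/* (a + b) + c in double arithmetic; a + b may overflow to infinity. */
double addLeftFirst(double a, double b, double c) {
    volatile double partial = a + b;
    return partial + c;
}

/* a + (b + c) in double arithmetic; b + c may cancel before a is added. */
double addRightFirst(double a, double b, double c) {
    volatile double partial = b + c;
    return a + partial;
}

/* Returns 1 if x is larger than every finite double (positive infinity). */
int isOverflowed(double x) {
    return x > DBL_MAX;
}
```

With these definitions, subtractLeftFirst(10, 3, 2) is 5 while subtractRightFirst(10, 3, 2) is 9. addLeftFirst(1e308, 1e308, -1e308) overflows to infinity, while addRightFirst with the same arguments gives 1e308. A grammar that allows both parse trees therefore allows two different results.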
 
General Top-Down Parsing
  - Create parse tree starting with the start symbol of the grammar
 
  - Expand parse tree depth-first from the left
 
  - If there is a choice (of rewriting rules), try each one in turn
   
  - Once the parse tree reaches a terminal symbol, compare with input
    
      - If there is a match, continue
 
      - If there is no match, give up and try another choice
      (backtracking)
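
The steps above can be sketched as a small recognizer. This is only an illustration under assumptions: the grammar S → 'a' S 'b' | 'a' 'b' and the names input, pos, matchTerminal, parseS, and recognize are invented here. The backtracking is the simple kind that gives up on a choice and restores the input position; this is enough for this grammar, but a general backtracking parser may also have to revisit choices that succeeded at first.

```c
#include <stddef.h>

static const char *input;   /* text being parsed */
static size_t pos;          /* index of the next unread character */

/* Compare one terminal symbol with the input; advance on a match. */
static int matchTerminal(char expected) {
    if (input[pos] == expected) {
        pos++;
        return 1;
    }
    return 0;
}

/* S -> 'a' S 'b' | 'a' 'b' ; returns 1 on success, 0 on failure. */
static int parseS(void) {
    size_t saved = pos;     /* remember where this attempt started */

    /* First choice: S -> 'a' S 'b' */
    if (matchTerminal('a') && parseS() && matchTerminal('b'))
        return 1;

    pos = saved;            /* backtrack: undo the consumed input */

    /* Second choice: S -> 'a' 'b' */
    if (matchTerminal('a') && matchTerminal('b'))
        return 1;

    pos = saved;            /* no choice worked */
    return 0;
}

/* Returns 1 if the whole text can be derived from S, else 0. */
int recognize(const char *text) {
    input = text;
    pos = 0;
    return parseS() && input[pos] == '\0';
}
```

For example, recognize("aabb") succeeds after the innermost attempt at the first choice has failed and been undone, while recognize("aab") fails because every choice has been tried without reaching the end of the input.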
 
    
   
 
Main Points of Backtracking
Backtracking tries all possible pathways (similar to finding the exit of a
labyrinth without a map)
Backtracking may be very slow, but this can be improved:
  - Change the grammar so that backtracking is reduced or eliminated
    (ideally, the next token should be enough to select a single rewriting
    rule)
  - Use lookahead (check some more tokens) to eliminate some choices
 
  - Remember intermediate results (packrat parser)
    
      - Similar to making marks in a labyrinth
 
      - Example implementation: treetop (for Ruby)
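
Remembering intermediate results can be sketched as follows. This is a minimal illustration in the spirit of a packrat parser, not a description of how treetop works. The grammar S → 'a' S 'b' | 'a' S 'c' | ε, the choice order, and all names (memo, parseS, recognizeMemo, computationCount, MAX_INPUT) are assumptions made for this example.

```c
#include <string.h>

#define MAX_INPUT 64
#define UNKNOWN (-2)             /* memo entry not computed yet */

static const char *input;
static int memo[MAX_INPUT + 1];  /* memo[i]: end position of S starting at i */
static int computations;         /* how often S was actually computed */

/* S -> 'a' S 'b' | 'a' S 'c' | (empty) ; returns the position after S. */
static int parseS(int pos) {
    int end;
    if (memo[pos] != UNKNOWN)
        return memo[pos];              /* reuse the remembered result */
    computations++;
    end = pos;                         /* third choice: empty */
    if (input[pos] == 'a') {
        int inner = parseS(pos + 1);   /* first choice: 'a' S 'b' */
        if (input[inner] == 'b') {
            end = inner + 1;
        } else {
            inner = parseS(pos + 1);   /* second choice: 'a' S 'c' (memo hit) */
            if (input[inner] == 'c')
                end = inner + 1;
        }
    }
    memo[pos] = end;                   /* the mark left in the labyrinth */
    return end;
}

/* Returns 1 if all of text is derived from S (at most MAX_INPUT characters). */
int recognizeMemo(const char *text) {
    size_t length = strlen(text);
    size_t i;
    if (length > MAX_INPUT)
        return 0;
    input = text;
    computations = 0;
    for (i = 0; i <= length; i++)
        memo[i] = UNKNOWN;
    return parseS(0) == (int)length;
}

/* Number of times S was computed during the last call of recognizeMemo. */
int computationCount(void) {
    return computations;
}
```

For the input "aacc", S is computed 3 times (once per starting position). Without the memo table, the second choice would parse the inner S again at every level, giving 7 computations for this input and a number that doubles with each additional 'a'.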
 
    
   
 
Recursive Descent Parsing
  - Implementation of top-down parsing, easy to write by hand
 
  - Create a function for each non-terminal symbol of the grammar
 
  - In the function, proceed along the right side of the rewriting rule(s):
    
      - For a terminal symbol, compare with input
 
      - For a non-terminal symbol, call the corresponding function
 
      - Use branching (if,...) for a choice (|) in the grammar
      - Use repetition (while,...) for repetition ({...}) in EBNF
    
   
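  - Reason for the name: grammar rules are recursive, so the functions call
    themselves (directly or indirectly) while descending from the start
    symbol
    Example: A → variable '=' A | integer (assignment expression)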
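
The example rule A → variable '=' A | integer can be turned into one recursive function. The following is a minimal sketch that works directly on characters instead of tokens. It assumes that a variable is a single lowercase letter and an integer is a run of digits; the names cursor, variables, parseA, evaluateAssignment, and variableValue are invented for this illustration.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *cursor;     /* next unread character of the input */
static long variables[26];     /* values of the variables a to z */
static int errorFound;         /* 1 if the input does not fit the grammar */

/* A -> variable '=' A | integer ; returns the value of the expression. */
static long parseA(void) {
    if (islower((unsigned char)*cursor)) {     /* choice 1 starts with a variable */
        int index = *cursor - 'a';
        cursor++;
        if (*cursor != '=') {                  /* terminal '=' */
            errorFound = 1;
            return 0;
        }
        cursor++;
        variables[index] = parseA();           /* non-terminal A: recursive call */
        return variables[index];
    }
    if (isdigit((unsigned char)*cursor)) {     /* choice 2 starts with a digit */
        char *end;
        long value = strtol(cursor, &end, 10);
        cursor = end;
        return value;
    }
    errorFound = 1;                            /* neither choice applies */
    return 0;
}

/* Evaluates text such as "a=b=42"; sets *ok to 0 on a syntax error. */
long evaluateAssignment(const char *text, int *ok) {
    long value;
    cursor = text;
    errorFound = 0;
    value = parseA();
    *ok = !errorFound && *cursor == '\0';
    return value;
}

/* Value currently stored for a variable name between 'a' and 'z'. */
long variableValue(char name) {
    return variables[name - 'a'];
}
```

For "a=b=42", parseA calls itself twice: the innermost call returns 42, which is then assigned to b and to a on the way back. The next character alone selects the choice, so no backtracking is needed.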
   
 
Recursive Descent Parsing: Simple Hand-Written Parser
Program files: scanner.h, scanner.c, parser1.c
How to compile: gcc scanner.c parser1.c && ./a
(./a works on Windows with Cygwin or MinGW; on Linux and macOS the executable is ./a.out)
 
Details of Recursive Descent Parsing: Lexical Analysis
(see scanner.c)
  - Invariant:
    
      - The next character is always in nextChar (one-character lookahead)
      - As soon as a character is processed, the next character is read into
        nextChar
  - nextChar is a global variable (can be changed to a function parameter)
  - How to use from parser:
    
      - Initialize with initScanner
      - Read tokens with getNextToken
  - Implementation of getNextToken:
    
      - One-character tokens: Direct decision
 
      - Multiple-character tokens: Decide on first character, read the rest
        with a dedicated function
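
The points above can be sketched as follows. This is not the actual scanner.c, which is not reproduced here. The names nextChar, initScanner, and getNextToken are taken from the slide, but their signatures, the Token type, the token kinds, the helpers readChar and readInteger, and reading from a string instead of a file are all assumptions made for this sketch.

```c
#include <ctype.h>

/* Token kinds; hypothetical, the real scanner.h may define them differently. */
typedef enum { TOKEN_INTEGER, TOKEN_PLUS, TOKEN_MINUS, TOKEN_END, TOKEN_ERROR } TokenType;

typedef struct {
    TokenType type;
    int value;               /* only meaningful for TOKEN_INTEGER */
} Token;

static const char *source;   /* remaining input (assumed: a string, not a file) */
static int nextChar;         /* invariant: always the next unprocessed character */

/* Move the lookahead forward by one character. */
static void readChar(void) {
    nextChar = (unsigned char)*source;
    if (nextChar != '\0')
        source++;
}

/* Establish the invariant before the first token is requested. */
void initScanner(const char *text) {
    source = text;
    readChar();
}

/* Dedicated function for a multiple-character token: a run of digits. */
static Token readInteger(void) {
    Token token = { TOKEN_INTEGER, 0 };
    while (isdigit(nextChar)) {
        token.value = token.value * 10 + (nextChar - '0');
        readChar();
    }
    return token;
}

Token getNextToken(void) {
    Token token = { TOKEN_ERROR, 0 };
    while (nextChar == ' ')            /* skip blanks between tokens */
        readChar();
    if (isdigit(nextChar))             /* decided on the first character */
        return readInteger();
    switch (nextChar) {                /* one-character tokens: direct decision */
    case '+':  token.type = TOKEN_PLUS;  break;
    case '-':  token.type = TOKEN_MINUS; break;
    case '\0': token.type = TOKEN_END;   return token;  /* nothing left to read */
    default:   break;                  /* unknown character: TOKEN_ERROR */
    }
    readChar();                        /* restore the invariant */
    return token;
}
```

After initScanner("12 - 7"), successive calls of getNextToken return an integer token with value 12, a minus token, an integer token with value 7, and then the end token. Every function leaves nextChar at the first character it has not processed, which is what keeps the pieces independent of each other.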
 
    
   
 
Details of Recursive Descent Parsing: Parsing
(see parser1.c)
  - Invariant (same as for lexical analysis):
    
      - The next token to be looked at is always in nextToken (one-token
        lookahead)
      - As soon as a token is processed, the next token is read into
        nextToken
  - nextToken is a global variable, but this can be changed to a function
    parameter
  - Overall usage:
      - Initialize using initScanner and getNextToken
      - Call the function corresponding to the start symbol of the grammar
        (e.g. Expression())
      - Further process the returned value (abstract syntax tree or result of
        evaluation)
 
    
   
 
Details of Recursive Descent Parsing: Non-Terminal Symbols
  - Create a function for each non-terminal symbol of the grammar
 
  - In the function, deal with all rewriting rules for this non-terminal
    symbol
 
  - For each non-terminal on the right-hand side, call the corresponding
    function
 
  - For each terminal on the right-hand side, compare with nextToken
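
The nextToken invariant and the one-function-per-non-terminal rule can be sketched together. This is not parser1.c. The grammar (Expression → Operand '+' Operand, Operand → integer), the Token type, the helper expect, the entry point parseAndEvaluate, and the stand-in scanner are assumptions made so that the sketch is complete; only the names nextToken, getNextToken, and Expression come from the slides.

```c
#include <ctype.h>

typedef enum { TOKEN_INTEGER, TOKEN_PLUS, TOKEN_END, TOKEN_ERROR } TokenType;
typedef struct { TokenType type; int value; } Token;

static const char *source;   /* remaining input text */
static Token nextToken;      /* invariant: the next token to be looked at */
static int syntaxErrors;     /* number of mismatches found so far */

/* Minimal stand-in scanner so that the sketch is complete. */
static Token getNextToken(void) {
    Token token = { TOKEN_ERROR, 0 };
    while (*source == ' ')
        source++;
    if (isdigit((unsigned char)*source)) {
        token.type = TOKEN_INTEGER;
        while (isdigit((unsigned char)*source))
            token.value = token.value * 10 + (*source++ - '0');
    } else if (*source == '+') {
        token.type = TOKEN_PLUS;
        source++;
    } else if (*source == '\0') {
        token.type = TOKEN_END;
    } else {
        source++;                  /* skip the unknown character */
    }
    return token;
}

/* Terminal symbol: compare with nextToken, then read the following token. */
static void expect(TokenType type) {
    if (nextToken.type != type)
        syntaxErrors++;
    nextToken = getNextToken();
}

/* Operand -> integer */
static int Operand(void) {
    int value = nextToken.value;   /* look at the token before it is replaced */
    expect(TOKEN_INTEGER);
    return value;
}

/* Expression -> Operand '+' Operand */
static int Expression(void) {
    int left = Operand();          /* non-terminal: call its function */
    int right;
    expect(TOKEN_PLUS);            /* terminal: compare with nextToken */
    right = Operand();
    return left + right;
}

/* Overall usage: initialize, call the start symbol, process the result. */
int parseAndEvaluate(const char *text, int *result) {
    source = text;
    syntaxErrors = 0;
    nextToken = getNextToken();    /* establish the invariant */
    *result = Expression();
    expect(TOKEN_END);             /* nothing may follow the expression */
    return syntaxErrors == 0;
}
```

parseAndEvaluate("12 + 30", &result) returns 1 and stores 42 in result; inputs such as "12 +" or "1 + 2 3" return 0. Because each function both starts and ends with nextToken holding the first unprocessed token, the functions can be combined without knowing each other's internals.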
 
How to Deal with Left Recursion
Example of left recursion:
E → E '-' integer | integer
Wrong solution (change of associativity):
E → integer '-' E | integer
Correct solution:
E → integer Econtinued
Econtinued → '-' integer Econtinued | ε
In (E)BNF:
E → integer {'-' integer}
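
A function written directly from E → E '-' integer would call itself before consuming any input and therefore never return. The difference between the wrong and the correct solution can be sketched as follows. The sketch works on a character string, assumes well-formed input such as "10-3-2", and does no error checking; all function names are invented for this illustration.

```c
#include <stdlib.h>

/* Wrong: E -> integer '-' E | integer (right recursion, right-associative). */
static long parseRightRecursive(const char **cursor) {
    char *end;
    long left = strtol(*cursor, &end, 10);
    *cursor = end;
    if (**cursor == '-') {
        (*cursor)++;
        return left - parseRightRecursive(cursor);  /* groups as a-(b-c) */
    }
    return left;
}

/* Correct: E -> integer {'-' integer} (repetition, left-associative). */
static long parseWithLoop(const char **cursor) {
    char *end;
    long result = strtol(*cursor, &end, 10);
    *cursor = end;
    while (**cursor == '-') {
        (*cursor)++;
        result -= strtol(*cursor, &end, 10);        /* groups as (a-b)-c */
        *cursor = end;
    }
    return result;
}

long evaluateRightRecursive(const char *text) {
    return parseRightRecursive(&text);
}

long evaluateWithLoop(const char *text) {
    return parseWithLoop(&text);
}
```

For "10-3-2", evaluateWithLoop returns 5, which is (10-3)-2, while evaluateRightRecursive returns 9, which is 10-(3-2). Both functions accept the same inputs; only the structure they assign to them differs.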
 
Differences between Grammars and Regular Expressions
Grammar:
  - Multiple rules
 
  - Non-terminal symbols, derivation from left-hand side to right-hand side
 
  - For simple grammars, *, (), and | are not available
 
Regular Expression:
  - Limited to one single rule
 
  - No non-terminal symbols, only right-hand side
 
  - For practical regular expressions, lots of metacharacters/functionality
    besides *, (), and | available
 
A simple regular expression corresponds to the right-hand side of a single
rewriting rule in a (BNF,...) grammar
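
As an illustration of this correspondence, the single rule E → integer {'-' integer} can be written as one regular expression. The following minimal sketch uses the POSIX regex functions regcomp, regexec, and regfree; it assumes a POSIX system, and the function name matchesSubtractionChain is invented for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches integer ('-' integer)* as a whole, 0 if not,
   and -1 if the pattern could not be compiled. */
int matchesSubtractionChain(const char *text) {
    regex_t pattern;
    int status;
    /* One pattern plays the role of the single rule E -> integer {'-' integer}. */
    if (regcomp(&pattern, "^[0-9]+(-[0-9]+)*$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    status = regexec(&pattern, text, 0, NULL, 0);
    regfree(&pattern);
    return status == 0;
}
```

The pattern accepts "10-3-2" and rejects "10--2" and "10-". It only answers whether the input belongs to the language; unlike a parser, it does not say how the subtractions are grouped, and a rule that refers to itself (such as one for nested parentheses) cannot be written this way.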
 
Homework
Deadline: June 13, 2019 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 double-sided printout of parser program. Stapled in upper left if
more than one page, no cover page, no wrapping lines, legible font size,
non-proportional font, portrait (not landscape), formatted (indents,...) for
easy visibility, name (kanji and kana) and student number as a comment at the
top
Collaboration: The same rules as for Computer Practice I (計算機実習 I)
apply
  - Expand the top-down parser of parser1.c to correctly deal with the four
    basic arithmetic operations. (scanner.h/c do not change, so no need to
    submit them)
  - (bonus problem) Add more operations to the top-down parser, and/or deal
    with parentheses. (If you solve this problem, also submit the
    scanner.h/c files, but only one parser.c file for both problems.)
  - Bring your notebook computer to the next lecture. Check again that
    flex, bison, make, and gcc are installed.
 
Glossary
  - ambiguous grammar
 
    - 曖昧な文法
 
  - recursive descent parsing
 
    - 再帰的下向き構文解析
 
  - depth-first
 
    - 深さ優先
 
  - lookahead
 
    - 先読み
 
  - backtracking
 
    - バックトラック
 
  - labyrinth
 
    - 迷路
 
  - right associative
 
    - 右結合
 
  - invariant
 
    - 不変条件
 
  - left recursion
 
    - 左再帰