Parsing Context-Free Languages


7th lecture, May 31, 2019

Language Theory and Compilers

Martin J. Dürst


© 2005-19 Martin J. Dürst, Aoyama Gakuin University

Today's Schedule


Collection of Homework from Last Lecture

(bring to next lecture, will be collected)

For a programming language that you know (e.g. C, Java, Ruby,...), search for a grammar on the Web, print it out, and carefully study it.


Leftovers from Last Lecture


Summary of Last Lecture


Homework Due Yesterday

Example solution: xml.l


A Difficult Regular Expression: C Comment

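For better readability, we write x in place of * (so we match /x ... x/ rather than /* ... */) and insert spaces into the regular expressions; the spaces are not part of the patterns.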

Try #  Regular Expression         Problem
1      /x .* x/                   /xx/ /xx/ is matched as a single unit
2      /x [^x]* x/                /xxx/ is not matched
3      /x ([^x]|x[^/])* x/        /x xx/ /x x/ is matched as a single unit
4      /x ([^x]|x+[^/])* x/       same as above
5      /x ([^x]|x+[^/x])* x/      /x xx/ is not matched
6      /x ([^x]|x+[^/x])* x+/     Done!

Reference: Mastering Regular Expressions, Jeffrey E.F. Friedl, pp. 168,...
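The tries in the table can be checked mechanically. The following is a minimal sketch using Python's re module; the names TRY1, TRY5, TRY6 and the helper comments() are made up for this illustration, and the readability spaces are removed from the patterns.

```python
import re

# Patterns from the table, with the readability spaces removed.
TRY1 = re.compile(r"/x.*x/")                 # greedy: runs on to the last x/
TRY5 = re.compile(r"/x([^x]|x+[^/x])*x/")    # fails when several x precede the /
TRY6 = re.compile(r"/x([^x]|x+[^/x])*x+/")   # final version

def comments(pattern, text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]

text = "/x a x/ b /x c x/"
print(comments(TRY1, text))   # ['/x a x/ b /x c x/']
print(comments(TRY6, text))   # ['/x a x/', '/x c x/']
print(TRY5.fullmatch("/x xx/") is None)       # True: try 5 rejects this comment
print(TRY6.fullmatch("/x xx/") is not None)   # True: try 6 accepts it
```

Replacing x by \* in TRY6 should give the corresponding pattern for real C comments.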


Homework from Last Lecture



Compound Statement for C

compound_statement
        : '{' '}'
        | '{' statement_list '}'
        | '{' declaration_list '}'
        | '{' declaration_list statement_list '}'

declaration_list
        : declaration
        | declaration_list declaration

statement_list
        : statement
        | statement_list statement


Block for Java

Be careful to distinguish literal { } (braces that appear as such in Java source code) from grammar-level { } (indicating 0 or more repetitions).

Block:
     { BlockStatements }

BlockStatements:
     { BlockStatement }

BlockStatement:
     [ Identifier : ] Statement


Different Ways to Express a Grammar

Simple Grammar



Formal Rewriting Rules and BNF

There are many different ways to write a grammar:

  1. Simplest: Only a list of rewriting rules
  2. Connect all the right hand sides that have the same left hand side with |
    ⇒ No fundamental change from 1 (syntactic sugar)
  3. Add equivalent of ? in regular expressions (present/absent, often written with [...])
    ⇒ Can be rewritten as two different rules
  4. Add equivalent of * in regular expressions (0 or more repetitions, often written {...})
    ⇒ Can be rewritten with [...] and a list rule

Grammar notations of this kind are called BNF (Backus-Naur Form), EBNF (Extended BNF), or ABNF (Augmented BNF), among others.


Rewriting Connected Right Hand Sides

A → B C | D E

is rewritten as

A → B C
A → D E


Rewriting Presence/Absence

A → B [ C ] D

is rewritten as

A → B C D
A → B D


Rewriting BNF Repetition

A → B { C } D

is rewritten as

A → B [ CList ] D
CList → C | CList C
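The three rewritings (|, [...], {...}) are mechanical, so they can be automated. The following is a minimal sketch; the function name desugar, the tuple encoding ("opt", X) / ("rep", X), and the restriction to a single symbol inside [...] and {...} are assumptions made for this illustration.

```python
from itertools import product

def desugar(lhs, alternatives):
    """Turn one extended rule into plain rewriting rules.

    alternatives: the right-hand sides separated by | in the grammar.
    Each right-hand side is a list of items: a plain symbol such as "B",
    ("opt", "C") for [ C ], or ("rep", "C") for { C }.
    Returns a list of (left-hand side, list of symbols) pairs.
    """
    rules, list_rules = [], []
    for rhs in alternatives:
        choices = []                      # per item: the possible expansions
        for item in rhs:
            if isinstance(item, str):
                choices.append([[item]])  # plain symbol: exactly one expansion
                continue
            kind, symbol = item
            if kind not in ("opt", "rep"):
                raise ValueError(f"unknown item kind: {kind}")
            if kind == "rep":             # { C }  becomes  [ CList ]
                name = symbol + "List"
                for rule in [(name, [symbol]), (name, [name, symbol])]:
                    if rule not in list_rules:
                        list_rules.append(rule)
                symbol = name
            choices.append([[symbol], []])  # present or absent
        for combo in product(*choices):
            rules.append((lhs, [s for part in combo for s in part]))
    return rules + list_rules

for left, right in desugar("A", [["B", ("rep", "C"), "D"]]):
    print(left, "→", " ".join(right))
# A → B CList D
# A → B D
# CList → C
# CList → CList C
```

The output writes the two CList alternatives as separate rules, which is the same as CList → C | CList C in the simplest notation.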


How to Create a Grammar

  1. Write down a simple example word of the language
  2. Convert the example to token types (result of lexical analysis)
  3. Give names to the various phenomena in the example (e.g.: ...expression, ...statement, etc.)
  4. Create draft rewriting rules
  5. Repeat 1.-4. with more difficult examples, check, and fix


Example of Grammar Creation


Goal of Parsing


Result of Parsing: Parse Tree and Abstract Syntax Tree

Parse tree (concrete syntax tree):
Abstract syntax tree:
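As a small illustration of the difference: a parse tree has one inner node per applied rewriting rule and the tokens as leaves, while an abstract syntax tree keeps only what later processing needs (here operators and operands). The sketch below assumes a hypothetical grammar Sum → n | Sum + n and encodes trees as nested tuples; the grammar, the encoding, and the function to_ast are made up for this example.

```python
# Parse tree for "1 + 2 + 3" under the hypothetical grammar  Sum → n | Sum + n.
# Each node is (label, child, child, ...); an n token carries its value.
parse_tree = ("Sum",
              ("Sum",
               ("Sum", ("n", 1)),
               ("+",),
               ("n", 2)),
              ("+",),
              ("n", 3))

def to_ast(node):
    """Drop the grammar bookkeeping and keep only operators and operands."""
    label, *children = node
    if label == "n":                 # token: keep just the number
        return children[0]
    if len(children) == 1:           # Sum → n: the Sum node adds nothing
        return to_ast(children[0])
    left, op, right = children       # Sum → Sum + n
    return (op[0], to_ast(left), to_ast(right))

print(to_ast(parse_tree))   # ('+', ('+', 1, 2), 3)
```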


Examples of Parse Tree and Abstract Syntax Tree


Parser Implementation: Top-Down or Bottom-Up

Top-down parsing:
Build the parse tree from the top (root, start symbol)
Bottom-up parsing:
Build the parse tree from the bottom (terminal symbols)
During parsing, there may be several (small) parse trees
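The two directions can be compared on a tiny example. The sketch below assumes a hypothetical grammar S → n | ( S ); both functions and their names are made up for this illustration. The top-down version starts from S and picks a rule by looking at the next token; the bottom-up version pushes tokens onto a stack and replaces a complete right-hand side on top of the stack by S. With this grammar the stack never holds more than one finished subtree; grammars with rules such as E → E + T would keep several small trees on the stack.

```python
def parse_top_down(tokens):
    """Top-down: expand S, choosing the rule from the next token."""
    def parse_s(pos):
        if pos < len(tokens) and tokens[pos] == "n":        # S → n
            return ("S", "n"), pos + 1
        if pos < len(tokens) and tokens[pos] == "(":        # S → ( S )
            inner, pos = parse_s(pos + 1)
            if pos < len(tokens) and tokens[pos] == ")":
                return ("S", "(", inner, ")"), pos + 1
        raise SyntaxError(f"unexpected input at position {pos}")

    tree, pos = parse_s(0)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

def parse_bottom_up(tokens):
    """Bottom-up: shift tokens, reduce right-hand sides found on the stack."""
    stack = []
    for tok in tokens:
        stack.append(tok)                                   # shift
        while True:                                         # reduce while possible
            if stack[-1] == "n":                            # n  becomes  S
                stack[-1] = ("S", "n")
            elif (len(stack) >= 3 and stack[-3] == "("
                  and isinstance(stack[-2], tuple) and stack[-1] == ")"):
                stack[-3:] = [("S", "(", stack[-2], ")")]   # ( S )  becomes  S
            else:
                break
    if len(stack) != 1 or not isinstance(stack[0], tuple):
        raise SyntaxError("input is not a word of the language")
    return stack[0]

tokens = list("((n))")
print(parse_top_down(tokens) == parse_bottom_up(tokens))    # True
```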


Difficulty of Parsing

Very General Parsing Method

(Cocke–Younger–Kasami (CYK) algorithm)
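A minimal sketch of a CYK recognizer follows. It assumes the grammar is already in Chomsky normal form (only rules A → a and A → B C); the function name cyk, the parameters unary and binary, and the example grammar for the language aⁿbⁿ are assumptions made for this illustration. The three nested loops over length, start position, and split point are why the method takes time proportional to the cube of the word length.

```python
def cyk(word, unary, binary, start="S"):
    """Decide whether word can be derived from start.

    unary:  dict mapping a terminal a to the set of A with a rule A → a
    binary: list of triples (A, B, C), one for each rule A → B C
    """
    n = len(word)
    if n == 0:
        return False                      # the empty word is not handled here
    # table[i][l]: nonterminals that derive the substring word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][1] = set(unary.get(symbol, ()))
    for length in range(2, n + 1):                 # substring length
        for i in range(n - length + 1):            # start position
            for split in range(1, length):         # length of the left part
                for a, b, c in binary:
                    if (b in table[i][split]
                            and c in table[i + split][length - split]):
                        table[i][length].add(a)
    return start in table[0][n]

# Hypothetical grammar in Chomsky normal form for a^n b^n (n >= 1):
#   S → A B | A X,  X → S B,  A → a,  B → b
unary = {"a": {"A"}, "b": {"B"}}
binary = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]

print(cyk("aabb", unary, binary))   # True
print(cyk("abab", unary, binary))   # False
```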


Homework

Deadline: June 6, 2019 (Thursday), 19:00

Where to submit: Box in front of room O-529 (building O, 5th floor)

Format: A4 single page (using both sides is okay; NO cover page, staple in top left corner if more than one page is necessary), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right

In the problems below, n, +, -, *, and / are terminal symbols. Any other letters are non-terminal symbols. n denotes an arbitrary number, and the other symbols denote the four basic arithmetic operations.

  1. For the three grammars below, construct all the possible parse trees for words of length 5. Find the grammar that allows all and only those parse trees that produce correct results.
    1. E → n | E - E
    2. E → n | n - E
    3. E → n | E - n
  2. Same as in problem 1 for the four grammars below:
    1. E → n | E + E | E * E
    2. E → n | E + n | E * n
    3. E → T | E + T; T → n | T * n
    4. E → T | E * T; T → n | T + n
  3. (bonus problem) Based on what you learned from problems 1 and 2, create a grammar with which expressions using the four arithmetic operations (without parentheses) are calculated correctly. Check this grammar with expressions of length 5.
  4. Bring your notebook computer to the next lecture



Glossary

parse tree (concrete syntax tree)
abstract syntax tree: 構文木 (抽象構文木)
syntactic sugar
top-down parsing
bottom-up parsing
pocket calculator
Chomsky normal form: Chomsky 標準形
four (basic) arithmetic operations