Importance, Definition, and Classification of Formal Languages

(形式言語の重要性、定義、分類)

2nd lecture, April 15, 2022

Language Theory and Compiler

http://www.sw.it.aoyama.ac.jp/2022/Compiler/lecture2.html

Martin J. Dürst

AGU

© 2005-22 Martin J. Dürst 青山学院大学

 

Today's Schedule

 

Seating

1  ☑口口 ☑口口 ☑口口 ☑口口 ☑口口 ☑口口
2  口口☑ 口口☑ 口口☑ 口口☑ 口口☑ 口口☑
3  ☑口口 ☑口口 ☑口口 ☑口口 ☑口口 ☑口口
4  口口☑ 口口☑ 口口☑ 口口☑ 口口☑ 口口☑
5  ☑口口 ☑口口 ☑口口 ☑口口 ☑口口 ☑口口
6  口口☑ 口口☑ 口口☑ 口口☑ 口口☑ 口口☑

 

Covid Precautions

 

Example Answers for Homework

Problem: For the one-line C program fragment below, based on the examples given in this lecture, write down:

  1. the result of lexical analysis
  2. the result of parsing
  3. the output of the compiler (in assembly language; comments are not needed; use SUB for subtraction, and DIV for division)
grade = english - absent * 5 + math / 3;
Output of lexical analysis:
[removed]
Output of parsing:

Compiler output (other solutions possible):

[removed]

 

Course Contents


          | Theory                    | Compilers                     | Other applications
Front end | language theory, automata | lexical analysis, parsing     | regular expressions, text/data formats
Back end  |                           | optimization, code generation |

 

Importance of Formal Language Theory

 

Basic Concept: Word

 

Definition of Word

 

Concatenation Operation on Words

 

Properties of Concatenation

 

Definition of Language

A language over Σ is a set of words over Σ

Examples of languages over Σ = {a, b, c}:

 

More Examples of Languages

 

Even More Examples of Languages

 

Operations on Languages

Operations on languages are combinations of operations on sets and operations on words.

  1. Set union on languages
  2. Set intersection on languages
  3. Set difference on languages
  4. Concatenation operation for languages:
    For languages A and B, their concatenation AB is the set $\{ wv \mid w \in A, v \in B \}$
    Example: A = { ab, abc }, B = { a, ca },
    AB = { aba, abca, abcca }    ($|AB| \le |A| \cdot |B|$; here |AB| = 3 < 4, because abca arises twice, as ab·ca and as abc·a)
    As for words, we write $L^2$ for $LL$, $L^1$ for $L$, $L^0$ for $\{\varepsilon\}$, ...
  5. Kleene closure: concatenating the same language 0 or more times,
    written $L^*$: $L^* = L^0 \cup L^1 \cup L^2 \cup L^3 \cup \cdots = \bigcup_{i=0}^{\infty} L^i$
    Example: L = {a, b}
    L* = {ε, a, b, aa, ab, ba, bb, aaa, ...}

 

 

Terms used for Natural Languages and Formal Languages

Unit:                       | Smallest unit        | Sequence       | Set                | Classification
natural language | Japanese | (単)語               | 文、文書       | (自然)言語         | (大)語族、語族、語派、語群
                 | English  | word                 | sentence, text | (natural) language | language macrofamily, family, group, ...
formal language  | Japanese | 記号 (文字など)      | 語             | (形式)言語         | 言語 (族)
                 | English  | symbol (letter, ...) | word           | (formal) language  | language type, ...

 

Main Problems in Formal Language Theory

 

Languages, Automata, and Grammars

 

Table of Formal Language Types

(Chomsky hierarchy)

Language (Japanese) | Grammar                         | Type | Language type              | Automaton
句構造言語          | phrase structure grammar (psg)  | 0    | phrase structure language  | Turing machine
文脈依存言語        | context-sensitive grammar (csg) | 1    | context-sensitive language | linear-bounded automaton
文脈自由言語        | context-free grammar (cfg)      | 2    | context-free language      | push-down automaton
正規言語            | regular grammar (rg)            | 3    | regular language           | finite state automaton

 

Types of Automata

Automata types are distinguished by the restrictions on their "external memory":

0. The external memory is a tape of unlimited length: Turing machine

1. The external memory is a tape whose length is limited by (a constant multiple of) the length of the input: linear-bounded automaton

2. The external memory is a stack (only the top can be accessed): push-down automaton

3. There is no external memory: finite state automaton

 

Example of a Grammar for a Formal Language

  1. S, B: nonterminal symbols (upper case)
  2. a, o, y: terminal symbols (lower case)
  3. rewriting rules:

    S → a S o

    S → B

    B → y a

  4. S: start symbol (initial symbol)

Example of derivation of a word from the grammar:

S → a S o → a a S o o → a a B o o → a a y a o o

S ⇒ a a y a o o; other derivations: S ⇒ a y a o, S ⇒ a a a y a o o o, ...

(single steps in a derivation are written with →, the overall result with ⇒)

 

Definition of Grammar

The four components defining a grammar:

  1. A finite set of nonterminal symbols N (usually upper case)
  2. A finite set of terminal symbols Σ (usually lower case, N ∩ Σ = {})
  3. A finite set of rewriting rules P (also called production rules)
  4. A start symbol S (S ∈ N; if not explicitly specified, the symbol on the left-hand side of the first rewriting rule)

A grammar is a quadruple (N, Σ, P, S)

 

Rewriting Rule

(also: production rule)

 

How to Apply Rewriting Rules

 

Derivation

 

Example of Grammar and Derivation

Grammar:

  1. S → dcd
  2. S → dHRd
  3. R → GHRd
  4. R → GHd
  5. HG → GH
  6. dG → dd
  7. Hd → cd
  8. Hc → cc

Example of derivation:
[S] →(2) dH[R]d →(4) d[HG]Hdd →(5) dGH[Hd]d →(7) dG[Hc]dd →(8) [dG]ccdd →(6) ddccdd

(the numbers in parentheses indicate which rewriting rule is applied; the square brackets mark the part of the word to which the next rule is applied)

 

Summary of this Lecture

 

Homework Submission

Deadline: April 21, 2022 (Thursday), 18:40

Format: A4 single page (using both sides is okay; NO cover page), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right

Where to submit: Box in front of room O-529 (building O, 5th floor)

 

Homework Problem 1

For the language L = { qt, sq, s },
list the 10 shortest words of L*.

Additional problem (optional):
List all words of L* of length 4.

 

Homework Problem 2

Using the grammar from the slide "Example of Grammar and Derivation", find 3 words (different from each other and from ddccdd) produced by that grammar.

Give the full derivation for each word
(rule numbers and markings of where the rules are applied are not needed).

Guess and explain
what language this grammar defines.

Hint: If your guess is not simple,
maybe you have made a mistake in the derivations.

Additional problem (optional):
Prove or justify your guess.

 

Homework Problem 3

(no need to submit, but contact me by e-mail if you have any problems)

Install cygwin on your notebook computer (detailed instructions with images).

Make sure that you select/install gcc (gcc-core), flex, bison, diff (diffutils), make, and m4.

If you have an earlier cygwin installation, make sure to check/update.

 

Homework Returns

 

Glossary

word
語
derivation
導出
classification
分類
symbol
記号
empty word
空語
alphabet
アルファベット
(word/language) over Σ
Σ 上の (語・言語)
concatenation (operation)
連結 (演算)
associativity
結合性 (結合律が成り立つこと)
neutral element
単位元
commutativity
可換性
prefectural government (building)
県庁
keyword
予約語
well-formed formula
整論理式
Kleene closure
クリーン閉包
rule
規則
type of language
言語族
Chomsky hierarchy
チョムスキー階層
phrase structure language
句構造言語
context-sensitive language
文脈依存言語
context-free language
文脈自由言語
regular language
正規言語
Turing machine
チューリング機械
linear-bounded automaton
線形束縛オートマトン
push-down automaton
プッシュダウンオートマトン
finite state automaton
有限オートマトン
external memory
外部メモリ
nonterminal symbol
非終端記号
upper case (letter)
大文字
lower case (letter)
小文字
terminal symbol
終端記号
rewriting rule/production rule
書き換え規則・生成規則
initial/start symbol
初期記号・開始記号
quadruple
四字組
left-hand side
左辺
right-hand side
右辺
subsequence
部分列