Importance, Definition, and Classification of Formal Languages

(形式言語の重要性、定義、分類)

2nd lecture, April 20, 2018

Language Theory and Compiler

https://www.sw.it.aoyama.ac.jp/2018/Compiler/lecture2.html

Martin J. Dürst

AGU

© 2006-18 Martin J. Dürst 青山学院大学

Today's Schedule

 

Example Answers for Homework

Problem: For the one-line C program fragment below, and based on the examples given in this lecture, write down:

  1. the result of lexical analysis
  2. the result of parsing
  3. the output of the compiler (in assembly language; comments are not needed; use SUB for subtraction and DIV for division)
grade = math + english/2 - absent*10;
Output of lexical analysis:
(Omitted due to circumstances.)
Output of parsing:
(Omitted due to circumstances.)
Compiler output:
(Omitted due to circumstances.)

 

Course Contents


Part | Theory | Compilers | Other applications
Front end | language theory, automata | lexical analysis, parsing | regular expressions, text/data formats
Back end | | optimization, code generation |

 

Importance of Formal Language Theory

 

Terms used for Natural Languages and Formal Languages

Field | Terms in | Smallest unit | Sequence | Set | Classification
natural language | Japanese | (単)語 | 文、文書 | (自然)言語 | (大)語族、語族、語派、語群
natural language | English | word | sentence, text | (natural) language | language macrofamily, family, group, ...
formal language | Japanese | 記号 (文字など) | 語 | (形式)言語 | 言語 (族)
formal language | English | symbol (letter, ...) | word | (formal) language | language type, ...

 

Basic Terms

Terms for formal languages:

 

Definition of Word

 

Concatenation Operation for Words

 

Properties of Concatenation

 

Definition of Language

A language over Σ is a set of words over Σ
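To make the definition concrete, here is a minimal Python sketch. It assumes that every symbol is a single character, so that a word can be represented as a Python string (the empty word ε as "") and a finite language as a Python set of strings. The function names are hypothetical, chosen for illustration.

```python
# Assumption: symbols are single characters, words are strings,
# and a (finite) language is a set of strings.

def is_word_over(word: str, sigma: set[str]) -> bool:
    """True if every symbol of `word` is in `sigma` (the empty word always qualifies)."""
    return all(symbol in sigma for symbol in word)

def is_language_over(language: set[str], sigma: set[str]) -> bool:
    """True if every element of `language` is a word over `sigma`."""
    return all(is_word_over(w, sigma) for w in language)

sigma = {"a", "b", "c"}
print(is_language_over({"", "a", "abc", "cab"}, sigma))  # True
print(is_language_over({"ab", "abd"}, sigma))             # False: 'd' is not in sigma
print(is_language_over(set(), sigma))                     # True: the empty set is also a language
```

Note that the empty language set() and the language {""} (containing only ε) are different languages.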

Examples of languages over Σ = {a, b, c}:

 

More Examples of Languages

 

Operations on Languages

Operations on languages are combinations of operations on sets and operations on words.

  1. Set union of languages
  2. Set intersection of languages
  3. Set difference of languages
  4. Concatenation operation for languages:
    For languages A and B, their concatenation AB is the set $\{ wv \mid w \in A, v \in B \}$
    Example: A = { ab, ca }, B = { a, bb }, AB = { aba, abbb, caa, cabb }

    As for words, we write $L^2$ for $LL$, $L^3$ for $LLL$, ...

  5. Kleene closure: Concatenating the same language 0 or more times

    written $L^*$; $L^* = L^0 \cup L^1 \cup L^2 \cup L^3 \cup \dots = \bigcup_{i=0}^{\infty} L^i$, where $L^0 = \{\varepsilon\}$

    Example: L = {a, b} => L* = {ε, a, b, aa, ab, ba, bb, aaa, ...}
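These operations map directly onto Python sets. The following is a minimal sketch under the same assumptions as before (single-character symbols, words as strings, ε as the empty string ""); the function names are made up for illustration. Union, intersection, and difference are just Python's `|`, `&`, and `-` set operators. Because L* is infinite whenever L contains a non-empty word, the sketch only lists the words of L* up to a given maximum length.

```python
def concat(a: set[str], b: set[str]) -> set[str]:
    """Concatenation AB = { wv | w in A, v in B }."""
    return {w + v for w in a for v in b}

def power(lang: set[str], i: int) -> set[str]:
    """L^i: L concatenated with itself i times, with L^0 = {ε} = {""}."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result

def kleene_closure_up_to(lang: set[str], max_len: int) -> set[str]:
    """All words of L* whose length is at most max_len."""
    result = {""}      # ε is always in L*
    frontier = {""}    # words found in the previous round
    while frontier:
        # extend the newest words by one more word of L, keep only new, short enough words
        frontier = {w for w in concat(frontier, lang) if len(w) <= max_len} - result
        result |= frontier
    return result

A = {"ab", "ca"}
B = {"a", "bb"}
print(sorted(concat(A, B)))  # ['aba', 'abbb', 'caa', 'cabb']
print(sorted(A | B))         # ['a', 'ab', 'bb', 'ca']

L = {"a", "b"}
print(sorted(kleene_closure_up_to(L, 2), key=lambda w: (len(w), w)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The loop terminates even if ε is in L, because words that are already known are removed from the frontier.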

 

Main Problems in Formal Language Theory

 

Languages and Automata and Grammars

 

Table of Formal Language Types

(Chomsky hierarchy)

文法 grammar Type Lanugage type automaton
句構造文法 phrase structure grammar (psg) 0 phrase structure language Turing machine
文脈依存文法 context-sensitive grammar (csg) 1 context-sensitive language linear-bounded automaton
文脈自由文法 context-free grammar (cfg) 2 context-free language push-down automaton
正規文法 regular grammar (rg) 3 regular language finite state automaton

 

Types of Automata

Automata types are distinguished by the restrictions on their "external memory":

0. The external memory is a tape of unlimited length: Turing machine

1. The external memory is a tape of limited length: linear-bounded automaton

2. The external memory is a stack (only the top can be accessed): push-down automaton

3. There is no external memory: finite state automaton
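The difference in memory can be illustrated with two small recognizers. Both example languages are chosen here for illustration only. The first function behaves like a finite state automaton: it remembers nothing but its current state, and accepts words over {a, b} containing an even number of a's. The second uses a stack, like a push-down automaton, to accept { aⁿbⁿ | n ≥ 0 }, a language that no finite state automaton can recognize. This is a simplified deterministic sketch, not a general push-down automaton.

```python
def even_number_of_as(word: str) -> bool:
    """Finite state automaton over {a, b}: the only memory is the current state."""
    # transition table: (state, symbol) -> next state
    delta = {
        ("even", "a"): "odd", ("even", "b"): "even",
        ("odd", "a"): "even", ("odd", "b"): "odd",
    }
    state = "even"  # start state
    for symbol in word:
        if (state, symbol) not in delta:
            return False  # symbol is not in the alphabet
        state = delta[(state, symbol)]
    return state == "even"  # "even" is the only accepting state

def a_n_b_n(word: str) -> bool:
    """Stack-based recognizer for { a^n b^n | n >= 0 }; only the top of the stack is used."""
    stack = []
    i = 0
    # phase 1: push one marker for every leading 'a'
    while i < len(word) and word[i] == "a":
        stack.append("A")
        i += 1
    # phase 2: pop one marker for every following 'b'
    while i < len(word) and word[i] == "b":
        if not stack:
            return False  # more b's than a's
        stack.pop()
        i += 1
    # accept only if the whole input was read and the stack is empty
    return i == len(word) and not stack

print(even_number_of_as("abba"), a_n_b_n("aabb"), a_n_b_n("aab"))  # True True False
```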

 

Example of a Grammar for a Formal Language

Example of derivation of a word from the grammar:

S → aSo → aaSoo → aaBoo → aayaoo

S ⇒ aayaoo

(single steps in a derivation are written with →, the overall result with ⇒)

 

Definition of Grammar

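A grammar is defined as a quadruple (N, Σ, P, S), where N is the set of nonterminal symbols, Σ the set of terminal symbols, P the set of rewriting rules, and S ∈ N the start symbol.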

 

Rewriting Rule

(also: production rule)

 

Derivation

(derivation)

Example of Grammar and Derivation

Grammar:

  1. S → aba
  2. S → aDTa
  3. T → CDTa
  4. T → CDa
  5. DC → CD
  6. aC → aa
  7. Da → ba
  8. Db → bb

Example of derivation:
[S] →(2) aD[T]a →(4) a[DC]Daa →(5) aCD[Da]a →(7) aC[Db]aa →(8) [aC]bbaa →(6) aabbaa

(numbers in parentheses indicate the rewriting rule that is applied; square brackets mark the part that is rewritten in the next step)
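The derivation above can be replayed mechanically. This Python sketch stores the grammar as a quadruple (N, Σ, P, S) and applies each numbered rule at the leftmost place where its left-hand side occurs. That placement strategy is an assumption made for this sketch: in general a rule may be applicable at several places, and a written derivation has to indicate which one. For this particular derivation, the leftmost choice happens to reproduce every step. The names (RULES, apply_rule, derive) are hypothetical.

```python
# The grammar from the slide as a quadruple (N, Σ, P, S).
NONTERMINALS = {"S", "T", "C", "D"}
TERMINALS = {"a", "b"}
RULES = {  # rule number -> (left-hand side, right-hand side)
    1: ("S", "aba"),
    2: ("S", "aDTa"),
    3: ("T", "CDTa"),
    4: ("T", "CDa"),
    5: ("DC", "CD"),
    6: ("aC", "aa"),
    7: ("Da", "ba"),
    8: ("Db", "bb"),
}
START = "S"
GRAMMAR = (NONTERMINALS, TERMINALS, RULES, START)

def apply_rule(form: str, rule_no: int) -> str:
    """Rewrite the leftmost occurrence of the rule's left-hand side (assumed strategy)."""
    lhs, rhs = RULES[rule_no]
    pos = form.find(lhs)
    if pos < 0:
        raise ValueError(f"rule {rule_no} ({lhs} -> {rhs}) is not applicable to {form!r}")
    return form[:pos] + rhs + form[pos + len(lhs):]

def derive(rule_numbers: list[int]) -> list[str]:
    """Return all sentential forms, starting from the start symbol."""
    forms = [START]
    for n in rule_numbers:
        forms.append(apply_rule(forms[-1], n))
    return forms

steps = derive([2, 4, 5, 7, 8, 6])
print(" → ".join(steps))
# S → aDTa → aDCDaa → aCDDaa → aCDbaa → aCbbaa → aabbaa
print(not any(symbol in NONTERMINALS for symbol in steps[-1]))  # True: a word of terminals only
```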

 

Homework

Deadline: April 26, 2018 (Thursday), 19:00

Where to submit: Box in front of room O-529 (building O, 5th floor)

Format: A4 single page (using both sides is okay; NO cover page), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right

  1. For the language L = { a, cb, ac }, list the 10 shortest words of L*.
    Additional problem (solution voluntary): List all words of L* of length 4.
  2. Using the grammar from the slide "Example of Grammar and Derivation", find 3 words (different from each other and from aabbaa) produced by that grammar. Give the full derivation for each word (rule numbers and underlines not needed). Guess and explain what language this grammar defines (Hint: If your guess is not simple, maybe you have made a mistake in the derivations).
    Additional problem (solution voluntary): Prove or justify your guess.
  3. (no need to submit, but bring your notebook PC with you to the next lecture if you have any problems)
    Install Cygwin on your notebook computer (detailed instructions with images). Make sure that you select/install gcc, flex, bison, diff, make, and m4. If you already have an earlier Cygwin installation, check that these packages are installed and up to date.

 

Glossary

word
語
derivation
導出
classification
分類
symbol
記号
empty word
空語
alphabet
アルファベット
(word/language) over Σ
Σ 上の (語・言語)
concatenation (operation)
連結 (演算)
associativity
結合性 (結合律が成立つこと)
neutral element
単位元
commutativity
可換性
prefectural government (building)
県庁
keyword
予約語
well-formed formula
整論理式
Kleene closure
クリーン閉包
rule
規則
type of language
言語族
Chomsky hierarchy
チョムスキー階層
phrase structure language
句構造言語
context-sensitive language
文脈依存言語
context-free language
文脈自由言語
regular language
正規言語
Turing machine
チューリング機械
linear-bounded automaton
線形束縛オートマトン
push-down automaton
プッシュダウンオートマトン
finite state automaton
有限オートマトン
external memory
外部メモリ
nonterminal symbol
非終端記号
upper case (letter)
大文字
lower case (letter)
小文字
terminal symbol
終端記号
rewriting rule/production rule
書き換え規則・生成規則
initial/start symbol
初期記号・開始記号
quadruple
四字組
left-hand side
左辺
right-hand side
右辺
subsequence
部分列