(Finite Automata and Linear Grammars)

http://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture3.html

© 2005-19 Martin J. Dürst 青山学院大学

- Schedule for next week
- Homework from last lecture
- Grammar types
- Finite state automata
- Linear grammars
- Conversions

- **May 15 (Wednesday, 1st period, room E-202; makeup class)**: Implementation of lexical analysis, use of tools for lexical analysis
- May 17: Applications of lexical analysis, exercises using tools for lexical analysis

About makeup classes: The material in the makeup class is part of the final exam. If you have another makeup class at the same time, please inform the teacher today.


Grammar types are distinguished by restrictions on rewriting rules:

0. No restrictions: *Phrase structure grammar*, (Chomsky) type 0
grammar

1. `αAβ` → `αγβ`, where `α` and `β` are sequences of 0 or more (non)terminals, and `γ` is a sequence of 1 or more (non)terminals:

*Context-sensitive grammar*, (Chomsky) type 1 grammar

2. `A` → `γ`, where `γ` is a sequence of 1 or more (non)terminals:

*Context-free grammar*, (Chomsky) type 2 grammar

3. `A` → `aB` or `A` → `a` (alternative: `A` → `Ba` or `A` → `a`):

*Regular grammar*, (Chomsky) type 3 grammar

(for all types, `S` → `ε` is also allowed)

- The grammar can be changed to a context-sensitive grammar by replacing the rule `DC` → `CD` with the four rules `DC` → `QC`, `QC` → `QR`, `QR` → `CR`, and `CR` → `CD`.
- Languages such as `a`^{n}`b`^{n}`a`^{n} can be created with context-sensitive grammars, but not with context-free grammars.
  - This language is a context-sensitive language.
  - This language is not a context-free language.

(no need to submit, but bring your notebook PC with you if you have problems)

On your notebook PC, install Cygwin (detailed instructions
with screenshots).

Make sure you select/install all of **gcc**,
**flex**, **bison**, **diff**,
**make** and **m4**.

`flex`, `bison`, `gcc`,... Installation

To check your installation of the various programs, start up a Cygwin Terminal session, and use the following commands to check the version of each program:

- `flex -V` (`V` is upper case)
- `bison -V` (`V` is upper case)
- `gcc -v` (`v` is lower case)
- `diff -v` (`v` is lower case)
- `make -v` (`v` is lower case)
- `m4 --version`

| grammar | type | language type | automaton |
|---|---|---|---|
| phrase structure grammar (psg) | 0 | phrase structure language | Turing machine |
| context-sensitive grammar (csg) | 1 | context-sensitive language | linear-bounded automaton |
| context-free grammar (cfg) | 2 | context-free language | push-down automaton |
| regular grammar (rg) | 3 | regular language | finite state automaton |

Regular languages are used for lexical analysis.

- Finite state automata (FSA)
- Deterministic finite automaton (DFA)
- Non-deterministic finite automaton (NFA)

- Regular grammar
- Left linear grammar
- Right linear grammar

- [Regular expression]

These are all equivalent, and define/accept regular languages

(automaton (αὐτόματον) is Greek; plural: automata)

Finite state automata are often represented with a *state transition
diagram*

Arrow from outside: initial state

Circles: states

Double circles: accepting state(s)

Arrows with labels: transitions

- Start with initial state
- Repeatedly read one symbol of the input word, and transition to the next state along the arrow with the corresponding label
- If the automaton is in an accepting state at the end of the word, then the word is accepted
- If the automaton is not in an accepting state at the end of the word, or if there is no label with the right symbol, then the word is not accepted
- The number of states is finite (i.e. there is only limited memory)

- Accepting only a word with a single specific symbol
- Accepting words where the number of symbols is odd, or even, or leaves a remainder of 2 when divided by 3,...
- Accepting words with a fixed sequence of symbols at the start
- Accepting words with a fixed sequence of symbols at the end
- Accepting words with a fixed sequence of symbols somewhere in the middle
- Accepting words that meet several conditions at the same time or one after the other, or that meet at least one of several conditions

Finite state automata can also be represented with a *state transition
table*.

The state transition table for our example automaton is:

| | a | b |
|---|---|---|
| →A | B | A |
| B | C | A |
| *C | C | A |

Leftmost column: state

Top row: input symbol

→: start state (first state if not otherwise indicated)

*: accepting state(s)

Table contents: state after transition
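The table above can be run directly as a program. The following is a minimal sketch in Python, not part of the original slides; the names `TRANSITIONS`, `START`, `ACCEPTING`, and `dfa_accepts` are chosen here for illustration only.

```python
# Transition table of the example automaton: state -> {input symbol -> next state}
TRANSITIONS = {
    "A": {"a": "B", "b": "A"},
    "B": {"a": "C", "b": "A"},
    "C": {"a": "C", "b": "A"},
}
START = "A"          # state marked with an arrow in the table
ACCEPTING = {"C"}    # states marked with an asterisk in the table


def dfa_accepts(word):
    """Return True if the automaton is in an accepting state after reading word."""
    state = START
    for symbol in word:
        row = TRANSITIONS[state]
        if symbol not in row:   # no transition for this symbol: not accepted
            return False
        state = row[symbol]
    return state in ACCEPTING


if __name__ == "__main__":
    for w in ["aa", "baa", "aab", ""]:
        print(repr(w), dfa_accepts(w))
```

Tracing a few words by hand suggests that this automaton accepts exactly the words over `a` and `b` that end in `aa`.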

- A finite set of states `Q` (circles in diagram; leftmost column in table)
- A finite set of input symbols `Σ` (arrow labels in diagram; top row in table)
- A state transition function `δ` (arrows with labels in diagram; contents of table)
- An initial state (start state) `q`_{0} ∈ `Q` (circle with arrow from outside in diagram; state with arrow in table)
- A finite set of accepting (final) states `F` ⊆ `Q` (double circles in diagram; states with asterisks in table)

A finite state automaton is defined as a quintuple (`Q`,
`Σ`, `δ`, `q`_{0}, `F`)

- An FSA where there is always only one transition for each input is called a *deterministic finite automaton* (or DFA)
- Other FSAs are called *nondeterministic finite automata* (or NFAs)
- If there is more than one possible transition from a state on a given input symbol, then:
  - All transitions are executed simultaneously (as a result, the automaton will be in multiple states)
  - Further transitions proceed in the same way (the number of occupied states may increase further)
  - Where there are no transitions, a state occupation will disappear
  - At the end of the input, the word is accepted if **at least** one of the occupied states is an accepting state
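The idea of occupying several states at once can be sketched by keeping a set of states. The NFA below is a made-up example (not from the slides) with hypothetical states `P`, `Q`, `R`; it has no ε transitions and is meant to accept words over `0` and `1` that end in `01`.

```python
# Hypothetical NFA: state -> {input symbol -> set of next states}
NFA = {
    "P": {"0": {"P", "Q"}, "1": {"P"}},   # two transitions on 0 from P
    "Q": {"1": {"R"}},
    "R": {},                              # no outgoing transitions
}
NFA_START = "P"
NFA_ACCEPTING = {"R"}


def nfa_accepts(word):
    """Simulate the NFA by tracking the set of occupied states."""
    occupied = {NFA_START}
    for symbol in word:
        # Follow every possible transition from every occupied state;
        # states without a matching transition simply drop out.
        occupied = {t for s in occupied for t in NFA[s].get(symbol, set())}
    # Accepted if at least one occupied state is an accepting state.
    return bool(occupied & NFA_ACCEPTING)
```

For example, reading `010` leaves the set `{P, Q}` occupied, which contains no accepting state, so the word is not accepted.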

(epsilon transition)

- In NFAs, there may also be some ε transitions
- ε transitions are executed "for free", i.e. without any corresponding input symbol
- ε transitions are executed immediately before starting, and immediately after the "ordinary" transitions
- ε transitions may be executed in parallel or in succession
- ε transitions increase the set of occupied states (rather than moving)
- Executing all possible ε transitions is called ε closure
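Computing an ε closure amounts to repeatedly adding states until nothing new can be reached. A minimal sketch, assuming the ε transitions are given as a dictionary from a state to the set of states reachable by one ε transition (the function name `epsilon_closure` and the sample data `EPS_EXAMPLE` are hypothetical):

```python
def epsilon_closure(states, eps):
    """Return all states reachable from `states` using only ε transitions.

    eps maps a state to the set of states reachable by a single ε transition.
    """
    closure = set(states)        # the starting states stay occupied
    todo = list(states)
    while todo:
        s = todo.pop()
        for t in eps.get(s, ()):
            if t not in closure:  # ε transitions in succession
                closure.add(t)
                todo.append(t)
    return closure


# Hypothetical ε transitions: S -ε-> A, A -ε-> B, C -ε-> S
EPS_EXAMPLE = {"S": {"A"}, "A": {"B"}, "C": {"S"}}
```

With this sample data, the closure of `{S}` is `{S, A, B}`: the original state remains occupied, and ε transitions are followed in succession.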

| | Deterministic (DFA) | Nondeterministic (NFA) |
|---|---|---|
| concurrently occupied states | one single state | multiple states (set of states) |
| acceptance criterion | current state is accepting state | one of the occupied states is accepting state |
| ε transition | prohibited | allowed |
| type of transition function | δ: Q × Σ → Q | δ: Q × (Σ ∪ {ε}) → P(Q) |

(there are also NFAs without ε transition)

- NFAs look more complex and powerful than DFAs
- DFAs seem simpler to implement than NFAs
- Question: Are there languages that can be recognized by NFAs but not by DFAs?
- Question: Is it possible to convert a(ny) NFA to an equivalent DFA?

State transition table for the example NFA on an earlier slide:

| | ε | 0 | 1 |
|---|---|---|---|
| →S | {A} | {} | {} |
| A | {} | {A,C} | {B} |
| B | {} | {} | {A} |
| *C | {} | {} | {} |

- Algorithm principle:
  - Each set of occupied states in the NFA becomes a state in the DFA
  - The ε closure of the start state of the NFA becomes the start state of the DFA
  - Any set of states of the NFA that contains at least one accepting state becomes an accepting state of the DFA
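The principle above can be sketched as follows, using the example NFA from the state transition table (states S, A, B, C). This is an illustrative sketch rather than the lecture's own code; all Python names are assumptions, and DFA states are represented as `frozenset`s of NFA states.

```python
from collections import deque

EPS = "ε"
# Example NFA from the table: state -> {symbol or ε -> set of next states}
NFA = {
    "S": {EPS: {"A"}},
    "A": {"0": {"A", "C"}, "1": {"B"}},
    "B": {"1": {"A"}},
    "C": {},
}
NFA_START, NFA_ACCEPTING, ALPHABET = "S", {"C"}, ["0", "1"]


def closure(states):
    """ε closure of a set of NFA states."""
    result, todo = set(states), list(states)
    while todo:
        for t in NFA[todo.pop()].get(EPS, ()):
            if t not in result:
                result.add(t)
                todo.append(t)
    return frozenset(result)


def subset_construction():
    """Build the DFA whose states are sets of occupied NFA states."""
    start = closure({NFA_START})
    table = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in table:          # already processed
            continue
        table[current] = {}
        for symbol in ALPHABET:
            moved = {t for s in current for t in NFA[s].get(symbol, ())}
            target = closure(moved)
            table[current][symbol] = target
            queue.append(target)
    accepting = {q for q in table if q & NFA_ACCEPTING}
    return start, table, accepting
```

For this NFA the construction yields five DFA states, {S,A}, {A,C}, {B}, {A} and the empty set (a dead state); only {A,C} is accepting.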

- All NFAs can be converted to equivalent DFAs
- All DFAs are (simple) NFAs
- Therefore, DFAs and NFAs have equivalent recognition power
- Implementing DFAs is very simple, but the size of the table needed may grow (worst case: `n` → 2^{n}; most cases: `n` → ~2`n`)

| Rule Shape | Name |
|---|---|
| A → cB | right linear rule (nonterminal on the right) |
| A → Bc | left linear rule (nonterminal on the left) |
| A → c | constant rule |

A *left linear grammar* is a grammar only using left linear rules and
constant rules

A *right linear grammar* is a grammar only using right linear rules
and constant rules

(in both cases, a special rule `S` → `ε` is allowed)

Left linear grammars and right linear grammars are together called
*linear grammar*s (or *regular grammar*s)

(a grammar that contains both left linear rules and right linear rules is not a linear grammar, but a kind of context-free grammar)
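These definitions can be checked mechanically. The sketch below assumes a simplified encoding that is not part of the slides: each rule is a pair `(left side, right side)` of strings, upper-case letters are nonterminals, lower-case letters are terminals, and the special rule `S` → `ε` is ignored. The function names are hypothetical.

```python
def rule_shape(rhs):
    """Classify the right-hand side of a rule by its shape."""
    if len(rhs) == 1 and rhs.islower():
        return "constant"
    if len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper():
        return "right"   # A -> cB
    if len(rhs) == 2 and rhs[0].isupper() and rhs[1].islower():
        return "left"    # A -> Bc
    return "other"


def grammar_kind(rules):
    """Return 'left linear', 'right linear', 'both', or 'not linear'."""
    shapes = {rule_shape(rhs) for _lhs, rhs in rules}
    if "other" in shapes or {"left", "right"} <= shapes:
        return "not linear"          # other shapes, or left and right rules mixed
    if "left" in shapes:
        return "left linear"
    if "right" in shapes:
        return "right linear"
    return "both"                    # only constant rules
```

For instance, `grammar_kind([("A", "aB"), ("B", "Ab")])` reports `"not linear"`, because the grammar mixes a right linear rule with a left linear rule.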

Right linear grammars and NFAs correspond as follows (not considering
`ε` transitions):

- States correspond to nonterminal symbols
- The start state corresponds to the start symbol
- Transitions moving to an accepting state correspond to constant rules
- All transitions correspond to right linear rules

There is a similar correspondence for left linear grammars (imagine reading the input backwards)

A → aB | bA

B → bA | aC | a

C → bA | aC | a

From automaton to grammar:

- Convert all states to nonterminal symbols (start state→start symbol)
- Convert all transitions to right linear rules
- Convert all transitions to accepting states to constant rules
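The three steps can be sketched in a few lines. The code below applies them to the example automaton's transition table (states A, B, C, with C accepting); the function name and data layout are assumptions made for illustration. It does not handle the case where the start state itself is accepting, which would need the special rule `S` → `ε`.

```python
# Example automaton: state -> {input symbol -> next state}
DFA_TABLE = {
    "A": {"a": "B", "b": "A"},
    "B": {"a": "C", "b": "A"},
    "C": {"a": "C", "b": "A"},
}
DFA_ACCEPTING = {"C"}


def automaton_to_grammar(transitions, accepting):
    """Return a right linear grammar: nonterminal -> list of right-hand sides."""
    grammar = {}
    for state, row in transitions.items():       # each state becomes a nonterminal
        alternatives = []
        for symbol, target in row.items():
            alternatives.append(symbol + target)  # transition -> right linear rule
            if target in accepting:
                alternatives.append(symbol)       # into accepting state -> constant rule
        grammar[state] = alternatives
    return grammar


if __name__ == "__main__":
    for nonterminal, alts in automaton_to_grammar(DFA_TABLE, DFA_ACCEPTING).items():
        print(nonterminal, "→", " | ".join(alts))
```

Up to the order of the alternatives, the result agrees with the grammar `A → aB | bA`, `B → bA | aC | a`, `C → bA | aC | a` given earlier.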

From grammar to automaton:

- Create a state for each nonterminal symbol (start symbol→start state)
- Convert all right linear rules to transitions
- Create a new state only used for acceptance, and convert all constant rules to transitions to this state
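The reverse direction can be sketched in the same style. The code below builds an NFA from the right linear grammar `A → aB | bA`, `B → bA | aC | a`, `C → bA | aC | a`; the name `Z` for the new acceptance-only state, and all function names, are hypothetical. The special rule `S` → `ε` is not handled.

```python
GRAMMAR = {
    "A": ["aB", "bA"],
    "B": ["bA", "aC", "a"],
    "C": ["bA", "aC", "a"],
}
FINAL = "Z"   # hypothetical name for the new state used only for acceptance


def grammar_to_nfa(grammar):
    """Return an NFA: state -> {input symbol -> set of next states}."""
    nfa = {nonterminal: {} for nonterminal in grammar}   # one state per nonterminal
    nfa[FINAL] = {}
    for nonterminal, alternatives in grammar.items():
        for rhs in alternatives:
            symbol = rhs[0]
            # right linear rule -> transition to that nonterminal's state;
            # constant rule -> transition to the acceptance-only state
            target = rhs[1] if len(rhs) == 2 else FINAL
            nfa[nonterminal].setdefault(symbol, set()).add(target)
    return nfa


def nfa_from_grammar_accepts(nfa, word, start="A"):
    """Simulate the resulting NFA on word, starting from the start symbol's state."""
    occupied = {start}
    for symbol in word:
        occupied = {t for s in occupied for t in nfa[s].get(symbol, ())}
    return FINAL in occupied
```

Note that the result is nondeterministic even though the grammar came from a DFA: from state `B`, the symbol `a` leads to both `C` and `Z`.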

- Linear/regular grammars and finite state automata generate/recognize the same (class of) languages
- DFAs allow efficient implementation of recognition of regular languages
- This can be used for lexical analysis

Challenge: Regular languages can be represented by state transition diagrams/tables of NFAs/DFAs, or with regular grammars, but a more compact representation is desirable

Deadline: May 14, 2019 (Tuesday!), 19:00

Where to submit: Box in front of room O-529 (building O, 5th floor)

Format: A4 single page (using both sides is okay; NO cover page), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right

1. Draw a state transition diagram for a finite state automaton that recognizes all inputs that (at the same time)
   - Start with `ab`
   - End with `ba`
   - Contain an even number of `c`
   - Contain no other symbols
2. Draw the state transition diagram for the NFA in the state transition table below

   | | ε | 0 | 1 |
   |---|---|---|---|
   | →S | {B} | {C} | {A} |
   | A | {} | {B} | {B,D} |
   | B | {} | {D} | {} |
   | *C | {B} | {A} | {S} |
   | D | {} | {A,B} | {C} |
3. Create the state transition table of the DFA that is equivalent to the NFA in 2. (do not rename states)
4. Check the versions of `flex`, `bison`, `gcc`, `make`, and `m4` that you installed (no need to submit, but bring your computer to the next lecture if you have a problem)

- Finite state automaton (FSA)
- 有限オートマトン
- deterministic finite automaton (DFA)
- 決定性有限オートマトン
- Non-deterministic finite automaton (NFA)
- 非決定性有限オートマトン
- (left/right) linear grammar
- (左・右) 線形文法
- regular grammar
- 正規文法
- state transition diagram
- 状態遷移図
- transition
- 遷移
- initial/start state
- 初期状態
- accepting/final state
- 受理状態
- accept
- 受理する
- finite
- 有限
- state transition table
- 状態遷移表
- state transition function
- 動作関数
- simultaneous(ly)
- 同時 (な・に)
- ε transition
- ε 遷移
- ε closure
- ε 閉包
- equivalence
- 同等性
- (left/right) linear rule
- (左・右) 線形規則
- constant rule
- 定数規則
- renaming (of states)
- 状態の書換え