Finite State Automata and Linear Grammars


Language Theory and Compilers

3rd lecture, May 10, 2019

http://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture3.html

Martin J. Dürst

Today's Schedule

Schedule for next week
Homework from last lecture
Grammar types
Finite state automata
Linear grammars
Conversions

Schedule for Next Week

May 15 (Wednesday, 1st period, makeup class, room E-202): Implementation of lexical analysis, use of tools for lexical analysis
May 17: Applications of lexical analysis, exercises using tools for lexical analysis

About makeup classes: The material in the makeup class is part of the final exam. If you have another makeup class at the same time, please inform the teacher today.


Types of Grammars

Grammar types are distinguished by restrictions on rewriting rules:

0. No restrictions: Phrase structure grammar, (Chomsky) type 0 grammar

1. αAβ → αγβ, where α and β are sequences of 0 or more (non)terminals, and γ is a sequence of 1 or more (non)terminals:
Context-sensitive grammar, (Chomsky) type 1 grammar

2. A → γ, where γ is a sequence of 1 or more (non)terminals:
Context-free grammar, (Chomsky) type 2 grammar

3. A → aB or A→ a (alternative: A → Ba or A→ a):
Regular grammar, (Chomsky) type 3 grammar

(for all types, S → ε is also allowed)

Remarks on Homework 2

The grammar can be changed to a context-sensitive grammar
by replacing the rule DC → CD with the four rules
DC → QC, QC → QR, QR → CR, and CR → CD.
Languages such as aⁿbⁿaⁿ can be generated by context-sensitive grammars, but not by context-free grammars.
This language is therefore a context-sensitive language, but not a context-free language.

Cygwin Download and Installation

(no need to submit, but bring your notebook PC with you if you have problems)
On your notebook PC, install Cygwin (detailed instructions with screenshots).
Make sure you select/install all of gcc, flex, bison, diff, make and m4.

Checking `flex`, `bison`, `gcc`,... Installation

To check your installation of the various programs, start a Cygwin Terminal session and use the following commands to check the version of each program:

flex -V (V is upper case)
bison -V (V is upper case)
gcc -v (v is lower case)
diff -v (v is lower case)
make -v (v is lower case)
m4 --version

Summary of Last Lecture

grammar	type	language type	automaton
phrase structure grammar (psg)	0	phrase structure language	Turing machine
context-sensitive grammar (csg)	1	context-sensitive language	linear-bounded automaton
context-free grammar (cfg)	2	context-free language	push-down automaton
regular grammar (rg)	3	regular language	finite state automaton

Regular languages are used for lexical analysis.

Plan for this Lecture

Finite state automata (FSA)
- Deterministic finite automaton (DFA)
- Non-deterministic finite automaton (NFA)
Regular grammar
- Left linear grammar
- Right linear grammar
[Regular expression]

All of these are equivalent: each of them defines or accepts exactly the regular languages

Finite State Automaton Example

(automaton (αὐτόματον) is Greek; plural: automata)

Finite state automata are often represented with a state transition diagram

[Figure: state transition diagram of a finite state automaton]

Arrow from outside: initial state
Circles: states
Double circles: accepting state(s)
Arrows with labels: transitions

Workings of a Finite State Automaton

Start with initial state
Repeatedly read one symbol of the input word,
and transition to the next state along the arrow with the corresponding label
If the automaton is in an accepting state at the end of the word,
then the word is accepted
If the automaton is not in an accepting state at the end of the word,
or if there is no arrow labeled with the current symbol, then the word is not accepted
The number of states is finite (i.e. there is only limited memory)

Examples of Finite State Automata

Accepting only a word with a single specific symbol
Accepting words where the number of symbols is odd, or even, or leaves a remainder of 2 when divided by 3,...
Accepting words with a fixed sequence of symbols at the start
Accepting words with a fixed sequence of symbols at the end
Accepting words with a fixed sequence of symbols somewhere in the middle
Accepting words that meet several conditions at the same time or one after the other, or that meet at least one of several conditions
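As an illustration of the second example above, the following minimal Python sketch (not part of the lecture material) accepts words whose length leaves a remainder of 2 when divided by 3. The function name is hypothetical; the three states are simply the numbers 0, 1, and 2.

```python
def accepts_length_mod3_is_2(word: str) -> bool:
    """Three-state automaton: the state is the number of symbols read so far, modulo 3."""
    state = 0                      # initial state: nothing read yet
    for _ in word:                 # the symbol itself does not matter, only the count
        state = (state + 1) % 3    # every symbol moves to the next state in the cycle
    return state == 2              # state 2 is the only accepting state


if __name__ == "__main__":
    for w in ["", "a", "ab", "abc", "aaaaa"]:
        print(repr(w), accepts_length_mod3_is_2(w))
```

However long the input is, the automaton only has to remember one of three states, which is what "limited memory" means here.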

State Transition Tables

Finite state automata can also be represented with a state transition table.

The state transition table for our example automaton is:

	a	b
→A	B	A
B	C	A
*C	C	A

Leftmost column: state
Top row: input symbol
→: start state (first state if not otherwise indicated)
*: accepting state(s)
Table contents: state after transition
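The table above can be turned into a program almost directly. The following is a minimal Python sketch, not the lecture's own implementation; the names DELTA, START, ACCEPTING, and dfa_accepts are chosen here for illustration.

```python
# State transition table of the example automaton: (state, input symbol) -> next state
DELTA = {
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "C", ("B", "b"): "A",
    ("C", "a"): "C", ("C", "b"): "A",
}
START = "A"          # marked with an arrow in the table
ACCEPTING = {"C"}    # marked with an asterisk in the table


def dfa_accepts(word: str) -> bool:
    """Run the automaton on word and report whether it ends in an accepting state."""
    state = START
    for symbol in word:
        if (state, symbol) not in DELTA:   # no transition for this symbol: reject
            return False
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


if __name__ == "__main__":
    for w in ["aa", "baa", "aab", "a", ""]:
        print(repr(w), dfa_accepts(w))
```

Tracing a few words by hand suggests that this automaton accepts exactly the words over {a, b} that end in aa.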

Formal Definition of FSAs

A finite set of states Q (circles in diagram; leftmost column in table)
A finite set of input symbols Σ (arrow labels in diagram; top row in table)
A state transition function δ (arrows with labels in diagram; contents of table)
An initial state (start state) q₀ ∈ Q (circle with arrow from outside in diagram; state with arrow in table)
A finite set of accepting (final) states F ⊆ Q (double circles in diagram; states with asterisks in table)

A finite state automaton is defined as a quintuple (Q, Σ, δ, q₀, F)
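One possible way to write the quintuple down as data is sketched below in Python. The class name FSA, its field names, and the helper is_well_formed are assumptions made for this illustration; the example instance is the automaton from the state transition table.

```python
from typing import NamedTuple


class FSA(NamedTuple):
    states: frozenset      # Q
    alphabet: frozenset    # Σ
    delta: dict            # δ, stored as a dict: (state, symbol) -> state
    start: str             # q₀
    accepting: frozenset   # F


def is_well_formed(m: FSA) -> bool:
    """Check q₀ ∈ Q, F ⊆ Q, and that every entry of δ stays inside Q and Σ."""
    return (
        m.start in m.states
        and m.accepting <= m.states
        and all(
            q in m.states and s in m.alphabet and t in m.states
            for (q, s), t in m.delta.items()
        )
    )


# The example automaton from the state transition table
EXAMPLE = FSA(
    states=frozenset({"A", "B", "C"}),
    alphabet=frozenset({"a", "b"}),
    delta={
        ("A", "a"): "B", ("A", "b"): "A",
        ("B", "a"): "C", ("B", "b"): "A",
        ("C", "a"): "C", ("C", "b"): "A",
    },
    start="A",
    accepting=frozenset({"C"}),
)

if __name__ == "__main__":
    print(is_well_formed(EXAMPLE))
```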

Nondeterministic Finite Automata

An FSA where each state has only one transition for each input symbol (and where there are no ε transitions) is called a deterministic finite automaton (or DFA)
Other FSAs are called nondeterministic finite automata (or NFAs)
If there is more than one possible transition from a state on a given input symbol, then:
- All transitions are executed simultaneously (as a result, the automaton will be in multiple states)
- Subsequent transitions proceed in the same way (the number of occupied states may increase further)
- Where an occupied state has no transition for the input symbol, its occupation disappears
- At the end of the input, the word is accepted if at least one of the occupied states is an accepting state
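The rules above amount to tracking a set of occupied states. The following Python sketch shows this for a small NFA without ε transitions. The NFA itself (states p, q, r, intended to accept words over {0, 1} that end in 01) is a hypothetical example invented for this illustration, not one from the lecture.

```python
# Hypothetical NFA: (state, input symbol) -> set of possible next states
NFA = {
    ("p", "0"): {"p", "q"},   # two possible transitions on 0: nondeterminism
    ("p", "1"): {"p"},
    ("q", "1"): {"r"},
}
START = "p"
ACCEPTING = {"r"}


def nfa_accepts(word: str) -> bool:
    """Simulate all possible transitions at once by keeping a set of occupied states."""
    occupied = {START}
    for symbol in word:
        next_occupied = set()
        for state in occupied:
            # A state without a transition for this symbol contributes nothing,
            # so its occupation disappears.
            next_occupied |= NFA.get((state, symbol), set())
        occupied = next_occupied
    # Accept if at least one occupied state is an accepting state
    return bool(occupied & ACCEPTING)


if __name__ == "__main__":
    for w in ["01", "1101", "010", "0", ""]:
        print(repr(w), nfa_accepts(w))
```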

ε Transition

(epsilon transition)

In NFAs, there may also be some ε transitions
ε transitions are executed "for free", i.e. without any corresponding input symbol
ε transitions are executed before the first input symbol is read, and again immediately after each "ordinary" transition
ε transitions may be executed in parallel or in succession
ε transitions increase the set of occupied states (rather than moving)
Executing all possible ε transitions is called ε closure
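A minimal Python sketch of the ε closure follows. The ε transitions in EPS are a hypothetical example chosen to show both parallel and successive ε transitions; the function name is also an assumption of this illustration.

```python
# Hypothetical ε transitions: state -> set of states reachable by one ε transition
EPS = {
    "S": {"A", "B"},   # two ε transitions in parallel
    "A": {"C"},        # ε transitions in succession: S -> A -> C
}


def epsilon_closure(states):
    """Return the given states plus everything reachable through ε transitions only."""
    closure = set(states)          # occupied states stay occupied
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in EPS.get(q, set()):
            if r not in closure:   # visit each state only once, so ε cycles terminate
                closure.add(r)
                stack.append(r)
    return closure


if __name__ == "__main__":
    print(sorted(epsilon_closure({"S"})))
```

Because the starting states are kept in the result, the closure can only enlarge the set of occupied states, as stated above.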

Example of NFA

Comparing DFAs and NFAs

	Deterministic (DFA)	Nondeterministic (NFA)
concurrently occupied states	one single state	multiple states (set of states)
acceptance criterion	current state is accepting state	one of the occupied states is accepting state
ε transition	prohibited	allowed
type of transition function	`δ`: `Q` × `Σ` → `Q`	`δ`: `Q` × (`Σ` ∪ {`ε`}) → `P`(`Q`)

(there are also NFAs without ε transition)

Equivalence of DFA and NFA

NFAs look more complex and powerful than DFAs
DFAs seem simpler to implement than NFAs
Question: Are there languages that can be recognized by NFAs but not by DFAs?
Question: Is it possible to convert a(ny) NFA to an equivalent DFA?

Example of Conversion from NFA to DFA

State transition table for the example NFA on an earlier slide:

	`ε`	0	1
`→S`	{`A`}	{}	{}
`A`	{}	{`A`,`C`}	{`B`}
`B`	{}	{}	{`A`}
`*C`	{}	{}	{}

Conversion from an NFA to an Equivalent DFA

Algorithm principle:
- Each set of occupied states in the NFA becomes a state in the DFA
- The ε closure of the start state of the NFA becomes the start state of the DFA
- Any set of states of the NFA that contains at least one accepting state becomes an accepting state of the DFA
All NFAs can be converted to equivalent DFAs
All DFAs are (simple) NFAs
Therefore, DFAs and NFAs have equivalent recognition power
Implementing DFAs is very simple, but the size of the table needed may grow
(worst case: n → 2ⁿ; most cases: n → ~2n)
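The algorithm principle can be sketched in Python as follows, using the example NFA from the state transition table above. This is an illustrative sketch under the assumptions stated in the comments, not the lecture's reference solution; all names are chosen here, and the empty string stands in for ε.

```python
from collections import deque

EPSILON = ""   # assumption of this sketch: the empty string represents ε
NFA = {        # the example NFA: state -> {symbol -> set of next states}
    "S": {EPSILON: {"A"}},
    "A": {"0": {"A", "C"}, "1": {"B"}},
    "B": {"1": {"A"}},
    "C": {},
}
NFA_START = "S"
NFA_ACCEPTING = {"C"}
ALPHABET = ["0", "1"]


def epsilon_closure(states):
    """All states reachable from the given states through ε transitions only."""
    closure = set(states)
    todo = list(states)
    while todo:
        q = todo.pop()
        for r in NFA[q].get(EPSILON, set()):
            if r not in closure:
                closure.add(r)
                todo.append(r)
    return frozenset(closure)


def subset_construction():
    """Build the DFA whose states are the reachable sets of occupied NFA states."""
    start = epsilon_closure({NFA_START})
    table = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in table:
            continue
        table[current] = {}
        for symbol in ALPHABET:
            moved = set()
            for q in current:
                moved |= NFA[q].get(symbol, set())
            target = epsilon_closure(moved)   # may be the empty set (a dead state)
            table[current][symbol] = target
            if target not in table:
                queue.append(target)
    accepting = {s for s in table if s & NFA_ACCEPTING}
    return start, table, accepting


if __name__ == "__main__":
    start, table, accepting = subset_construction()
    for state, row in table.items():
        mark = "*" if state in accepting else " "
        print(mark, sorted(state), {sym: sorted(t) for sym, t in row.items()})
```

Only the sets of states that are actually reachable from the start state are generated, which is why the resulting table is usually much smaller than the worst case of 2ⁿ states. In this sketch the empty set appears as an explicit dead state.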

Linear Grammar

Simple Rewriting Rules
Rule Shape	Name
`A` → `cB`	right linear rule (nonterminal on the right)
`A` → `Bc`	left linear rule (nonterminal on the left)
`A` → `c`	constant rule

A left linear grammar is a grammar only using left linear rules and constant rules

A right linear grammar is a grammar only using right linear rules and constant rules

(in both cases, a special rule S → ε is allowed)

Left linear grammars and right linear grammars are together called linear grammars (or regular grammars)

(a grammar that contains both left linear rules and right linear rules is not a linear grammar, but a kind of context-free grammar)
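The rule shapes can be checked mechanically. The Python sketch below assumes, for illustration only, that nonterminals are single uppercase letters and terminals are single lowercase letters, and it ignores the special rule S → ε. The function names are hypothetical.

```python
def rule_shape(lhs: str, rhs: str) -> str:
    """Classify a single rule lhs -> rhs by its shape."""
    if len(lhs) == 1 and lhs.isupper():
        if len(rhs) == 1 and rhs.islower():
            return "constant"                       # A -> c
        if len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper():
            return "right linear"                   # A -> cB
        if len(rhs) == 2 and rhs[0].isupper() and rhs[1].islower():
            return "left linear"                    # A -> Bc
    return "other"


def grammar_kind(rules) -> str:
    """Classify a grammar given as a list of (lhs, rhs) pairs."""
    shapes = {rule_shape(lhs, rhs) for lhs, rhs in rules}
    if shapes <= {"right linear", "constant"}:
        return "right linear grammar"
    if shapes <= {"left linear", "constant"}:
        return "left linear grammar"
    return "not a linear grammar"


if __name__ == "__main__":
    print(grammar_kind([("A", "aB"), ("B", "b")]))
    print(grammar_kind([("A", "Ba"), ("B", "b")]))
    print(grammar_kind([("A", "aB"), ("B", "Ab")]))
```

A grammar consisting only of constant rules satisfies both definitions; this sketch reports it as right linear simply because that case is tested first.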

(Right) Linear Grammars and FSAs

Right linear grammars and NFAs correspond as follows (not considering ε transitions):

States correspond to nonterminal symbols
The start state corresponds to the start symbol
Transitions moving to an accepting state correspond to constant rules
All transitions correspond to right linear rules

There is a similar correspondence for left linear grammars (imagine reading the input backwards)

Example of Linear Grammar and NFA

[Figure: state transition diagram of a finite state automaton]

A → aB | bA

B → bA | aC | a

C → bA | aC | a

Conversion between Right Linear Grammar and NFA

From automaton to grammar:

Convert all states to nonterminal symbols (start state→start symbol)
Convert all transitions to right linear rules
Convert all transitions to accepting states to constant rules

From grammar to automaton:

Create a state for each nonterminal symbol (start symbol→start state)
Convert all right linear rules to transitions
Create a new state only used for acceptance, and convert all constant rules to transitions to this state
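Both conversion directions can be sketched in a few lines of Python. The sketch below uses the example automaton and represents rules as (left side, right side) pairs; the function names and the name F for the new acceptance-only state are assumptions of this illustration (F must not already be used as a nonterminal).

```python
# Example automaton: (state, input symbol) -> next state
DELTA = {
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "C", ("B", "b"): "A",
    ("C", "a"): "C", ("C", "b"): "A",
}
ACCEPTING = {"C"}


def automaton_to_grammar(delta, accepting):
    """Every transition becomes a right linear rule; transitions into an
    accepting state additionally become constant rules."""
    rules = []
    for (state, symbol), target in delta.items():
        rules.append((state, symbol + target))   # e.g. A -> aB
        if target in accepting:
            rules.append((state, symbol))        # e.g. B -> a
    return rules


def grammar_to_nfa(rules, final="F"):
    """Right linear rules become transitions between nonterminal states;
    constant rules become transitions to the new acceptance-only state."""
    nfa = {}
    for lhs, rhs in rules:
        target = rhs[1] if len(rhs) == 2 else final
        nfa.setdefault((lhs, rhs[0]), set()).add(target)
    return nfa, {final}


def nfa_accepts(nfa, start, accepting, word):
    """Simulate the NFA by tracking the set of occupied states."""
    occupied = {start}
    for symbol in word:
        occupied = {t for q in occupied for t in nfa.get((q, symbol), set())}
    return bool(occupied & accepting)


if __name__ == "__main__":
    rules = automaton_to_grammar(DELTA, ACCEPTING)
    for lhs, rhs in rules:
        print(lhs, "->", rhs)
    nfa, finals = grammar_to_nfa(rules)
    print(nfa_accepts(nfa, "A", finals, "baa"))
```

Applied to the example automaton, the first function produces the same rules as the example grammar (in a different order). The automaton obtained from the grammar is in general an NFA, because a rule pair such as B → aC and B → a gives two transitions from B on a.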

Today's Summary

Linear/regular grammars and finite state automata generate/recognize the same (class of) languages
DFAs allow efficient implementation of recognition of regular languages
This can be used for lexical analysis

Challenge: Regular languages can be represented by state transition diagrams/tables of NFAs/DFAs, or with regular grammars, but a more compact representation is desirable

Homework

Deadline: May 14, 2019 (Tuesday!), 19:00

Where to submit: Box in front of room O-529 (building O, 5th floor)

Format: A4 single page (using both sides is okay; NO cover page), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right

1. Draw a state transition diagram for a finite state automaton that recognizes all inputs that (at the same time)
- Start with ab
- End with ba
- Contain an even number of c
- Contain no other symbols

2. Draw the state transition diagram for the NFA in the state transition table below

	`ε`	0	1
`→S`	{`B`}	{`C`}	{`A`}
`A`	{}	{`B`}	{`B`, `D`}
`B`	{}	{`D`}	{}
`*C`	{`B`}	{`A`}	{`S`}
`D`	{}	{`A`, `B`}	{`C`}

3. Create the state transition table of the DFA that is equivalent to the NFA in 2. (do not rename states)
4. Check the versions of flex, bison, gcc, make, and m4 that you installed (no need to submit, but bring your computer to the next lecture if you have a problem)

Glossary

Finite state automaton (FSA): 有限オートマトン
deterministic finite automaton (DFA): 決定性有限オートマトン
Non-deterministic finite automaton (NFA): 非決定性有限オートマトン
(left/right) linear grammar: (左・右) 線形文法
regular grammar: 正規文法
state transition diagram: 状態遷移図
transition: 遷移
initial/start state: 初期状態
accepting/final state: 受理状態
accept: 受理する
finite: 有限
state transition table: 状態遷移表
state transition function: 動作関数
simultaneous(ly): 同時 (な・に)
ε transition: ε 遷移
ε closure: ε 閉包
equivalence: 同等性
(left/right) linear rule: (左・右) 線形規則
constant rule: 定数規則
renaming (of states): 状態の書換え