Regular Expressions


4th lecture, May 15, 2019

Language Theory and Compilers

Martin J. Dürst


© 2005-19 Martin J. Dürst 青山学院大学

Today's Schedule


Last Week's Homework 4



Last Week's Homework 1



Last Week's Homework 2



Last Week's Homework 3



Leftovers from Previous Lecture


Today's Outlook

Summary from last time:

Challenge: Regular languages can be represented by state transition diagrams/tables of NFAs/DFAs, or with regular grammars, but a more compact representation is desirable.

There is a very powerful way to represent regular languages, called regular expressions.


Minimization of DFAs

To create the smallest DFA equivalent to a given DFA:

Overall idea: work backwards

  1. Separate states into two sets, accepting states and non-accepting states
  2. For each state, check which other states are reached for each input symbol
  3. Partition each set of states into sets that can reach the same set with the same input symbols
  4. Repeat 2. and 3. until there is no further change
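
The four steps above can be sketched in Ruby as follows. This is a minimal sketch, not a definitive implementation: the method name `minimize_partition`, the hash-based transition table, and the example DFA are assumptions made for illustration, and the DFA is assumed to be complete (every state has a transition for every symbol).

```ruby
# Partition refinement for DFA minimization (illustrative sketch).
# transitions: { state => { symbol => next_state } }, assumed complete
# accepting:   array of accepting states
# Returns the final partition; each set becomes one state of the minimal DFA.
def minimize_partition(transitions, accepting)
  states  = transitions.keys
  symbols = transitions.values.flat_map(&:keys).uniq
  # Step 1: separate accepting and non-accepting states
  partition = states.partition { |s| accepting.include?(s) }.reject(&:empty?)
  loop do
    # Step 2: for each state, look up which set is reached for each symbol
    set_of = {}
    partition.each_with_index { |set, i| set.each { |s| set_of[s] = i } }
    # Step 3: split each set into groups that reach the same sets
    refined = partition.flat_map do |set|
      set.group_by { |s| symbols.map { |a| set_of[transitions[s][a]] } }.values
    end
    # Step 4: refinement only splits sets, so an equal count means no change
    break if refined.size == partition.size
    partition = refined
  end
  partition.map(&:sort).sort
end

# Hypothetical example DFA: A and B behave identically, and so do C and D.
EXAMPLE_DFA = {
  A: { '0' => :B, '1' => :D },
  B: { '0' => :B, '1' => :C },
  C: { '0' => :B, '1' => :C },
  D: { '0' => :A, '1' => :D },
}
p minimize_partition(EXAMPLE_DFA, [:C, :D])
```

Each set in the returned partition corresponds to one state of the minimized DFA; the transitions of the new state can be read off from any member of the set.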

Purpose of minimization:


Example of DFA Minimization


Efficient Implementation of a DFA

State   next_state[state_count][symbol_count]; /* state transition table */
Boolean final_state[state_count];              /* final state? */
State   current_state = start_state;
Symbol  next_symbol;

while ((next_symbol=getchar()) != EOF &&       /* end of input */
         current_state != no_state)            /* dead end */
    current_state = next_state[current_state][next_symbol];
if (final_state[current_state])
    printf("Input accepted!");
else
    printf("Input not accepted!");
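
For experimenting without a C compiler, the same table-driven loop can be sketched in Ruby. The DFA below (strings over a and b that end in ab) and all names are assumptions chosen for illustration; `nil` plays the role of no_state.

```ruby
# Table-driven DFA (illustrative sketch); the tables describe a hypothetical
# DFA accepting strings over {a, b} that end in "ab".
NEXT_STATE = {
  0 => { 'a' => 1, 'b' => 0 },
  1 => { 'a' => 1, 'b' => 2 },
  2 => { 'a' => 1, 'b' => 0 },
}
FINAL_STATE = { 0 => false, 1 => false, 2 => true }
START_STATE = 0

def dfa_accepts?(input)
  current_state = START_STATE
  input.each_char do |symbol|
    current_state = NEXT_STATE[current_state][symbol]
    return false if current_state.nil?   # dead end: no transition defined
  end
  FINAL_STATE[current_state]             # accepted only in a final state
end

puts dfa_accepts?('aab') ? 'Input accepted!' : 'Input not accepted!'
```

The running time is proportional to the length of the input, because each symbol costs exactly one table lookup.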


Application of Regular Expressions

Problem 04C1 of Computer Practice I: Convert &amp;, &quot;, &apos;, &lt; and &gt; in the input to &, ", ', <, and >, respectively.

One way to write this in Ruby:

gsub /&quot;/, '"'
gsub /&apos;/, "'"
gsub /&lt;/, '<'
gsub /&gt;/, '>'
gsub /&amp;/, '&'

gsub replaces all occurrences of a given pattern in a string.

// are the delimiters for regular expressions (in Ruby, Perl, JavaScript,...)

Regular expressions match some input.
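
The order of the replacements matters: &amp; has to be handled last, otherwise text such as &amp;lt; would be unescaped twice. The following sketch shows this with String#gsub on an explicit receiver (the receiver-less form above presumably relies on Ruby's -p or -n option). The method names and the `ENTITIES` table are assumptions for illustration.

```ruby
# Unescaping in the order used above: &amp; comes last.
def unescape_entities(text)
  text.gsub(/&quot;/, '"')
      .gsub(/&apos;/, "'")
      .gsub(/&lt;/, '<')
      .gsub(/&gt;/, '>')
      .gsub(/&amp;/, '&')
end

# For comparison: handling &amp; first unescapes some input twice.
def unescape_wrong_order(text)
  text.gsub(/&amp;/, '&').gsub(/&lt;/, '<').gsub(/&gt;/, '>')
end

# Alternative: a single pass with one regular expression and a lookup table,
# which avoids the ordering problem altogether.
ENTITIES = { '&quot;' => '"', '&apos;' => "'", '&lt;' => '<',
             '&gt;' => '>', '&amp;' => '&' }

def unescape_single_pass(text)
  text.gsub(/&(?:quot|apos|lt|gt|amp);/, ENTITIES)
end

puts unescape_entities('a &lt; b &amp;&amp; c &gt; d')
```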


Regular Expressions


(Theoretical) Regular Expression: Basic Syntax


More Examples of Regular Expressions


Why Regular Expressions?


Notation of Regular Expressions


Formal Definition of Regular Expressions

Theoretical Regular Expressions over Alphabet Σ
Priority   Regular Expression  Condition                      Defined Language        Notes
           ε, a                a ∈ Σ                          {ε} or {a}              literals
very high  (r)                 r is a regular expression      L((r)) = L(r)           grouping
high       r*                  r is a regular expression      L(r*) = (L(r))*         Kleene closure
low        rs                  r, s are regular expressions   L(rs) = L(r)L(s)        concatenation
very low   r|s                 r, s are regular expressions   L(r|s) = L(r) ∪ L(s)    set union

L(r) is the language defined by regular expression r


Caution: Priority

Make sure you understand the difference between the following pairs of regular expressions:
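
The pairs themselves are not reproduced here. As an illustration only, the following sketch uses two assumed pairs, ab* versus (ab)* and a|bc versus (a|b)c, and checks a few words against each. The helper `priority_match?` is a hypothetical name; it wraps the expression in \A(?:...)\z so that the whole word has to match, as in the theoretical definition.

```ruby
# Does the whole word belong to the language of the regular expression?
def priority_match?(source, word)
  Regexp.new("\\A(?:#{source})\\z").match?(word)
end

# * binds more tightly than concatenation:
p priority_match?('ab*',   'abb')   # a followed by any number of b
p priority_match?('(ab)*', 'abb')   # any number of repetitions of ab
p priority_match?('(ab)*', 'abab')

# Concatenation binds more tightly than |:
p priority_match?('a|bc',   'a')    # either a, or bc
p priority_match?('(a|b)c', 'a')    # a or b, followed by c
p priority_match?('(a|b)c', 'ac')
```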


Grammar for Regular Expressions


Regular Expression to NFA


Regular Expression to NFA: Symbols, Alternatives

The NFA for a symbol a has a start state and an accepting state, connected with a single arrow labeled a (same for ε)

The NFA for r|s is constructed from the NFAs for r and s as follows:

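Connect the overall start state to the start states of r and s, and the accepting states of r and s to the overall accepting state, each with an ε transition.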

The additional ε connections are necessary to clearly commit to either r or s.


Regular Expression to NFA: Concatenation, Repetition

The NFA for the regular expression rs connects the accepting state of r with the start state of s through an ε transition. The overall start state is the start state of r; the overall accepting state is the accepting state of s.

The NFA for r* is constructed as follows:

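Connect the following pairs with ε transitions: the overall start state to the start state of r, the accepting state of r to the overall accepting state, the overall start state to the overall accepting state, and the accepting state of r back to the start state of r (reverse direction!).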


Example of Conversion

Regular expression: a|b*c

In some cases, some of the ε transitions may be eliminated, or the NFA may otherwise be simplified.
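
The construction rules for symbols, alternatives, concatenation, and repetition can be sketched in Ruby and applied to a|b*c. This is a minimal sketch under stated assumptions: the class `NFABuilder` and its method names are hypothetical, an NFA fragment is represented as a pair [start state, accepting state], and a label of `nil` stands for ε. No simplification of ε transitions is attempted.

```ruby
class NFABuilder
  def initialize
    @edges = Hash.new { |h, k| h[k] = [] }  # state => [[label, target], ...]
    @count = 0
  end

  def new_state
    @count += 1
    @count - 1
  end

  def connect(from, label, to)
    @edges[from] << [label, to]
  end

  def symbol(a)                 # start --a--> accepting
    start, final = new_state, new_state
    connect(start, a, final)
    [start, final]
  end

  def alt(r, s)                 # r|s
    start, final = new_state, new_state
    connect(start, nil, r[0]); connect(start, nil, s[0])
    connect(r[1], nil, final); connect(s[1], nil, final)
    [start, final]
  end

  def concat(r, s)              # rs
    connect(r[1], nil, s[0])
    [r[0], s[1]]
  end

  def star(r)                   # r*
    start, final = new_state, new_state
    connect(start, nil, r[0]); connect(r[1], nil, final)
    connect(start, nil, final)  # zero repetitions
    connect(r[1], nil, r[0])    # back edge for further repetitions
    [start, final]
  end

  def closure(states)           # all states reachable through ε transitions
    seen, stack = states.dup, states.dup
    until stack.empty?
      @edges[stack.pop].each do |label, to|
        next unless label.nil? && !seen.include?(to)
        seen << to
        stack << to
      end
    end
    seen
  end

  def accepts?(nfa, input)      # simulate the NFA on sets of states
    current = closure([nfa[0]])
    input.each_char do |c|
      moved = current.flat_map { |st| @edges[st].select { |l, _| l == c }.map(&:last) }
      current = closure(moved.uniq)
    end
    current.include?(nfa[1])
  end
end

BUILDER = NFABuilder.new
# a|b*c: * binds tightest, then concatenation, then |
A_OR_BSTAR_C = BUILDER.alt(BUILDER.symbol('a'),
                           BUILDER.concat(BUILDER.star(BUILDER.symbol('b')),
                                          BUILDER.symbol('c')))
p BUILDER.accepts?(A_OR_BSTAR_C, 'bbc')
```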


From FSA to Regular Expression

Algorithmic conversion is possible, but complicated

General procedure:

  1. Create regular expressions for getting from state A to state B directly for all pairs of states
  2. Select a single state, and create all regular expressions that pass through this intermediate state
  3. Repeat step 2., increasing the number of intermediate states
  4. Simplify intermediate regular expressions as much as possible (they can get quite complex)
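
The procedure can be sketched in Ruby for a deterministic transition table. This is an illustrative sketch, not a definitive implementation: all names are assumptions, `nil` stands for "no path", and the table entry for a pair of states describes only non-empty paths, so that ε has to be added only at the very end if the start state is accepting. Simplification (step 4) is limited to dropping empty alternatives and duplicates, so the output is correct but far from minimal.

```ruby
def re_union(r, s)
  return s if r.nil?
  return r if s.nil?
  r == s ? r : "(?:#{r}|#{s})"
end

def re_concat(r, s)
  r.nil? || s.nil? ? nil : r + s
end

def re_star(r)
  r.nil? ? '' : "(?:#{r})*"
end

# transitions: { state => { symbol => next_state } }
# Returns regular expression source text, or nil for the empty language.
def fsa_to_regexp(transitions, start, accepting)
  states = transitions.keys
  # Step 1: direct connections for all pairs of states
  r = {}
  states.each do |i|
    transitions[i].each do |symbol, j|
      r[[i, j]] = re_union(r[[i, j]], Regexp.escape(symbol))
    end
  end
  # Steps 2 and 3: allow one more intermediate state k in each round
  states.each do |k|
    loop_k = re_star(r[[k, k]])
    updated = {}
    states.each do |i|
      states.each do |j|
        via_k = re_concat(re_concat(r[[i, k]], loop_k), r[[k, j]])
        updated[[i, j]] = re_union(r[[i, j]], via_k)
      end
    end
    r = updated
  end
  result = accepting.map { |f| r[[start, f]] }.reduce { |a, b| re_union(a, b) }
  return result unless accepting.include?(start)
  result.nil? ? '' : "(?:#{result})?"   # the empty word is accepted, too
end

# Hypothetical example: DFA for strings over {a, b} that end in ab
ENDS_IN_AB = {
  0 => { 'a' => 1, 'b' => 0 },
  1 => { 'a' => 1, 'b' => 2 },
  2 => { 'a' => 1, 'b' => 0 },
}
ENDS_IN_AB_SOURCE = fsa_to_regexp(ENDS_IN_AB, 0, [2])
puts ENDS_IN_AB_SOURCE
```

The printed expression is much longer than (a|b)*ab, which a human would write for the same language.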

Once it is understood which language the FSA accepts, it is often easy for humans to write a regular expression for this language directly.


Applications of Regular Expressions


Practical Regular Expressions: Notational Differences

Practical regular expressions have many additional functions and shortcut notations
(the corresponding theoretical regular expressions or simpler constructs are given in parentheses)


Practical Regular Expressions: Usage Differences


Use of Practical Regular Expressions


Notes on Practical Regular Expressions


Theoretical vs. Practical Regular Expressions

                        Theoretical  Practical
Meta-characters         * | ( )      |*+?()[]{}.\^$
ε                       yes          no
character classes ([])  no           yes
+, ?, {} quantifiers    no           yes
^, $ anchors            no           yes
match where             full word    part of a string
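
Two of these differences can be tried out in Ruby. The sketch below first contrasts matching part of a string with matching the full word, and then compares some practical shortcuts with theoretical equivalents. The chosen pairs and all helper names are assumptions for illustration. Since practical notation has no ε, the empty alternative in a| stands for a|ε. The comparison only covers words up to a fixed length, so it is a plausibility check, not a proof.

```ruby
# Practical regular expressions match part of a string unless anchored.
PARTIAL = /ab*/
WHOLE   = /\A(?:ab*)\z/
p PARTIAL.match?('xxabbyy')
p WHOLE.match?('xxabbyy')

# Practical shortcut => equivalent using only * | ( ) and concatenation
EQUIVALENT_PAIRS = {
  'a+'     => 'aa*',
  'a?'     => 'a|',
  '[abc]'  => 'a|b|c',
  'a{2,3}' => 'aa|aaa',
}

def whole_match?(source, word)
  Regexp.new("\\A(?:#{source})\\z").match?(word)
end

# All words over the alphabet up to the given length, including the empty word
def words_up_to(alphabet, max_length)
  (0..max_length).flat_map { |n| alphabet.repeated_permutation(n).map(&:join) }
end

def same_language?(r, s, words)
  words.all? { |w| whole_match?(r, w) == whole_match?(s, w) }
end

EQUIVALENT_PAIRS.each do |practical, theoretical|
  same = same_language?(practical, theoretical, words_up_to(%w[a b c], 4))
  puts "#{practical} vs. #{theoretical}: #{same}"
end
```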


Summary of this Lecture



Deadline: May 21, 2019 (Tuesday!), 19:00

Where to submit: Box in front of room O-529 (building O, 5th floor)

Format: A4 single page (using both sides is okay; NO cover page), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right

  1. Construct the state transition diagram for the NFA corresponding to the following grammar
    S → εA | bB | cB | cC, A → bC | aD | a | cS, B → aD | aC | bB | a, C → εA | aD | a
    (Caution: In right linear grammars, ε is not allowed except in the rule S → ε)
    (Hint: Create a new accepting state F)
  2. Convert the following transition table to a right linear grammar

          0    1
    →T    G    H
    *G    K    L
    *H    M    K
    *K    K    K
    *L    M    K
     M    L    -
  3. Construct the state transition diagram for the regular expression ab|c*d
    (write down both the result of the procedure explained during this lecture (with all ε transitions) as well as a version that is as simple as possible)
  4. Bring your notebook PC (with flex, bison, gcc, make, diff, and m4 installed and usable)



regular expression
正規表現
isomorphic
同型 (同形) の
theoretical regular expressions
理論的 (な) 正規表現
practical regular expressions
実用的 (な) 正規表現
notational
表記 (上の)