For instance, if the description of the language is given as the regular expression (1101)*, and the string we are given is 11110111011101, then proceeding from left to right we know the string is not valid as soon as we reach the third character (after 11, the only valid next character is 0).
Memory requirements: one interesting aspect of a language is the amount of information we have to remember about a string while we are trying to determine whether or not it is a valid member of the language.
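For the (1101)* language we only need to remember one of the following five things at any given point in time:
- the string so far is of the form (1101)*; this includes our initial state, when we haven't seen any characters yet, so (1101)^0 would be a valid interpretation, or
- the string so far is of the form (1101)*1, so it may still turn out to be valid, or
- the string so far is of the form (1101)*11, so it may still turn out to be valid, or
- the string so far is of the form (1101)*110, so it may still turn out to be valid, or
- the string so far cannot be extended to any string in the language, so it is definitely invalid.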
Using these five potential states, we could design an abstract machine capable of recognizing whether or not a particular string was in the language.
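The five states above can be turned directly into such a machine. Below is a minimal Python sketch, assuming a dictionary-based transition table. The state names, `DELTA`, `step`, `first_failure` and `accepts` are all hypothetical and chosen only for this illustration. Any missing (state, symbol) entry is treated as a move to the "dead" state.

```python
# Sketch: a five-state recognizer for (1101)*. State names (hypothetical) spell out
# what the prefix read so far looks like; "dead" is the state from which no
# continuation can produce a string in the language.
START = "(1101)*"  # also the only accept state
DEAD = "dead"

DELTA = {
    ("(1101)*", "1"): "(1101)*1",
    ("(1101)*1", "1"): "(1101)*11",
    ("(1101)*11", "0"): "(1101)*110",
    ("(1101)*110", "1"): "(1101)*",
}


def step(state: str, ch: str) -> str:
    # Any (state, symbol) pair not listed in DELTA, including every move out of
    # the dead state, leads to the dead state.
    return DELTA.get((state, ch), DEAD)


def first_failure(s: str):
    """Return the 1-based position at which s becomes hopeless, or None if it never does."""
    state = START
    for pos, ch in enumerate(s, start=1):
        state = step(state, ch)
        if state == DEAD:
            return pos
    return None


def accepts(s: str) -> bool:
    state = START
    for ch in s:
        state = step(state, ch)
    return state == START


print(first_failure("11110111011101"))  # 3
print(accepts("11011101"))              # True
```

The machine remembers only which of the five states it is in, never the string itself. That is why a fixed, finite table is enough.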
We could, theoretically, design such a machine that had one state for every different string in the language, but for an infinite language that would require an infinite number of things to remember (i.e. an infinite number of states).
A useful property of regular languages is that they are precisely the set of languages that can be recognized by such an abstract machine with a finite number of states -- i.e. for every regular language we can design a recognizer without having to remember every string in the language.
The abstract recognizing machines we are discussing will be termed finite automata, and are the subject of our next section.
We use δ to represent the transition function for a finite automaton, taking you from a state and a character to a new state. Thus δ: Q × Σ → Q.
Let M = (Q, Σ, q0, A, δ) be a finite automaton. A string x in Σ* is accepted by M if δ*(q0, x) is in A.
(I.e. by repeatedly applying the transition function to the characters of x, we are finally left in an accept state.)
The language accepted by M, or the language recognized by M, is the set
L(M) = { x ∈ Σ* | x is accepted by M }
If L is any language over Σ, L is accepted, or recognized, by M if and only if L = L(M).
(I.e. to accept (or recognize) a language L, an FA must accept all the strings in L and reject all the strings in L', the complement of L.)
A language L over the alphabet Σ is regular if and only if there is an FA that accepts L.
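The definitions of δ*, acceptance and L(M) can be sketched in a few lines of Python. This is a minimal illustration, assuming a total transition table stored as a dictionary. The class name `FA` and the helper `language_up_to` are hypothetical. Because L(M) may be infinite, the helper lists only its strings up to a given length. The example machine is the FA for "00 is not a substring" that appears later in the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class FA:
    """M = (Q, Sigma, q0, A, delta); delta maps (state, symbol) -> state and must be total."""
    states: set
    alphabet: set
    start: str
    accept: set
    delta: dict

    def delta_star(self, q: str, x: str) -> str:
        # delta*(q, "") = q and delta*(q, ya) = delta(delta*(q, y), a):
        # apply delta to the characters of x one at a time, left to right.
        for a in x:
            q = self.delta[(q, a)]
        return q

    def accepts(self, x: str) -> bool:
        # x is accepted by M iff delta*(q0, x) is in A
        return self.delta_star(self.start, x) in self.accept


def language_up_to(m: FA, n: int) -> set:
    """The finite slice { x in L(M) : |x| <= n }; L(M) itself may be infinite."""
    result = set()
    for k in range(n + 1):
        for w in product(sorted(m.alphabet), repeat=k):
            x = "".join(w)
            if m.accepts(x):
                result.add(x)
    return result


# The FA for "00 is not a substring" (states X, Y, Z) from the example.
no00 = FA(
    states={"X", "Y", "Z"},
    alphabet={"0", "1"},
    start="X",
    accept={"X", "Y"},
    delta={("X", "0"): "Y", ("X", "1"): "X",
           ("Y", "0"): "Z", ("Y", "1"): "X",
           ("Z", "0"): "Z", ("Z", "1"): "Z"},
)
print(sorted(language_up_to(no00, 2)))  # ['', '0', '01', '1', '10', '11']
```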
From a practical viewpoint, we want two strings, x and y, to be treated differently (i.e. to be distinguishable) if we can follow each of them with the same string, z, and wind up with one string in a language, L, and the other string in L'.
Example: over the alphabet { 0, 1 }, let L be (1010)*.
If x = 101 and y = 10, then taking z = 0 gives xz = 1010, which is in L, and yz = 100, which is not in L, so strings x and y are said to be distinguishable with respect to L.
If x = 101 and y = 1010101, then for any string z, xz is in L iff yz is in L (since y = 1010·101, both hold exactly when z is in 0(1010)*), so x and y are said to be indistinguishable with respect to L.
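The two cases of the example can be checked mechanically by trying every suffix z up to some length. Below is a minimal Python sketch, assuming L = (1010)* is tested with a regular expression. The names `in_L` and `distinguishing_suffix` are hypothetical. This bounded search can find a witness z, but a result of `None` is only evidence of indistinguishability, because the definition quantifies over all z.

```python
import re
from itertools import product


def in_L(s: str) -> bool:
    # L = (1010)* over the alphabet {0, 1}
    return re.fullmatch(r"(1010)*", s) is not None


def distinguishing_suffix(x: str, y: str, max_len: int):
    """Search every z with |z| <= max_len for one that puts exactly one of xz, yz in L.

    Returns such a z, or None if no suffix up to max_len works. None is only evidence
    of indistinguishability: the definition quantifies over all z, not just short ones.
    """
    for k in range(max_len + 1):
        for w in product("01", repeat=k):
            z = "".join(w)
            if in_L(x + z) != in_L(y + z):
                return z
    return None


print(distinguishing_suffix("101", "10", 4))       # 0
print(distinguishing_suffix("101", "1010101", 6))  # None
```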
Lemma: Suppose L is a language over the alphabet Σ and M = (Q, Σ, q0, A, δ) is any FA recognizing L. If x and y are any two strings over the alphabet for which δ*(q0, x) = δ*(q0, y), then x and y are indistinguishable with respect to L.
Proof:
Let z be any string over the alphabet. We must show that xz and yz are either both in L or both in L'.
For any state q in Q and any strings u and v, δ*(q, uv) = δ*(δ*(q, u), v). Therefore
δ*(q0, xz) = δ*(δ*(q0, x), z)
δ*(q0, yz) = δ*(δ*(q0, y), z)
Since δ*(q0, x) = δ*(q0, y) (the "if" part of our Lemma), it follows that δ*(q0, xz) = δ*(q0, yz).
Since M is assumed to recognize L, the two strings xz and yz must either both be in L or both be in L'.
Hence x and y are (by our definition) indistinguishable with respect to L.
Suppose L is a language over the alphabet Σ, and for some positive integer n, there are n strings over the alphabet, any two of which are distinguishable with respect to L. Then there can be no FA recognizing L with fewer than n states.
Proof by contradiction:
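Suppose x1, ..., xn are n pairwise distinguishable strings with respect to L, and M is any FA with fewer than n states.
Then the n states δ*(q0, x1), δ*(q0, x2), ..., δ*(q0, xn) cannot all be distinct, since M has fewer than n states (the pigeonhole principle).
So for some i ≠ j, δ*(q0, xi) = δ*(q0, xj), and by the Lemma xi and xj are indistinguishable with respect to L.
Since xi and xj were defined to be distinguishable with respect to L, it follows that M cannot recognize L.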
This is important, in that if we can find n strings that are pairwise distinguishable, we know that any FA for the language must have at least n states.
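As an illustration, the theorem can be applied to the earlier language (1101)*. The strings "", 1, 11, 110 and 0 correspond to the five things we had to remember, and they turn out to be pairwise distinguishable. Hence no FA for (1101)* has fewer than five states. The Python sketch below checks this with a bounded suffix search, which is enough here because every pair has a witness of length at most 3. The names `distinguishable` and `state_lower_bound` are hypothetical.

```python
import re
from itertools import combinations, product


def in_L(s: str) -> bool:
    # L = (1101)* over the alphabet {0, 1}
    return re.fullmatch(r"(1101)*", s) is not None


def distinguishable(x: str, y: str, max_len: int) -> bool:
    # True once some z with |z| <= max_len puts exactly one of xz, yz in L.
    # (False would only mean no short witness was found.)
    return any(
        in_L(x + "".join(w)) != in_L(y + "".join(w))
        for k in range(max_len + 1)
        for w in product("01", repeat=k)
    )


def state_lower_bound(strings, max_len: int) -> int:
    """If the strings are pairwise distinguishable, no FA for L has fewer than len(strings) states."""
    for x, y in combinations(strings, 2):
        if not distinguishable(x, y, max_len):
            raise ValueError(f"no witness found for {x!r} and {y!r}")
    return len(strings)


print(state_lower_bound(["", "1", "11", "110", "0"], 3))  # 5
```

Together with the five-state machine described earlier, this shows that five states is exactly the minimum for (1101)*.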
{ 0, 1 }
is not regular.
Proof:
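Consider the strings x1 = 01, x2 = 001, x3 = 0001, x4 = 00001, ..., that is, xi = 0^i1.
To distinguish any two strings xi, xj where i ≠ j, use the string z = 10^j; this will result in the two strings 0^i110^j (which is not in L) and 0^j110^j (which is in L).
So for every n, x1, ..., xn are n pairwise distinguishable strings, and by the previous theorem any FA recognizing L must have at least n states. Since n can be as large as we like, the number of states of any FA recognizing L would have to be infinite, so no FA recognizes L and L is not regular.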
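The pigeonhole step can be watched on any concrete machine. The argument above never needs the full definition of L. It only needs two facts: 0^j110^j is in L, and 0^i110^j is not. The Python sketch below takes a candidate FA with k states and runs x1, ..., x(k+1) through it. It finds two strings xi and xj that end in the same state, so the machine gives 0^i110^j and 0^j110^j the same verdict and must be wrong on one of them. The candidate machine (the three-state "no 00" FA) and the function names are hypothetical choices made only for this demonstration.

```python
def delta_star(delta: dict, q: str, x: str) -> str:
    for a in x:
        q = delta[(q, a)]
    return q


def pigeonhole_witness(delta: dict, start: str, num_states: int):
    """Among x_i = 0^i 1 for i = 1 .. num_states + 1, find i < j that reach the same state.

    There are more strings than states, so such a pair must exist.
    """
    first_seen = {}
    for i in range(1, num_states + 2):
        q = delta_star(delta, start, "0" * i + "1")
        if q in first_seen:
            return first_seen[q], i
        first_seen[q] = i
    raise AssertionError("impossible: more strings than states")


# Any candidate FA will do; here (hypothetically) the 3-state "no 00" machine.
delta = {("X", "0"): "Y", ("X", "1"): "X",
         ("Y", "0"): "Z", ("Y", "1"): "X",
         ("Z", "0"): "Z", ("Z", "1"): "Z"}
i, j = pigeonhole_witness(delta, "X", 3)
z = "1" + "0" * j
end_i = delta_star(delta, "X", "0" * i + "1" + z)  # state after 0^i 1 1 0^j
end_j = delta_star(delta, "X", "0" * j + "1" + z)  # state after 0^j 1 1 0^j
print(i, j, end_i == end_j)  # 2 3 True
```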
Theorem 3.4: Suppose that M1 = (Q1, Σ, q1, A1, δ1) accepts L1, and M2 = (Q2, Σ, q2, A2, δ2) accepts L2.
Let M = (Q, Σ, q0, A, δ), where Q = Q1 × Q2, q0 = (q1, q2), and
δ((p, q), a) = (δ1(p, a), δ2(q, a))
for any p in Q1, q in Q2, and a in Σ. Then:
1. If A = { (p, q) | p ∈ A1 or q ∈ A2 }, then M accepts the language L1 ∪ L2.
2. If A = { (p, q) | p ∈ A1 and q ∈ A2 }, then M accepts the language L1 ∩ L2.
3. If A = { (p, q) | p ∈ A1 and q ∉ A2 }, then M accepts the language L1 - L2.
Intuitive argument: M simply needs to be able to track which state the string would put us in for machine M1 and which state the string would put us in for machine M2. Then:
1. For L1 ∪ L2, accept if either M1 or M2 is in an accept state.
2. For L1 ∩ L2, accept if both M1 and M2 are in an accept state.
3. For L1 - L2, accept if M1 is in an accept state and M2 is not in an accept state.
Example: Suppose that, over the alphabet { 0, 1 }, we have the two regular languages L1 = { x | 00 is not a substring of x } and L2 = { x | x ends with 01 }. From the FAs which recognize L1 and L2, construct an FA which recognizes L1 - L2.
FA recognizing L1:
State  Symbol  Next State
X      0       Y
X      1       X
Y      0       Z
Y      1       X
Z      0       Z
Z      1       Z
X is the start state and X, Y are the accept states.
FA recognizing L2:
State  Symbol  Next State
T      0       V
T      1       T
V      0       V
V      1       W
W      0       V
W      1       T
T is the start state and W is the accept state.
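FA recognizing L1 - L2: the product machine has nine potential states, XT, XV, XW, YT, YV, YW, ZT, ZV, ZW, but we only need the ones reachable from the start state XT:
From XT, if we observe a 1 we stay in state XT, while on a 0 we move to state YV.
From YV, on 0 we move to state ZV, while on 1 we move to state XW.
From ZV, on 0 we stay in state ZV, while on 1 we move to state ZW.
From XW, on 0 we move to state YV, while on 1 we move to state XT.
From ZW, on 0 we move to state ZV, while on 1 we move to state ZT.
From ZT, on 0 we move to state ZV, while on 1 we stay in state ZT.
Only these six states are reachable; XV, YT and YW can never be entered.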
The accept states are those which accept for L1 and reject for L2, i.e. XT and YV. (XV and YT would also have qualified, but they are two of the unreachable states.)
The resulting FA is:
State  Symbol  Next State
XT     0       YV
XT     1       XT
YV     0       ZV
YV     1       XW
XW     0       YV
XW     1       XT
ZV     0       ZV
ZV     1       ZW
ZW     0       ZV
ZW     1       ZT
ZT     0       ZV
ZT     1       ZT
XT is the start state and the accept states are XT and YV.
Note that for L1 ∪ L2 and L1 ∩ L2 the transition function/state tables are the same as this, but the set of accept states differs!
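The construction, and the fact that only the accept set changes between the three cases, can be reproduced in a short Python sketch. It assumes dictionary transition tables and builds only the product states reachable from (X, T), using a breadth-first search. The function name `reachable_product` and the helper `names` are hypothetical. The example machines are copied from the tables above.

```python
from collections import deque

# The two example machines: M1 recognizes L1 (no "00"), M2 recognizes L2 (ends with "01").
D1 = {("X", "0"): "Y", ("X", "1"): "X", ("Y", "0"): "Z",
      ("Y", "1"): "X", ("Z", "0"): "Z", ("Z", "1"): "Z"}
A1 = {"X", "Y"}
D2 = {("T", "0"): "V", ("T", "1"): "T", ("V", "0"): "V",
      ("V", "1"): "W", ("W", "0"): "V", ("W", "1"): "T"}
A2 = {"W"}


def reachable_product(d1, q1, d2, q2, alphabet="01"):
    """delta((p, q), a) = (d1(p, a), d2(q, a)), explored breadth-first from (q1, q2)."""
    start = (q1, q2)
    delta, seen, queue = {}, {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, start, delta


def names(pairs):
    return sorted(p + q for p, q in pairs)


states, start, delta = reachable_product(D1, "X", D2, "T")
# Same states and transitions; only the accept set changes (cases 1-3 of Theorem 3.4).
union = {s for s in states if s[0] in A1 or s[1] in A2}
intersection = {s for s in states if s[0] in A1 and s[1] in A2}
difference = {s for s in states if s[0] in A1 and s[1] not in A2}

print(names(states))        # ['XT', 'XW', 'YV', 'ZT', 'ZV', 'ZW']
print(names(difference))    # ['XT', 'YV']
print(names(union))         # ['XT', 'XW', 'YV', 'ZW']
print(names(intersection))  # ['XW']
```

Exploring only reachable pairs is a design choice. It yields the six-state table above directly, instead of all nine pairs in Q1 × Q2.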
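Simplifying the FA for L1 - L2: we can observe that, once any of the states ZV, ZW, ZT is reached, the only possible result is rejection (every transition out of them stays among them, and none is an accept state). So we can collapse states ZV, ZW, ZT into a single reject state, R, from which there is no escape.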
The simplified FA is:
State  Symbol  Next State
XT     0       YV
XT     1       XT
YV     0       R
YV     1       XW
XW     0       YV
XW     1       XT
R      0       R
R      1       R
XT is the start state and the accept states are XT and YV.
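One way to gain confidence in the simplified machine is to compare it against the direct description of L1 - L2 ("00" is not a substring and the string does not end with 01) on every string up to some length. Below is a minimal Python sketch of that brute-force check. The names `accepts` and `in_difference` are hypothetical, and agreement up to length 10 is evidence rather than a proof.

```python
from itertools import product

# Simplified FA for L1 - L2 (R is the collapsed reject state).
DELTA = {
    ("XT", "0"): "YV", ("XT", "1"): "XT",
    ("YV", "0"): "R",  ("YV", "1"): "XW",
    ("XW", "0"): "YV", ("XW", "1"): "XT",
    ("R", "0"): "R",   ("R", "1"): "R",
}
START, ACCEPT = "XT", {"XT", "YV"}


def accepts(x: str) -> bool:
    q = START
    for a in x:
        q = DELTA[(q, a)]
    return q in ACCEPT


def in_difference(x: str) -> bool:
    # L1 - L2: "00" is not a substring of x, and x does not end with "01"
    return "00" not in x and not x.endswith("01")


mismatches = []
for n in range(11):
    for w in product("01", repeat=n):
        x = "".join(w)
        if accepts(x) != in_difference(x):
            mismatches.append(x)
print(mismatches)  # []
```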
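Proof of theorem 3.4:
For any string x over the alphabet, and any state (p, q) in Q = Q1 × Q2,
δ*((p, q), x) = (δ1*(p, x), δ2*(q, x)),
which follows from the definition of δ by induction on the length of x.
A string x is accepted by M iff δ*((q1, q2), x) is in A, i.e. iff (δ1*(q1, x), δ2*(q2, x)) is in A. Now consider each way of defining A:
1. If A is defined as in case 1 (union), then this is the same as saying that either δ1*(q1, x) is in A1 or δ2*(q2, x) is in A2, i.e. x is in L1 or x is in L2, so L(M) = L1 ∪ L2.
2. If A is defined as in case 2 (intersection), then this is the same as saying that both δ1*(q1, x) is in A1 and δ2*(q2, x) is in A2, i.e. x is in both L1 and L2, so L(M) = L1 ∩ L2.
3. If A is defined as in case 3 (difference), then this is the same as saying that δ1*(q1, x) is in A1 and δ2*(q2, x) is not in A2, i.e. x is in L1 but not in L2, so L(M) = L1 - L2.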
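The key step of the proof is the identity δ*((p, q), x) = (δ1*(p, x), δ2*(q, x)). On the example machines it can be spot-checked by brute force for every pair of states and every short string. The Python sketch below does this. It assumes dictionary transition tables, and the names `D`, `delta_star` and `identity_holds` are hypothetical. The check covers all nine pairs in Q1 × Q2, reachable or not. A finite check like this only supports the identity; the induction argument is what proves it.

```python
from itertools import product

# Transition functions of the example machines M1 (no "00") and M2 (ends with "01").
D1 = {("X", "0"): "Y", ("X", "1"): "X", ("Y", "0"): "Z",
      ("Y", "1"): "X", ("Z", "0"): "Z", ("Z", "1"): "Z"}
D2 = {("T", "0"): "V", ("T", "1"): "T", ("V", "0"): "V",
      ("V", "1"): "W", ("W", "0"): "V", ("W", "1"): "T"}
Q1, Q2, SIGMA = "XYZ", "TVW", "01"

# Product transition function on every pair in Q1 x Q2, reachable or not.
D = {((p, q), a): (D1[(p, a)], D2[(q, a)]) for p in Q1 for q in Q2 for a in SIGMA}


def delta_star(delta, q, x):
    # Apply delta to the characters of x from left to right.
    for a in x:
        q = delta[(q, a)]
    return q


def identity_holds(max_len):
    """Check delta*((p, q), x) == (delta1*(p, x), delta2*(q, x)) for all |x| <= max_len."""
    for n in range(max_len + 1):
        for w in product(SIGMA, repeat=n):
            x = "".join(w)
            for p in Q1:
                for q in Q2:
                    if delta_star(D, (p, q), x) != (delta_star(D1, p, x), delta_star(D2, q, x)):
                        return False
    return True


print(identity_holds(6))  # True
```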
(1+110)*0
(111+100)*0
1(01+10)*+0(11+10)*
Let z be a fixed string of length n over the alphabet { 0, 1 }.
{ 0, 1 }*z? Prove your answer.
FA recognizing L1:
State | Symbol | Next State |
T | 0 | V |
T | 1 | T |
V | 0 | V |
V | 1 | W |
W | 0 | V |
W | 1 | T |
T is the start state and W is the accept state.
FA recognizing L2:
State | Symbol | Next State |
R | 0 | S |
R | 1 | S |
S | 0 | R |
S | 1 | R |
R is the start state and S is the accept state.
Draw the finite automaton corresponding to L2 - L1.