During the semester we will examine several approaches to describing languages, of varying complexity. The first of these approaches is called regular expressions, and the languages it allows us to describe are called regular languages.
The easiest way of describing the set of all regular languages is by describing the ways you can "build up" a regular language.
Definition: for any alphabet ∑, the set of regular languages over ∑ is defined as follows:
1. The empty set, ∅, is a regular language.
   The regular expression describing that language is ∅.
2. { ε }, the language containing only the empty string ε, is a regular language.
   The regular expression describing that language is ε.
3. For each character in ∑, the language which consists of one string which is exactly that character is a regular language, i.e. ∀a ∈ ∑, { a } is a regular language.
   The regular expression describing that language is a.
4. If L₁ and L₂ are regular languages, and r₁ and r₂ are their corresponding regular expressions, then
   (a) L₁ ∪ L₂ is a regular language, and its corresponding regular expression is (r₁ + r₂)
   (b) L₁L₂ is a regular language, and its corresponding regular expression is (r₁r₂)
   (c) L₁* is a regular language, and its corresponding regular expression is (r₁*)
Only those languages that can be obtained using statements 1-4 are regular languages over ∑.
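The "build up" reading of the definition can be made concrete with a small sketch. The class names (`Empty`, `Eps`, `Sym`, `Union`, `Concat`, `Star`) and the `language` function are hypothetical names chosen for this illustration, not part of the notes. Because a starred language is usually infinite, the sketch only lists the strings up to a chosen maximum length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:            # statement 1: the empty language
    pass


@dataclass(frozen=True)
class Eps:              # statement 2: the language holding only the empty string
    pass


@dataclass(frozen=True)
class Sym:              # statement 3: a single character of the alphabet
    ch: str


@dataclass(frozen=True)
class Union:            # statement 4: (r1 + r2)
    left: object
    right: object


@dataclass(frozen=True)
class Concat:           # statement 4: (r1r2)
    left: object
    right: object


@dataclass(frozen=True)
class Star:             # statement 4: (r1*)
    inner: object


def language(r, max_len):
    """Return the strings of length <= max_len in the language described by r."""
    if isinstance(r, Empty):
        return frozenset()
    if isinstance(r, Eps):
        return frozenset({""})
    if isinstance(r, Sym):
        return frozenset({r.ch}) if max_len >= 1 else frozenset()
    if isinstance(r, Union):
        return language(r.left, max_len) | language(r.right, max_len)
    if isinstance(r, Concat):
        return frozenset(
            x + y
            for x in language(r.left, max_len)
            for y in language(r.right, max_len)
            if len(x) + len(y) <= max_len
        )
    if isinstance(r, Star):
        # Keep appending non-empty pieces until no new string fits the bound.
        pieces = language(r.inner, max_len) - {""}
        found = {""}
        frontier = {""}
        while frontier:
            frontier = {
                x + y
                for x in frontier
                for y in pieces
                if len(x) + len(y) <= max_len
            } - found
            found |= frontier
        return frozenset(found)
    raise TypeError(f"not a regular expression: {r!r}")


# (a + b)* restricted to strings of length at most 2
a_or_b_star = Star(Union(Sym("a"), Sym("b")))
print(sorted(language(a_or_b_star, 2), key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Each branch of `language` corresponds to one statement of the definition, which is why no other cases are needed.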
Handy shortcuts and extra notation
Even though the above gives the formal definition of regular languages, there are two more notation items that make descriptions easier to read:
If L is a regular language and r is its corresponding regular expression, then L⁺ indicates the regular language formed by concatenating one or more strings from L, and the regular expression for this is r⁺.
If L is a regular language and r is its corresponding regular expression, then Lⁿ indicates the regular language formed by concatenating exactly n strings from L, and the regular expression for this is rⁿ. (This holds for any integer n ≥ 0.)
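Both shortcuts can be rewritten using only the operations from the formal definition, which is why they add convenience but no new languages. The identities below are a sketch of that rewriting. Taking the zero-fold concatenation to be the language containing only the empty string (written \varepsilon here) is the usual convention and is an assumption, since the notes do not state it.

```latex
\begin{align*}
L^{+} &= L\,L^{*}                        & r^{+} &= r\,r^{*} \\
L^{0} &= \{\varepsilon\}                 & r^{0} &= \varepsilon \\
L^{n} &= L\,L^{n-1} \quad (n \ge 1)      & r^{n} &= r\,r^{n-1} \quad (n \ge 1)
\end{align*}
```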
For example, if our alphabet is { a, b, c }, then the set of all strings of length two is finite, and could be listed with the regular expression
(aa + ab + ac + ba + bb + bc + ca + cb + cc)
(Of course, a more compact representation would be something like (a + b + c)².)
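The agreement between the explicit listing and the compact form can be checked mechanically. This is a minimal sketch using Python's `re` module, where `|` plays the role of `+` (union) and `{2}` plays the role of the superscript 2. The variable names are chosen for this illustration.

```python
import re
from itertools import product

alphabet = "abc"

# The explicit listing: every string of length two over the alphabet.
length_two = ["".join(pair) for pair in product(alphabet, repeat=2)]

# Python spelling of the compact form: (a + b + c) repeated exactly twice.
compact = re.compile(r"(?:a|b|c){2}")

print(" + ".join(length_two))
# aa + ab + ac + ba + bb + bc + ca + cb + cc
print(all(compact.fullmatch(s) for s in length_two))
# True
```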
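Is the set of all strings of even length over ∑ a regular language?
Yes, since we can obtain it with the regular expression
(∑²)*
Is the set of all strings of odd length over ∑ a regular language?
Yes, since we can obtain any string in the language by adding a single character to some string of even length, and we know the even length strings form a regular language, i.e.:
(∑²)*∑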
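A brute-force check of the two expressions above, "any number of two-character blocks" and "any number of two-character blocks followed by one more character", is sketched below. The alphabet { a, b } and the helper name `strings_up_to` are assumptions made for the illustration; any finite alphabet would do.

```python
import re
from itertools import product

SIGMA = "ab"  # assumed alphabet for the check

even = re.compile(r"(?:[ab]{2})*")      # blocks of two characters
odd = re.compile(r"(?:[ab]{2})*[ab]")   # blocks of two, then one more


def strings_up_to(n):
    """Yield every string over SIGMA of length 0..n."""
    for k in range(n + 1):
        for chars in product(SIGMA, repeat=k):
            yield "".join(chars)


for s in strings_up_to(6):
    assert (even.fullmatch(s) is not None) == (len(s) % 2 == 0)
    assert (odd.fullmatch(s) is not None) == (len(s) % 2 == 1)
```

A finite check like this is evidence rather than proof, but it catches most mistakes in a hand-written expression.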
Given the alphabet { 0, 1 }, is the set of all strings containing the substring 1001 a regular language?
Yes, since we can create the language with the following regular expression:
(0 + 1)* 1001 (0 + 1)*
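As a sanity check, the expression can be compared against a direct substring test on every short binary string. This is a sketch in Python's regex syntax, where `|` stands for `+`. The names `contains_1001` and `binary_strings` are hypothetical.

```python
import re
from itertools import product

contains_1001 = re.compile(r"(?:0|1)*1001(?:0|1)*")


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for k in range(max_len + 1):
        for bits in product("01", repeat=k):
            yield "".join(bits)


# Strings on which the regular expression and the substring test disagree.
mismatches = [
    s for s in binary_strings(10)
    if (contains_1001.fullmatch(s) is not None) != ("1001" in s)
]
print(mismatches)
# []
```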
Given the alphabet { 0, 1 }, is the set of strings of length at most 100 a regular language?
Yes, since we could represent it by enumerating all the strings of length at most 100, although a much more compact notation would be
(0 + 1 + ε)¹⁰⁰
(where ε denotes the empty string).
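To see why the compact notation matters, it helps to count how many alternatives the explicit enumeration would need. The sketch below does the arithmetic and then checks membership with Python's bounded repetition `{0,100}`, which is used here as a stand-in for repeating "a 0, a 1, or nothing" 100 times. The variable names are assumptions of the example.

```python
import re

# Binary strings of length at most 100: 2^0 + 2^1 + ... + 2^100 of them.
count = sum(2 ** k for k in range(101))
print(count == 2 ** 101 - 1)
# True

# Python spelling of the compact form: between 0 and 100 binary digits.
at_most_100 = re.compile(r"[01]{0,100}")

print(at_most_100.fullmatch("01" * 50) is not None)   # length 100
# True
print(at_most_100.fullmatch("0" * 101) is not None)   # length 101
# False
```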
Given the alphabet { 0, 1 }, is the set of strings representing powers of two (expressed as binary integers) a regular language?
Yes, since we could represent the language with the regular expression
10*
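The claim is easy to test against actual binary representations. This sketch assumes binary integers are written without leading zeros and that 1 (two to the power zero) counts as a power of two. `is_power_of_two` is a helper introduced for the comparison.

```python
import re

power_of_two = re.compile(r"10*")


def is_power_of_two(n):
    """True when the positive integer n has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


# Compare the regular expression with the arithmetic test for 1..2048.
for n in range(1, 2049):
    binary = format(n, "b")   # no leading zeros
    assert (power_of_two.fullmatch(binary) is not None) == is_power_of_two(n)

print(format(64, "b"))
# 1000000
```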
Given the alphabet { a, b }, is the set of strings which contain no consecutive a's a regular language?
Yes, since we could represent the language with the regular expression
b*(ab⁺)*(a + ε)
(We can have as many b's as we like at the beginning, but each a must be followed by at least one b, except the very last a if it appears at the end of the string. Here ε denotes the empty string.)
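The description in words ("any number of leading b's, then each a followed by at least one b, then an optional final a") can be checked by brute force. In the sketch below the group "a followed by one or more b's" carries a star, since it may occur any number of times, and Python's `a?` stands for "a or the empty string". The names are chosen for this example.

```python
import re
from itertools import product

no_double_a = re.compile(r"b*(?:ab+)*a?")


def strings_over_ab(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for k in range(max_len + 1):
        for chars in product("ab", repeat=k):
            yield "".join(chars)


# Strings on which the expression and the direct test disagree.
mismatches = [
    s for s in strings_over_ab(12)
    if (no_double_a.fullmatch(s) is not None) != ("aa" not in s)
]
print(mismatches)
# []

# The star on the (ab+) group is essential: without it "abab" is rejected.
print(re.fullmatch(r"b*(?:ab+)a?", "abab") is None)
# True
```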
Given the alphabet { a, b }, is the set of strings which contain at least as many a's as b's a regular language?
In fact, it is not. No regular expression can describe this language; a proof of this will be considered in a few lectures' time.
(0*+1*)(0*+1*)(0*+1*)
0*(100*)*1*
01((01)*01+(01)*)+(01)*
(01)*
0*1(0*10*1)*0
(0+1)*
(0⁺1⁺0⁺ + 1⁺0⁺1⁺)
(0+1)*
(0 + 1)*(00 + 10 + 11)
1*(011⁺)*
(0+1)*11(0+1)*010(0+1)* + (0+1)*010(0+1)*11(0+1)*
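Two of the expressions listed above, 01((01)*01+(01)*)+(01)* and (01)*, describe the same language: the first part of the longer one yields one or more copies of 01, and the union with (01)* adds the empty string. How the two are related in the original exercise is not stated in the list, so treating the shorter one as a simplification of the longer is an assumption. The sketch below compares them on every short binary string, with Python's `|` standing for `+`.

```python
import re
from itertools import product

long_form = re.compile(r"01(?:(?:01)*01|(?:01)*)|(?:01)*")
short_form = re.compile(r"(?:01)*")


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for k in range(max_len + 1):
        for bits in product("01", repeat=k):
            yield "".join(bits)


# Strings accepted by one expression but not the other.
differences = [
    s for s in binary_strings(10)
    if (long_form.fullmatch(s) is None) != (short_form.fullmatch(s) is None)
]
print(differences)
# []
```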