Random Group Automata
Cyril Nicaud
LIAFA, Université Paris 7
Algorithms Seminar
February 21, 2000
[summary by Marianne Durand]
A properly typeset version of this document is available in
postscript and in
pdf.
If some fonts do not look right on your screen, this might be
fixed by configuring your browser (see the documentation here).
Abstract
A group automaton is a complete deterministic automaton such that each
letter of the alphabet acts on the set of states as a
permutation [1, 5]. The aim is to describe an
algorithm for the random generation of a minimal group automaton with
n states. The treatment is largely based on properties of random
permutations and random automata.
1 Properties
A group automaton is a complete deterministic automaton such that each
letter of the alphabet acts on the set of states as a
permutation [1, 5]. We consider a group automaton
A, with states 1, 2, ..., n. The state 1 is the
initial state; the set of final states is denoted by F, the alphabet
by a, b, ..., and the transitions by q2=d(q1,a) or
equivalently (q1,a,q2).
Figure 1: A group automaton.
Let us recall that two states q1 and q2 of an automaton are
equivalent, notationally q1 ~ q2, if for every word u, the
state d(q1,u) belongs to F if and only if d(q2,u)
belongs to F. The automaton A is minimal if A
has no distinct equivalent states. The structure properties of group
automata are: the minimal automaton of a group automaton is a group
automaton; the set of group automata is closed under union,
intersection and complementation but it is not closed under star and
product. As each letter acts like a permutation on the set of states,
there cannot exist two transitions (q1,a,q) and (q2,a,q) with
q1 and q2 distinct. This means that there is a ``reversibility''
property because when the automaton is in a state q after reading a
word u, it is possible to retrace the path followed.
We are now interested in the connexity of an automaton. An automaton
is connected if for any state q, there is a path joining the initial
state to q. Because of the reversibility property, if a group
automaton is connected then it is strongly connected, which means that
for any states q and q', there is a path from q to q'. A
group automaton is defined by the k permutations coding the
transitions and by the set F, where k is the cardinality of the
alphabet, so there are (2n-1)n!k group automata. We show that, if
the alphabet has at least two letters, almost all group automata on
n states are connected. In order to do this we first state the fact
that given two permutations s and a the generated group
ás,añ is almost surely transitive. This can be
shown by a simple combinatorial argument. Take two letters a
and b and consider sa the permutation related to a and
sb the one related to b; then as
ása,sbñ is almost always transitive the
automaton is almost always connected. We even have an asymptotic
estimate if the alphabet has exactly two letters:
Card(not connected group automata) |
|
Card(group automata) |
|
~ |
|
. |
2 Minimality
We now have to study the minimality of the automaton. An important
theorem is that almost all connected group automata are minimal. The
proof is partially based on the study of the one-letter case: if the
automaton is connected, then as there is only one letter a, the
permutation induced by a is a circular permutation. It is minimal
if it is not stable under a rotation which is equivalent to saying
that the word u=1···dk(1,a)···dn-1(1,a) is
not a non-trivial factor of uu. Then in this case by counting the
words corresponding to minimal circular permutations we show that
almost all connected automata are minimal on a one-letter alphabet.
If the alphabet has more than one letter, we observe that for almost
all group automata, there is a letter a such that the permutation
induced by a on the set of states has only one cycle of maximum
length [3]. More precisely, we have the following lemma:
Lemma 1
The probability that a permutation s of size n has more than
two cycles of maximum length is o(1).
Proof.
Let cn,m be the probability that a permutation of size n has
exactly two maximal cycles of size m+1. We note the generating
function Cm(z)=ån=0¥cn,mzn and cn=åm£
n/2cn,m. The following equality holds:
Cm(z)= |
|
ez··· e |
|
=
|
|
|
exp |
( |
-rm(z) |
) |
where rm(z)=ån>mzn/n is the remainder of the
generating function of the logarithm. In order to get the coefficient
cn,m we apply Cauchy's formula:
where C is a path around the origin. We choose for this
path a circle around the origin defined by: |z|=e-1/n and
we set z=e-p/n for a change of variable. So we have
We now need to approximate some of the quantities in the integral, for
this we use a technique and a few lemmas provided in [2].
We first have the relations
rm(e-p)=E(mp)+O |
æ ç ç è |
|
ö ÷ ÷ ø |
and
|
|
= |
|
+ |
|
y |
æ ç ç è |
|
ö ÷ ÷ ø |
(1) |
with E(x)=òx¥e-v/v dv and
y(z)=1/1-e-z-1/z, and where the error
term O(exp(-mp)/M) is moreover uniform over Â(p)>0 and
|Á(p)|£p.
Property 1
For all a>0, the function e-aE(u) is bounded on Â(u)>0.
The relations 1 allow us to write, after we
set µ=m/n:
cn,m |
= |
|
ó õ |
|
exp |
æ ç ç è |
-E(µ
p)+O |
æ ç ç è |
|
ö ÷ ÷ ø |
ö ÷ ÷ ø |
æ ç ç è |
|
+ |
|
y |
æ ç ç è |
|
ö ÷ ÷ ø |
ö ÷ ÷ ø |
|
dp |
|
|
= |
|
ó õ |
|
exp |
( |
-E(µ p) |
) |
|
æ ç ç è |
|
+ |
|
y |
æ ç ç è |
|
ö ÷ ÷ ø |
+O |
æ ç ç è |
|
ö ÷ ÷ ø |
ö ÷ ÷ ø |
|
dp. |
|
This rewrites as cn,m=I1+I2+I3 where
I1 |
= |
|
I2 |
= |
|
ó õ |
|
exp |
( |
-E(µ p) |
) |
|
y |
æ ç ç è |
|
ö ÷ ÷ ø |
|
dp, |
|
I3 |
= |
|
|
ó õ |
|
exp |
( |
-E(µ p) |
) |
O |
æ ç ç è |
|
ö ÷ ÷ ø |
|
dp. |
|
To study these three expressions, we use the fact that the quantities
exp(-E(µ p)) (Property 1) and ep
e-p(2m+2)/n are bounded uniformly on m . This helps us
to give an upper bound for these three expressions: first,
and this approximation is uniform on m. Second
I2= |
ó õ |
|
O(1) |
|
y |
æ ç ç è |
|
ö ÷ ÷ ø |
|
dp |
as y is also bounded uniformly on m we have
I2= |
|
|
ó õ |
|
O(1) dp=O |
æ ç ç è |
|
ö ÷ ÷ ø |
. |
Third, as in the case of I1, we obtain
I3= |
|
ó õ |
|
O |
æ ç ç è |
|
ö ÷ ÷ ø |
|
dp=O |
æ ç ç è |
|
ö ÷ ÷ ø |
. |
Combining these estimates we obtain
cn,m=O(log n/m2) uniformly on m. The
approximation is going to be useful when m is greater than (n)1/2; otherwise we use the following lemma: Lemma 2 The
probability that a permutation s of size n has a maximal
cycle of length smaller than (n)1/2 is o(1).
Proof.
Let pn,m be the probability that a permutation of size n has all its cycles
of size smaller than m. The saddle-point method gives us an upper bound for
the quantity pn,m. Then we have
pn,m=[zn]e |
|
£ |
|
where
lm(z)=z+...+ |
|
. |
The saddle-point method drives
us to apply this inequality to the value r=n1/3m chosen
to fit the minimum, which gives
The probability that a permutation has two maximal cycles of size m
is bounded by the probability that a permutation has one maximal cycle
of size m. Therefore the probability that a permutation of size n
has two maximal cycles of size smaller than (n)1/2 is o(1). So
cn=o(1)+åm=(n)1/2m=n/2cn,m=o(1) by the approximation
cn,m=O(log n/m2). Lemma 1 directly follows
by showing that almost all permutations of size n having at least
two maximal cycles have exactly two maximal cycles.
We define En as the set of group automata A of
size n that are connected and with the property that there exists
one letter a such that the permutation induced by a has only one
maximal cycle. By Lemma 1, we show that almost all connected group
automata belong to En. Furthermore, if A
belongs to En then we can show that the maximal cycle
of sa does not interfere with other cycles, because of their
different cardinalities and so we can use the one-letter case, and say
that this maximal cycle is almost always minimal. As the automaton
considered is connected, this implies that the automaton is minimal.
So we have the following result:
Theorem 1
Almost all group automata are minimal.
Proof.
EnÌMinimalnÌConnectednÌGroup
Automatonn, and we have proved that almost every group automaton is
in En.
3 Algorithm
This work naturally leads to an algorithm for generating uniformly at
random a minimal connected group automata. Here the cardinality of the
alphabet is bounded. The size of an automaton is the number n of
states of its minimal automaton. The algorithm is:
-
generate a random group automaton A using a function
returning a random permutation for each letter of the alphabet. The
cost is O(n);
-
test if AÎ En, if not use Hopcroft's algorithm
to check if it is minimal. Since Hopcroft is used rarely, the cost
is O(n);
this being done a constant number of time on average, because of the
theorem above.
This yields a linear complexity in the average case, which is better
than the best known algorithm by Hopcroft [4] which has
complexity nlog n.
References
- [1]
-
Eilenberg (Samuel). --
Automata, languages, and machines. Vol. A. --
Academic Press, New York, 1974, xvi+451p. Pure and
Applied Mathematics, Vol. 58.
- [2]
-
Gourdon (Xavier). --
Combinatoire, algorithmique et géométrie des
polynômes. --
Thèse, École polytechnique, 1996.
- [3]
-
Gourdon (Xavier). --
Largest component in random combinatorial structures. In Proceedings of the 7th Conference on Formal Power Series and Algebraic
Combinatorics (Noisy-le-Grand, 1995), vol. 180, pp. 185--209. --
1998.
- [4]
-
Hopcroft (John). --
An n log n algorithm for minimizing states in a finite automaton.
In Theory of machines and computations (Proc. Internat. Sympos.,
Technion, Haifa, 1971). pp. 189--196. --
Academic Press, New York, 1971.
- [5]
-
Hopcroft (John E.) and Ullman (Jeffrey D.). --
Introduction to automata theory, languages, and computation. --
Addison-Wesley Publishing Co., Reading, Mass., 1979,
x+418p. Addison-Wesley Series in Computer Science.
This document was translated from LATEX by
HEVEA.