Random Group Automata

Cyril Nicaud

LIAFA, Université Paris 7

Algorithms Seminar

February 21, 2000

[summary by Marianne Durand]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).

Abstract
A group automaton is a complete deterministic automaton such that each letter of the alphabet acts on the set of states as a permutation [1, 5]. The aim is to describe an algorithm for the random generation of a minimal group automaton with n states. The treatment is largely based on properties of random permutations and random automata.

## 1   Properties

A group automaton is a complete deterministic automaton such that each letter of the alphabet acts on the set of states as a permutation [1, 5]. We consider a group automaton A, with states 1, 2, ..., n. The state 1 is the initial state; the set of final states is denoted by F, the alphabet by ab, ..., and the transitions by q2=d(q1,a) or equivalently (q1,a,q2).

Figure 1: A group automaton.

Let us recall that two states q1 and q2 of an automaton are equivalent, notationally q1 ~ q2, if for every word u, the state d(q1,u) belongs to F if and only if d(q2,u) belongs to F. The automaton A is minimal if A has no distinct equivalent states. The structure properties of group automata are: the minimal automaton of a group automaton is a group automaton; the set of group automata is closed under union, intersection and complementation but it is not closed under star and product. As each letter acts like a permutation on the set of states, there cannot exist two transitions (q1,a,q) and (q2,a,q) with q1 and q2 distinct. This means that there is a ``reversibility'' property because when the automaton is in a state q after reading a word u, it is possible to retrace the path followed.

We are now interested in the connexity of an automaton. An automaton is connected if for any state q, there is a path joining the initial state to q. Because of the reversibility property, if a group automaton is connected then it is strongly connected, which means that for any states q and q', there is a path from q to q'. A group automaton is defined by the k permutations coding the transitions and by the set F, where k is the cardinality of the alphabet, so there are (2n-1)n!k group automata. We show that, if the alphabet has at least two letters, almost all group automata on n states are connected. In order to do this we first state the fact that given two permutations s and a the generated group ás,añ is almost surely transitive. This can be shown by a simple combinatorial argument. Take two letters a and b and consider sa the permutation related to a and sb the one related to b; then as ása,sbñ is almost always transitive the automaton is almost always connected. We even have an asymptotic estimate if the alphabet has exactly two letters:
Card(not connected group automata)
Card(group automata)
~
1
n
.

## 2   Minimality

We now have to study the minimality of the automaton. An important theorem is that almost all connected group automata are minimal. The proof is partially based on the study of the one-letter case: if the automaton is connected, then as there is only one letter a, the permutation induced by a is a circular permutation. It is minimal if it is not stable under a rotation which is equivalent to saying that the word u=1···dk(1,a)···dn-1(1,a) is not a non-trivial factor of uu. Then in this case by counting the words corresponding to minimal circular permutations we show that almost all connected automata are minimal on a one-letter alphabet. If the alphabet has more than one letter, we observe that for almost all group automata, there is a letter a such that the permutation induced by a on the set of states has only one cycle of maximum length [3]. More precisely, we have the following lemma:
Lemma 1   The probability that a permutation s of size n has more than two cycles of maximum length is o(1).

Proof. Let cn,m be the probability that a permutation of size n has exactly two maximal cycles of size m+1. We note the generating function Cm(z)=ån=0¥cn,mzn and cn=åm£ n/2cn,m. The following equality holds:
Cm(z)=
z2(m+1)
2(m+1)2
ez··· e
zm
m

=
1
1-z
z2(m+1)
2(m+1)2
exp ( -rm(z) )
where rm(z)=ån>mzn/n is the remainder of the generating function of the logarithm. In order to get the coefficient cn,m we apply Cauchy's formula:
cn,m=
1
2ip
ó
õ
 C
1
1-z
z2(m+1)
2(m+1)2
exp ( -rm(z) )
dz
zn+1
where C is a path around the origin. We choose for this path a circle around the origin defined by: |z|=e-1/n and we set z=e-p/n for a change of variable. So we have
cn,m=
1
2ip
ó
õ
 1+inp 1-inp
 exp ( -rm(e-p/n) )
1-e-p/n
e-p(2m+2)/n
2(m+1)2
ep
n
dp
We now need to approximate some of the quantities in the integral, for this we use a technique and a few lemmas provided in [2]. We first have the relations
rm(e-p)=E(mp)+O æ
ç
ç
è
e-mp
m
ö
÷
÷
ø
and
1
 n ( 1-e-p/n )
=
1
p
+
1
n
y æ
ç
ç
è
p
n
ö
÷
÷
ø
(1)
with E(x)=òx¥e-v/v dv and y(z)=1/1-e-z-1/z, and where the error term O(exp(-mp)/M) is moreover uniform over Â(p)>0 and |Á(p)|£p.
Property 1   For all a>0, the function e-aE(u) is bounded on Â(u)>0.
The relations 1 allow us to write, after we set µ=m/n:
cn,m =
1
2ip
ó
õ
 1+inp 1-inp
exp æ
ç
ç
è
-Ep)+O æ
ç
ç
è
1
m
ö
÷
÷
ø
ö
÷
÷
ø
æ
ç
ç
è
1
p
+
1
n
y æ
ç
ç
è
p
n
ö
÷
÷
ø
ö
÷
÷
ø
ep e-p(2m+2)/n
2(m+1)2
dp
=
1
2ip
ó
õ
 1+inp 1-inp
exp ( -Ep) ) æ
ç
ç
è
1
p
+
1
n
y æ
ç
ç
è
p
n
ö
÷
÷
ø
+O æ
ç
ç
è
1
pm
ö
÷
÷
ø
ö
÷
÷
ø
ep e-p(2m+2)/n
2(m+1)2
dp.
This rewrites as cn,m=I1+I2+I3 where
I1 =
1
2ip
ó
õ
 1+inp 1-inp
exp ( -Ep) )
1
p
ep e-p(2m+2)/n
2(m+1)2
dp,
I2 =
1
2ip
ó
õ
 1+inp 1-inp
exp ( -Ep) )
1
n
y æ
ç
ç
è
p
n
ö
÷
÷
ø
ep e-p(2m+2)/n
2(m+1)2
dp,
I3 =
1
m
1
2ip
ó
õ
 1+inp 1-inp
exp ( -Ep) ) O æ
ç
ç
è
1
p
ö
÷
÷
ø
ep e-p(2m+2)/n
2(m+1)2
dp.
To study these three expressions, we use the fact that the quantities exp(-Ep)) (Property 1) and ep e-p(2m+2)/n are bounded uniformly on m . This helps us to give an upper bound for these three expressions: first,
I1= ó
õ
 1+inp 1-inp
O(1)
pm2
dp=O æ
ç
ç
è
log n
m2
ö
÷
÷
ø
and this approximation is uniform on m. Second
I2= ó
õ
 1+inp 1-inp
O(1)
1
n
y æ
ç
ç
è
p
n
ö
÷
÷
ø
1
2(m+1)2
dp
as y is also bounded uniformly on m we have
I2=
1
nm2
ó
õ
 1+inp 1-inp
O(1) dp=O æ
ç
ç
è
1
m2
ö
÷
÷
ø
.
Third, as in the case of I1, we obtain
I3=
1
m
ó
õ
 1+inp 1-inp
O æ
ç
ç
è
1
p
ö
÷
÷
ø
1
2(m+1)2
dp=O æ
ç
ç
è
log n
m3
ö
÷
÷
ø
.
Combining these estimates we obtain cn,m=O(log n/m2) uniformly on m. The approximation is going to be useful when m is greater than (n)1/2; otherwise we use the following lemma:
Lemma 2   The probability that a permutation s of size n has a maximal cycle of length smaller than (n)1/2 is o(1).

Proof. Let pn,m be the probability that a permutation of size n has all its cycles of size smaller than m. The saddle-point method gives us an upper bound for the quantity pn,m. Then we have
pn,m=[zn]e
 lm(z)
£
e
 lm(r)
rn
where   lm(z)=z+...+
zm
m
.
The saddle-point method drives us to apply this inequality to the value r=n1/3m chosen to fit the minimum, which gives
pn,m£
 exp ( n1/3logm )
nn/3m
,   so    p
 n,(n)1/2
£ e
æ
ç
ç
è
n1/3
2
-
(n)1/2
3
ö
÷
÷
ø
log n

=o(1).

The probability that a permutation has two maximal cycles of size m is bounded by the probability that a permutation has one maximal cycle of size m. Therefore the probability that a permutation of size n has two maximal cycles of size smaller than (n)1/2 is o(1). So cn=o(1)+åm=(n)1/2m=n/2cn,m=o(1) by the approximation cn,m=O(log n/m2). Lemma 1 directly follows by showing that almost all permutations of size n having at least two maximal cycles have exactly two maximal cycles.

We define En as the set of group automata A of size n that are connected and with the property that there exists one letter a such that the permutation induced by a has only one maximal cycle. By Lemma 1, we show that almost all connected group automata belong to  En. Furthermore, if A belongs to  En then we can show that the maximal cycle of sa does not interfere with other cycles, because of their different cardinalities and so we can use the one-letter case, and say that this maximal cycle is almost always minimal. As the automaton considered is connected, this implies that the automaton is minimal. So we have the following result:

Theorem 1   Almost all group automata are minimal.

Proof. EnÌMinimalnÌConnectednÌGroup Automatonn, and we have proved that almost every group automaton is in En.

## 3   Algorithm

This work naturally leads to an algorithm for generating uniformly at random a minimal connected group automata. Here the cardinality of the alphabet is bounded. The size of an automaton is the number n of states of its minimal automaton. The algorithm is:
• generate a random group automaton  A using a function returning a random permutation for each letter of the alphabet. The cost is O(n);
• test if AÎ En, if not use Hopcroft's algorithm to check if it is minimal. Since Hopcroft is used rarely, the cost is O(n);
this being done a constant number of time on average, because of the theorem above.

This yields a linear complexity in the average case, which is better than the best known algorithm by Hopcroft [4] which has complexity nlog n.

## References

[1]
Eilenberg (Samuel). -- Automata, languages, and machines. Vol. A. -- Academic Press, New York, 1974, xvi+451p. Pure and Applied Mathematics, Vol. 58.

[2]
Gourdon (Xavier). -- Combinatoire, algorithmique et géométrie des polynômes. -- Thèse, École polytechnique, 1996.

[3]
Gourdon (Xavier). -- Largest component in random combinatorial structures. In Proceedings of the 7th Conference on Formal Power Series and Algebraic Combinatorics (Noisy-le-Grand, 1995), vol. 180, pp. 185--209. -- 1998.

[4]
Hopcroft (John). -- An n log n algorithm for minimizing states in a finite automaton. In Theory of machines and computations (Proc. Internat. Sympos., Technion, Haifa, 1971). pp. 189--196. -- Academic Press, New York, 1971.

[5]
Hopcroft (John E.) and Ullman (Jeffrey D.). -- Introduction to automata theory, languages, and computation. -- Addison-Wesley Publishing Co., Reading, Mass., 1979, x+418p. Addison-Wesley Series in Computer Science.

This document was translated from LATEX by HEVEA.