Random Group Automata

Cyril Nicaud

LIAFA, Université Paris 7

Algorithms Seminar

February 21, 2000

[summary by Marianne Durand]

Abstract

A group automaton is a complete deterministic automaton such that each letter of the alphabet acts on the set of states as a permutation [1, 5]. The aim is to describe an algorithm for the random generation of a minimal group automaton with n states. The treatment is largely based on properties of random permutations and random automata.

1 Properties

A group automaton is a complete deterministic automaton such that each letter of the alphabet acts on the set of states as a permutation [1, 5]. We consider a group automaton A, with states 1, 2, ..., n. The state 1 is the initial state; the set of final states is denoted by F, the alphabet by a, b, ..., and the transitions by q₂=d(q₁,a) or equivalently (q₁,a,q₂).

Figure 1: A group automaton.

Recall that two states q₁ and q₂ of an automaton are equivalent, written q₁ ~ q₂, if for every word u, the state d(q₁,u) belongs to F if and only if d(q₂,u) belongs to F. The automaton A is minimal if it has no two distinct equivalent states. Group automata have the following structural properties: the minimal automaton of a group automaton is a group automaton; the class of languages recognized by group automata is closed under union, intersection and complementation, but not under star or product. Since each letter acts as a permutation on the set of states, there cannot be two transitions (q₁,a,q) and (q₂,a,q) with q₁ and q₂ distinct. This gives a "reversibility" property: when the automaton is in a state q after reading a word u, the path followed can be retraced backwards.
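The definitions above can be made concrete with a small sketch. It assumes that states are numbered 0..n-1 rather than 1..n (state 0 playing the role of the initial state), that each letter is stored as a list `p` with `p[q]` the target of state `q`, and that the equivalence ~ is computed by Moore-style partition refinement. The function names are illustrative and do not come from the talk.

```python
def is_group_automaton(perms):
    """Every letter must act as a permutation of the states 0..n-1."""
    n = len(perms[0])
    return all(sorted(p) == list(range(n)) for p in perms)


def inverse(p):
    """Inverse permutation: lets a path be retraced (reversibility)."""
    inv = [0] * len(p)
    for q, target in enumerate(p):
        inv[target] = q
    return inv


def equivalence_classes(perms, finals):
    """Class id of each state for the equivalence ~ (Moore refinement)."""
    n = len(perms[0])
    cls = [int(q in finals) for q in range(n)]
    while True:
        # A state's signature: its own class and the classes it reaches.
        sig = [(cls[q],) + tuple(cls[p[q]] for p in perms) for q in range(n)]
        ids = {}
        new = [ids.setdefault(s, len(ids)) for s in sig]
        if len(ids) == len(set(cls)):  # no class was split: stable
            return new
        cls = new


def is_minimal(perms, finals):
    """No two distinct equivalent states (the automaton is assumed connected)."""
    return len(set(equivalence_classes(perms, finals))) == len(perms[0])
```

For instance, the 4-cycle `[1, 2, 3, 0]` with final states `{0, 2}` is not minimal, because states 0 and 2 are equivalent, while the 3-cycle `[1, 2, 0]` with final set `{0}` is minimal.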

We now turn to the connectivity of an automaton. An automaton is connected if for every state q there is a path from the initial state to q. Because of the reversibility property, a connected group automaton is strongly connected: for any states q and q', there is a path from q to q'. A group automaton is defined by the k permutations coding the transitions, where k is the cardinality of the alphabet, and by the set F, so there are (2ⁿ-1)(n!)^k group automata. We show that, if the alphabet has at least two letters, almost all group automata on n states are connected. To do this, we first use the fact that for two random permutations s and t of size n, the generated group ⟨s,t⟩ is almost surely transitive. This can be shown by a simple combinatorial argument. Take two letters a and b, and let s_a and s_b be the permutations associated with a and b; since ⟨s_a,s_b⟩ is almost always transitive, the automaton is almost always connected. We even have an asymptotic estimate when the alphabet has exactly two letters:

$$\frac{\operatorname{Card}(\text{not connected group automata})}{\operatorname{Card}(\text{group automata})}$$
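The ratio above can be explored by brute force for very small n. The sketch below assumes states 0..n-1 with initial state 0 and checks connectivity by a breadth-first search from the initial state. Since the choice of F does not affect connectivity, the ratio only depends on the tuples of permutations. The helper names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import permutations, product


def is_connected(perms, start=0):
    """True if every state is reachable from `start` (breadth-first search)."""
    n = len(perms[0])
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for p in perms:
            if p[q] not in seen:
                seen.add(p[q])
                queue.append(p[q])
    return len(seen) == n


def disconnected_fraction(n, k=2):
    """Exact fraction of k-tuples of permutations giving a non-connected
    automaton. Enumerates (n!)^k tuples, so only usable for small n."""
    all_perms = list(permutations(range(n)))
    total = bad = 0
    for perms in product(all_perms, repeat=k):
        total += 1
        bad += not is_connected(perms)
    return Fraction(bad, total)
```

With two letters, `disconnected_fraction(2)` is 1/4 (only the pair of identities fails) and `disconnected_fraction(3)` is 10/36.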

2 Minimality

We now study the minimality of the automaton. An important theorem is that almost all connected group automata are minimal. The proof is partially based on the one-letter case: if the automaton is connected and there is only one letter a, the permutation induced by a is a circular permutation. The automaton is minimal if it is not stable under a non-trivial rotation, which is equivalent to saying that the word u = 1 ··· d^k(1,a) ··· d^{n-1}(1,a) is not a non-trivial factor of uu. By counting the words corresponding to minimal circular permutations, we show that on a one-letter alphabet almost all connected automata are minimal. If the alphabet has more than one letter, we observe that for almost all group automata there is a letter a such that the permutation induced by a on the set of states has only one cycle of maximum length [3]. More precisely, we have the following lemma:

Lemma 1 The probability that a permutation s of size n has at least two cycles of maximum length is o(1).

Proof. Let $c_{n,m}$ be the probability that a permutation of size n has exactly two maximal cycles of size m+1. We write $C_m(z)=\sum_{n=0}^{\infty}c_{n,m}z^n$ for the generating function and $c_n=\sum_{m\le n/2}c_{n,m}$. The following equality holds:

$$C_m(z)=\frac{z^{2(m+1)}}{2(m+1)^2}\,e^{z}\cdots e^{z^m/m}=\frac{1}{1-z}\,\frac{z^{2(m+1)}}{2(m+1)^2}\,\exp\bigl(-r_m(z)\bigr)$$

where $r_m(z)=\sum_{n>m}z^n/n$ is the remainder of the generating function of the logarithm. In order to get the coefficient $c_{n,m}$ we apply Cauchy's formula:

$$c_{n,m}=\frac{1}{2i\pi}\oint_{C}\frac{1}{1-z}\,\frac{z^{2(m+1)}}{2(m+1)^2}\,\exp\bigl(-r_m(z)\bigr)\,\frac{dz}{z^{n+1}}$$

where C is a path around the origin. We choose for this path the circle around the origin defined by $|z|=e^{-1/n}$, and we make the change of variable $z=e^{-p/n}$. So we have

$$c_{n,m}=\frac{1}{2i\pi}\int_{1-in\pi}^{1+in\pi}\frac{\exp\bigl(-r_m(e^{-p/n})\bigr)}{1-e^{-p/n}}\,\frac{e^{-p(2m+2)/n}}{2(m+1)^2}\,e^{p}\,\frac{dp}{n}$$
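The change of variable can be checked term by term. The following is a routine computation under the parametrization $z=e^{-p/n}$ stated above, given as a sketch. The angle $\theta$ is an auxiliary variable introduced here to track the orientation of the circle.

```latex
z = e^{-p/n}
\;\Longrightarrow\;
dz = -\frac{1}{n}\,e^{-p/n}\,dp,
\qquad
\frac{dz}{z^{n+1}} = -\frac{1}{n}\,e^{-p/n}\,e^{(n+1)p/n}\,dp = -\frac{1}{n}\,e^{p}\,dp .

% Orientation: write p = 1 - i n \theta, so that z = e^{-1/n} e^{i\theta}.
% The positively oriented circle \theta : -\pi \to \pi corresponds to
% p : 1 + i n \pi \to 1 - i n \pi, and swapping the bounds absorbs the sign:
\oint_{C} f(z)\,\frac{dz}{z^{n+1}}
= \frac{1}{n}\int_{1-in\pi}^{1+in\pi} f\!\left(e^{-p/n}\right) e^{p}\,dp .
```

The factor $1/n$ produced here is the one that later combines with $1/(1-e^{-p/n})$ to give a leading term of order $1/p$.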

We now need to approximate some of the quantities in the integral; for this we use a technique and a few lemmas provided in [2]. We first have the relations

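$$r_m(e^{-p})=E(mp)+O\!\left(\frac{e^{-mp}}{m}\right)\quad\text{and}\quad\frac{1}{1-e^{-p/n}}=\frac{n}{p}+y\!\left(\frac{p}{n}\right)\qquad(1)$$

with $E(x)=\int_x^{\infty}\frac{e^{-v}}{v}\,dv$ and $y(z)=\frac{1}{1-e^{-z}}-\frac{1}{z}$, and where the error term $O(e^{-mp}/m)$ is moreover uniform over $\Re(p)>0$ and $|\Im(p)|\le\pi$.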

Property 1 For all a>0, the function $e^{-aE(u)}$ is bounded on $\Re(u)>0$.

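The relations (1) allow us to write, after setting µ=m/n:

$$\begin{aligned}
c_{n,m}&=\frac{1}{2i\pi}\int_{1-in\pi}^{1+in\pi}\exp\!\left(-E(\mu p)+O\!\left(\frac{e^{-\mu p}}{m}\right)\right)\left(\frac{1}{p}+\frac{1}{n}\,y\!\left(\frac{p}{n}\right)\right)\frac{e^{p}\,e^{-p(2m+2)/n}}{2(m+1)^2}\,dp\\
&=\frac{1}{2i\pi}\int_{1-in\pi}^{1+in\pi}\exp\bigl(-E(\mu p)\bigr)\left(1+O\!\left(\frac{e^{-\mu p}}{m}\right)\right)\left(\frac{1}{p}+\frac{1}{n}\,y\!\left(\frac{p}{n}\right)\right)\frac{e^{p}\,e^{-p(2m+2)/n}}{2(m+1)^2}\,dp.
\end{aligned}$$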

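This rewrites as $c_{n,m}=I_1+I_2+I_3$ where

$$\begin{aligned}
I_1&=\frac{1}{2i\pi}\int_{1-in\pi}^{1+in\pi}\exp\bigl(-E(\mu p)\bigr)\,\frac{e^{p}\,e^{-p(2m+2)/n}}{2(m+1)^2}\,\frac{dp}{p},\\
I_2&=\frac{1}{2i\pi}\int_{1-in\pi}^{1+in\pi}\exp\bigl(-E(\mu p)\bigr)\,\frac{1}{n}\,y\!\left(\frac{p}{n}\right)\frac{e^{p}\,e^{-p(2m+2)/n}}{2(m+1)^2}\,dp,\\
I_3&=\frac{1}{2i\pi}\int_{1-in\pi}^{1+in\pi}\exp\bigl(-E(\mu p)\bigr)\,O\!\left(\frac{e^{-\mu p}}{m}\right)\left(\frac{1}{p}+\frac{1}{n}\,y\!\left(\frac{p}{n}\right)\right)\frac{e^{p}\,e^{-p(2m+2)/n}}{2(m+1)^2}\,dp.
\end{aligned}$$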

To study these three expressions, we use the fact that the quantities $\exp(-E(\mu p))$ (Property 1) and $e^{p}\,e^{-p(2m+2)/n}$ are bounded uniformly in m. This gives an upper bound for each of the three expressions. First,

$$I_1=\int_{1-in\pi}^{1+in\pi}\frac{O(1)}{p\,m^2}\,dp=O\!\left(\frac{\log n}{m^2}\right)$$

and this approximation is uniform in m. Second,

$$I_2=\int_{1-in\pi}^{1+in\pi}O(1)\,\frac{1}{n}\,y\!\left(\frac{p}{n}\right)\frac{dp}{2(m+1)^2};$$

as y is also bounded uniformly in m, we have

$$I_2=\frac{1}{n\,m^2}\int_{1-in\pi}^{1+in\pi}O(1)\,dp=O\!\left(\frac{1}{m^2}\right).$$

Third, as in the case of I₁, we obtain

$$I_3=\int_{1-in\pi}^{1+in\pi}O\!\left(\frac{1}{m}\right)\left(\frac{1}{p}+\frac{1}{n}\,y\!\left(\frac{p}{n}\right)\right)\frac{dp}{2(m+1)^2}=O\!\left(\frac{\log n}{m^3}\right).$$

Combining these estimates we obtain $c_{n,m}=O(\log n/m^2)$ uniformly in m. This approximation is useful when m is greater than $\sqrt{n}$; otherwise we use the following lemma:

Lemma 2 The probability that a permutation s of size n has a maximal cycle of length smaller than $\sqrt{n}$ is o(1).

Proof. Let $p_{n,m}$ be the probability that a permutation of size n has all its cycles of size at most m. The saddle-point method gives us an upper bound for the quantity $p_{n,m}$: for every r>0 we have

$$p_{n,m}=[z^n]\,e^{l_m(z)}\le\frac{e^{l_m(r)}}{r^n},\qquad\text{where } l_m(z)=z+\cdots+\frac{z^m}{m}.$$

The saddle-point method leads us to apply this inequality with the value $r=n^{1/(3m)}$, chosen to approximate the minimum, which gives

$$p_{n,m}\le\frac{\exp\bigl(n^{1/3}\log m\bigr)}{n^{n/(3m)}},\quad\text{so}\quad p_{n,\sqrt{n}}\le\exp\!\left(\left(\frac{n^{1/3}}{2}-\frac{\sqrt{n}}{3}\right)\log n\right)=o(1).$$
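The inequality used in this proof can be checked numerically for moderate n. The sketch below assumes the reading $l_m(z)=z+\cdots+z^m/m$ and $r=n^{1/(3m)}$. It computes $p_{n,m}$ exactly from the recurrence obtained by differentiating $e^{l_m(z)}$, and compares it with $e^{l_m(r)}/r^n$. The function names are illustrative.

```python
from fractions import Fraction
from math import exp, log


def prob_all_cycles_at_most(n, m):
    """Exact p_{n,m} = [z^n] exp(z + z^2/2 + ... + z^m/m)."""
    a = [Fraction(1)]
    for j in range(1, n + 1):
        # Differentiating A = exp(l_m) gives j*a_j = a_{j-1} + ... + a_{j-min(j,m)}.
        a.append(sum((a[j - k] for k in range(1, min(j, m) + 1)), Fraction(0)) / j)
    return a[n]


def saddle_point_bound(n, m, r):
    """Upper bound exp(l_m(r)) / r^n, valid for every r > 0."""
    l_m = sum(r**k / k for k in range(1, m + 1))
    return exp(l_m - n * log(r))


if __name__ == "__main__":
    for n in (50, 200, 800):
        m = int(n**0.5)
        r = n ** (1.0 / (3 * m))
        print(n, m, float(prob_all_cycles_at_most(n, m)), saddle_point_bound(n, m, r))
```

The bound holds for any positive r because the coefficients of $e^{l_m(z)}$ are non-negative; the particular choice of r only affects how tight it is.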

The probability that a permutation has two maximal cycles of size m is bounded by the probability that it has one maximal cycle of size m. Therefore the probability that a permutation of size n has two maximal cycles of size smaller than $\sqrt{n}$ is o(1). So $c_n=o(1)+\sum_{m=\sqrt{n}}^{n/2}c_{n,m}=o(1)$ by the approximation $c_{n,m}=O(\log n/m^2)$. Lemma 1 follows directly by showing that almost all permutations of size n having at least two maximal cycles have exactly two maximal cycles.
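The quantity controlled by Lemma 1 can also be computed exactly for small n. The sketch below counts permutations in which the maximal cycle length occurs at least twice: the exponential generating function of permutations with exactly j cycles of length L and all other cycles shorter is $(z^L/L)^j/j!\cdot\exp(\sum_{k<L}z^k/k)$. A brute-force enumeration is included as a cross-check. All names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def bounded_cycle_coeffs(n, max_len):
    """a[j] = [z^j] exp(sum_{k <= max_len} z^k / k), for j = 0..n."""
    a = [Fraction(1)]
    for j in range(1, n + 1):
        a.append(sum((a[j - k] for k in range(1, min(j, max_len) + 1)), Fraction(0)) / j)
    return a


def prob_repeated_max_cycle(n):
    """Probability that the maximal cycle length occurs at least twice."""
    total = Fraction(0)
    for length in range(1, n // 2 + 1):
        a = bounded_cycle_coeffs(n, length - 1)  # remaining cycles are shorter
        j = 2
        while j * length <= n:
            total += a[n - j * length] / (length**j * factorial(j))
            j += 1
    return total


def cycle_lengths(perm):
    """Lengths of the cycles of a permutation given as a sequence."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        length, q = 0, start
        while not seen[q]:
            seen[q] = True
            q = perm[q]
            length += 1
        if length:
            lengths.append(length)
    return lengths


def brute_force(n):
    """Same probability, by enumerating all n! permutations (small n only)."""
    hits = 0
    for perm in permutations(range(n)):
        lengths = cycle_lengths(perm)
        hits += lengths.count(max(lengths)) >= 2
    return Fraction(hits, factorial(n))
```

For n = 4 both functions give 1/6: the identity and the three double transpositions, out of 24 permutations.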

We define E_n as the set of connected group automata A of size n for which there exists a letter a such that the permutation induced by a has only one maximal cycle. By Lemma 1, almost all connected group automata belong to E_n. Furthermore, if A belongs to E_n, we can show that the maximal cycle of s_a does not interfere with the other cycles, because their cardinalities differ, so we can use the one-letter case and say that this maximal cycle is almost always minimal. As the automaton considered is connected, this implies that the automaton is minimal. So we have the following result:

Theorem 1 Almost all group automata are minimal.

Proof. E_n ⊂ Minimal_n ⊂ Connected_n ⊂ Group Automaton_n, and we have proved that almost every group automaton is in E_n.
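The one-letter criterion used earlier in this section (the word u must not be a non-trivial factor of uu) can be tested directly. The sketch below makes an assumption about the encoding: instead of listing the states along the cycle, it records for each state whether it belongs to F, as a word over {0, 1}. The states are taken to be 0..n-1, visited in this order by the single letter.

```python
def is_primitive(word):
    """True if `word` occurs in word+word only at positions 0 and len(word),
    i.e. it is not a non-trivial factor of word+word."""
    return len(word) > 0 and (word + word).find(word, 1) == len(word)


def one_letter_minimal(n, finals):
    """Minimality of the connected one-letter group automaton whose letter
    acts as the cycle 0 -> 1 -> ... -> n-1 -> 0, with final states `finals`."""
    u = "".join("1" if q in finals else "0" for q in range(n))
    return is_primitive(u)
```

For example, with n = 4 and F = {0, 2} the word is `1010`, which is invariant under a rotation by two positions, so the automaton is not minimal; with n = 3 and F = {0} the word `100` is primitive and the automaton is minimal.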

3 Algorithm

This work naturally leads to an algorithm for generating a minimal connected group automaton uniformly at random. Here the cardinality of the alphabet is bounded. The size of an automaton is the number n of states of its minimal automaton. The algorithm is:

1. Generate a random group automaton A, using a function that returns a random permutation for each letter of the alphabet. The cost is O(n).
2. Test whether A ∈ E_n; if not, use Hopcroft's algorithm to check whether it is minimal. Since Hopcroft's algorithm is rarely used, the average cost is O(n).

By the theorem above, these steps are repeated a constant number of times on average.

This yields a linear complexity in the average case, which is better than the O(n log n) complexity of Hopcroft's algorithm [4], the best known minimization algorithm.
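The rejection scheme described above can be sketched as follows. This is not the linear-time procedure of the talk: for brevity, the membership test for E_n and Hopcroft's algorithm are both replaced by a plain Moore-style partition refinement, which is simpler but slower. Other assumptions: states are 0..n-1 with initial state 0, the set F is drawn uniformly among all subsets, and all function names are illustrative.

```python
import random
from collections import deque


def random_group_automaton(n, k, rng):
    """One uniform random permutation per letter, plus a random final set."""
    perms = []
    for _ in range(k):
        p = list(range(n))
        rng.shuffle(p)
        perms.append(p)
    finals = {q for q in range(n) if rng.random() < 0.5}
    return perms, finals


def is_connected(perms):
    """Every state is reachable from the initial state 0."""
    n = len(perms[0])
    seen, queue = {0}, deque([0])
    while queue:
        q = queue.popleft()
        for p in perms:
            if p[q] not in seen:
                seen.add(p[q])
                queue.append(p[q])
    return len(seen) == n


def is_minimal(perms, finals):
    """Moore refinement (a stand-in for the E_n test and Hopcroft's algorithm)."""
    n = len(perms[0])
    cls = [int(q in finals) for q in range(n)]
    while True:
        sig = [(cls[q],) + tuple(cls[p[q]] for p in perms) for q in range(n)]
        ids = {}
        new = [ids.setdefault(s, len(ids)) for s in sig]
        if len(ids) == len(set(cls)):  # partition is stable
            return len(ids) == n
        cls = new


def random_minimal_group_automaton(n, k=2, seed=None):
    """Draw group automata until one is connected and minimal."""
    rng = random.Random(seed)
    while True:
        perms, finals = random_group_automaton(n, k, rng)
        if is_connected(perms) and is_minimal(perms, finals):
            return perms, finals
```

Because each candidate is drawn uniformly and rejected independently of the others, the accepted automaton is uniform among the connected minimal ones; a call such as `random_minimal_group_automaton(8, k=2, seed=1)` returns a pair `(perms, finals)`.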

References

[1]: Eilenberg (Samuel). -- Automata, languages, and machines. Vol. A. -- Academic Press, New York, 1974, xvi+451p. Pure and Applied Mathematics, Vol. 58.
[2]: Gourdon (Xavier). -- Combinatoire, algorithmique et géométrie des polynômes. -- Thèse, École polytechnique, 1996.
[3]: Gourdon (Xavier). -- Largest component in random combinatorial structures. In Proceedings of the 7th Conference on Formal Power Series and Algebraic Combinatorics (Noisy-le-Grand, 1995), vol. 180, pp. 185--209. -- 1998.
[4]: Hopcroft (John). -- An n log n algorithm for minimizing states in a finite automaton. In Theory of machines and computations (Proc. Internat. Sympos., Technion, Haifa, 1971). pp. 189--196. -- Academic Press, New York, 1971.
[5]: Hopcroft (John E.) and Ullman (Jeffrey D.). -- Introduction to automata theory, languages, and computation. -- Addison-Wesley Publishing Co., Reading, Mass., 1979, x+418p. Addison-Wesley Series in Computer Science.
