Patricia Tries in the Context of Dynamical Systems

Jérémie Bourdon

Greyc, Université de Caen (France)

Algorithms Seminar

March 19, 2001

[summary by Michel Nguyen-The]

Abstract

Tries, a generalized form of digital trees, are a data structure widely used in numerous domains: algorithms for searching words, compression, dynamic hashing, and so on. Their interest and their construction rest on the partitioning of a set of words. We present a compact form of tries, called Patricia tries, in which all unary nodes are suppressed (and thus do not intervene in the partitioning). We then study the expected memory occupation and the expected cost of inserting a word for that data structure when words are produced by a probabilistic source in which the dependencies between the emitted symbols can be very strong.

1 Size and Path Length of Tries and Patricia Tries: Expressions for Expectations

We define the notions of tries and Patricia tries. We find general expressions for the expectations of the size and path length of tries and Patricia tries in the Bernoulli model, valid for any source.

1.1 Operations on infinite words

For a finite alphabet $\Sigma=\{a_1,a_2,\ldots,a_r\}$, let $\Sigma^\infty$ be the set of infinite words on that alphabet, $\sigma : \Sigma^\infty\to\Sigma$ the map that returns the first letter of a word, and $T : \Sigma^\infty\to\Sigma^\infty$ the shift that returns the first suffix of a word. Let $T_{[a]}$ denote the restriction of $T$ to the set $\sigma^{-1}(\{a\})$ of words beginning with symbol $a$ and, for a finite prefix $w=a_1\cdots a_k$, let $T_{[w]}$ denote the composition $T_{[a_k]}\circ T_{[a_{k-1}]}\circ\cdots\circ T_{[a_1]}$. The notations $\sigma$ and $T$ are kept for operators acting on reals which will be used later.

1.2 Tries

Definition 1 Let X be a finite set of infinite words produced by the same source. A trie Tr(X) is a structure defined by the following rules:

(R₀) If X=Ø (the empty set), Tr(X) is the empty tree.
(R₁) If X={x}, Tr(X) consists of a single leaf node represented by a square that contains x.
(R₂) If X is of cardinality greater than or equal to 2, Tr(X) is an internal node represented by · to which are attached r subtrees:

\[ \mathrm{Tr}(X) = \bigl\langle\, \bullet,\ \mathrm{Tr}(T_{[a_1]}X),\ \mathrm{Tr}(T_{[a_2]}X),\ \ldots,\ \mathrm{Tr}(T_{[a_r]}X) \,\bigr\rangle. \]

The edge that attaches the subtrie Tr(T_{[a_j]}X) is labelled by the symbol a_j. Note a slight abuse of notation in (R₂): if no word in X begins with a_j, then T_{[a_j]}X is not defined, and we take it to be the empty set. Hence Tr(T_{[a_j]}X) is the empty tree, and it is as though there were no subtree corresponding to a_j (see Figure 1).
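Rules (R₀) to (R₂) translate almost directly into a recursive construction. The following is a minimal sketch, not the author's code: it assumes that finite Python strings stand in for infinite words, and that these strings are distinct and long enough to be separated. The function name `build_trie` and the tuple encoding of nodes are illustrative choices.

```python
def build_trie(words, alphabet):
    """Build Tr(X) following rules (R0)-(R2).

    Assumption: finite strings replace infinite words; they are distinct
    and long enough to be told apart by some prefix.
    """
    words = list(words)
    if not words:                      # (R0): empty set -> empty tree
        return None
    if len(words) == 1:                # (R1): one word -> leaf containing it
        return ("leaf", words[0])
    # (R2): internal node with one subtrie per symbol a of the alphabet.
    # T_[a] X keeps the words beginning with a and removes that first letter;
    # when no word begins with a, the subtrie is the empty tree (None).
    children = {}
    for a in alphabet:
        shifted = [w[1:] for w in words if w[:1] == a]
        children[a] = build_trie(shifted, alphabet)
    return ("node", children)
```

For instance, `build_trie(["aab", "abb", "bab"], "ab")` yields an internal root whose `b` edge leads directly to a leaf, while the `a` edge leads to a second internal node separating the two remaining words.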

1.3 Patricia Tries

A Patricia trie is a trie from which all unary nodes are eliminated.

Figure 1: Standard trie and corresponding Patricia trie.

Hence with any finite set X of infinite words produced by the same source, we associate a Patricia trie PaTr(X). The first two rules are the same, but the last rule (R'₂) is more sophisticated:

(R'₂) If X is of cardinality greater than or equal to 2, we have two cases:
- (R'_{2,1}) if $\sigma(X)$ consists of a single symbol, then PaTr(X) equals PaTr(TX).
- (R'_{2,2}) if $\sigma(X)$ has at least two distinct symbols, PaTr(X) is an internal node generically represented by · to which are attached r subtrees,
  
  \[ \mathrm{PaTr}(X) = \bigl\langle\, \bullet,\ \mathrm{PaTr}(T_{[a_1]}X),\ \mathrm{PaTr}(T_{[a_2]}X),\ \ldots,\ \mathrm{PaTr}(T_{[a_r]}X) \,\bigr\rangle. \]

The edges of the Patricia trie are labelled by words. These words are obtained from the associated trie by concatenating all the labels of the collapsed edges.
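The Patricia rules can be sketched in the same recursive style. This is an illustrative sketch under the same assumptions as before (finite, distinct, separable strings in place of infinite words); the name `build_patricia` and the `(edge_label, kind, payload)` tuples are hypothetical conventions, chosen so that each node carries the word labelling the edge that leads to it.

```python
def build_patricia(words, alphabet, label=""):
    """Build PaTr(X); `label` is the word on the edge entering this node.

    Assumption: finite strings replace infinite words and are separable.
    """
    words = list(words)
    if not words:                          # empty set -> empty tree
        return None
    if len(words) == 1:                    # single word -> leaf
        return (label, "leaf", words[0])
    first = {w[:1] for w in words}         # distinct symbols of sigma(X)
    if len(first) == 1:
        # (R'_{2,1}): unary node, skip it and extend the edge label
        a = next(iter(first))
        if a == "":
            raise ValueError("words are too short to be separated")
        return build_patricia([w[1:] for w in words], alphabet, label + a)
    # (R'_{2,2}): genuine branching node
    children = {}
    for a in alphabet:
        shifted = [w[1:] for w in words if w[:1] == a]
        children[a] = build_patricia(shifted, alphabet, a)
    return (label, "node", children)
```

With `["aaab", "aabb", "baaa"]` over the alphabet `"ab"`, the two words beginning with `aa` share a unary node in the standard trie; here that node disappears and the edge entering their branching node is labelled `aa`.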

1.4 Additive parameters

The depth of a node in a tree is the number of edges of the path that connects it to the root. The size of a tree is the number of its internal nodes. The path length of a tree is the sum of the depths of all (nonempty) external nodes.

1.5 Algebraic analysis of additive parameters

In a standard trie built on the set X={x₁, ..., x_n}, the structure of a node labelled by a prefix w is a finite string called a slice given by

\[ \sigma T_{[w]}X := \bigl( \sigma T_{[w]}(x_1), \ldots, \sigma T_{[w]}(x_n) \bigr). \]

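An additive parameter $\gamma$ on X is defined by a toll parameter $\delta$ defined on finite strings and the recursive rule:

\[ \gamma[X] = \begin{cases} 0 & \text{if } |X|\le 1,\\[2pt] \delta[\sigma(X)] + \displaystyle\sum_{m\in\Sigma}\gamma[T_{[m]}X] & \text{if } |X|\ge 2. \end{cases} \]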

Let |s| and #(s) denote the number of symbols of the string s and the number of distinct symbols of s, respectively. The parameters of interest are the size on tries and Patricia tries,

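\[ \delta_S(s) = \begin{cases} 1 & \text{if } |s|\ge 2,\\ 0 & \text{otherwise,} \end{cases} \qquad \delta_{PS}(s) = \begin{cases} 1 & \text{if } \#(s)\ge 2,\\ 0 & \text{otherwise,} \end{cases} \]

and the path length on tries and Patricia tries,

\[ \delta_L(s) = \begin{cases} |s| & \text{if } |s|\ge 2,\\ 0 & \text{otherwise,} \end{cases} \qquad \delta_{PL}(s) = \begin{cases} |s| & \text{if } \#(s)\ge 2,\\ 0 & \text{otherwise.} \end{cases} \]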
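The recursive rule and the four tolls can be checked on a small word set without building any tree. The sketch below is illustrative only: it assumes finite, distinct strings that are long enough to be separated, and the function names are hypothetical.

```python
def additive(words, toll):
    """gamma[X] = 0 if |X| <= 1, else toll(sigma(X)) + sum over m of gamma[T_[m] X]."""
    words = list(words)
    if len(words) <= 1:
        return 0
    first_letters = [w[0] for w in words]      # the slice sigma(X)
    total = toll(first_letters)
    for m in set(first_letters):               # only symbols that occur
        total += additive([w[1:] for w in words if w[0] == m], toll)
    return total

def toll_size(s):            # delta_S: every node holding >= 2 words counts
    return 1 if len(s) >= 2 else 0

def toll_patricia_size(s):   # delta_PS: only branching nodes count
    return 1 if len(set(s)) >= 2 else 0

def toll_path(s):            # delta_L
    return len(s) if len(s) >= 2 else 0

def toll_patricia_path(s):   # delta_PL
    return len(s) if len(set(s)) >= 2 else 0

X = ["aaab", "aabb", "baaa"]
print(additive(X, toll_size), additive(X, toll_patricia_size))   # 3 2
print(additive(X, toll_path), additive(X, toll_patricia_path))   # 7 5
```

On this set the trie has three internal nodes and leaf depths 3, 3, 1, while the Patricia trie drops the single unary node and has leaf depths 2, 2, 1.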

1.6 Expectation of parameters

Let $(\mathcal{P}_z,\mathcal{S})$ denote the Poisson model of rate $z$ relative to the source $\mathcal{S}$, and $p_w$ the probability that a given infinite word begins with the prefix $w$. If the cardinality of X is a random Poisson variable of rate $z$, the length of the slice $\sigma T_{[w]}X$ is also a random Poisson variable, of rate $zp_w$. Hence the expectation of the parameter $\gamma$ is a sum of expectations of the toll $\delta$,

\[ E[\gamma;\mathcal{P}_z,\mathcal{S}] = \sum_{w\in\Sigma^*} E[\delta;\mathcal{P}_{zp_w},\mathcal{B}_w]. \]

The expectation of the toll is given by

\[ E[\delta;\mathcal{P}_z,\mathcal{B}] = e^{-z}\,\frac{\partial}{\partial u}F_\delta(z,u,p_1,\ldots,p_r)\Big|_{u=1}, \qquad F_\delta(z,u,x_1,\ldots,x_r) = \sum_{s\in\Sigma^*}\frac{z^{|s|}}{|s|!}\,u^{\delta(s)}\,x_1^{|s|_1}\cdots x_r^{|s|_r}. \]

Using algebraic depoissonization [3], based on the equality $E[Y;\mathcal{P}_z]=e^{-z}\sum_{n\ge0}E[Y;\mathcal{B}_n]\,z^n/n!$ and thus $E[Y;\mathcal{B}_n]=n!\,[z^n]\,e^{z}E[Y;\mathcal{P}_z]$, one can return to the Bernoulli model. Finally, the expectations of interest are given in Table 1.
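As a worked instance of this step, consider the size toll on tries for a single prefix $w$, writing $p$ for $p_w$. This is a sketch filling in one case, not a quotation from the talk. Under the Poisson model the slice length is Poisson of rate $zp$, and the toll equals 1 exactly when the slice has at least two symbols, so

```latex
\[
E[\delta_S;\mathcal{P}_{zp}] = 1-(1+zp)\,e^{-zp}.
\]
Multiplying by $e^{z}$ and extracting $n!\,[z^n]$ term by term,
\[
n!\,[z^n]\Bigl(e^{z}-e^{z(1-p)}-zp\,e^{z(1-p)}\Bigr)
 = 1-(1-p)^n-np\,(1-p)^{n-1}
 = 1-\bigl(1+(n-1)p\bigr)(1-p)^{n-1},
\]
```

which is the probability that at least two of $n$ independent words begin with $w$.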

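Size of Tr    $\hat{S}(n)=\sum_{w\in\Sigma^*}\bigl(1-(1+(n-1)p_w)(1-p_w)^{n-1}\bigr)$

Path Length of Tr    $\hat{L}(n)=\sum_{w\in\Sigma^*}np_w\bigl(1-(1-p_w)^{n-1}\bigr)$

Size of PaTr    $\hat{S}_P(n)=\sum_{w\in\Sigma^*}\Bigl(1-(1-p_w)^{n}-\sum_{i\in\Sigma}\bigl((1-p_w(1-p_{[i|w]}))^{n}-(1-p_w)^{n}\bigr)\Bigr)$

Path Length of PaTr    $\hat{L}_P(n)=\sum_{w\in\Sigma^*}np_w\Bigl(1-\sum_{i\in\Sigma}p_{[i|w]}\bigl(1-p_w(1-p_{[i|w]})\bigr)^{n-1}\Bigr)$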

Table 1: Expectations of size and path length for tries (Tr) and Patricia tries (PaTr).
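The two size formulas of Table 1 can be evaluated numerically for a simple source. The sketch below assumes a binary memoryless source with probabilities (p, 1-p), for which all prefixes of length k with j occurrences of the first symbol share the probability p^j (1-p)^(k-j); the truncation depth and the function names are illustrative choices. For a binary alphabet a Patricia trie on n distinct words always has n-1 internal nodes, which gives a convenient check.

```python
import math

def prob_nonempty(n, y):
    """1 - (1 - y)^n, evaluated stably for small y."""
    if y >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-y))

def trie_size_term(n, x):
    """1 - (1 + (n-1) x)(1 - x)^(n-1): at least two of n words start with w."""
    if x >= 1.0:
        return 1.0 if n >= 2 else 0.0
    return prob_nonempty(n, x) - n * x * math.exp((n - 1) * math.log1p(-x))

def patricia_size_term(n, x, probs):
    """Patricia size summand of Table 1 for a prefix of probability x."""
    t = prob_nonempty(n, x)
    for p in probs:
        # (1 - x(1-p))^n - (1 - x)^n, written with the stable helper
        t -= prob_nonempty(n, x) - prob_nonempty(n, x * (1.0 - p))
    return t

def expected_sizes(n, p, depth=60):
    """Truncated sums over all prefixes of length <= depth (binary source)."""
    q = 1.0 - p
    trie = patricia = 0.0
    for k in range(depth + 1):
        for j in range(k + 1):
            x = p**j * q**(k - j)          # probability of such a prefix
            mult = math.comb(k, j)         # number of such prefixes
            trie += mult * trie_size_term(n, x)
            patricia += mult * patricia_size_term(n, x, (p, q))
    return trie, patricia

trie, patricia = expected_sizes(50, 0.3)
print(round(patricia, 6))   # 49.0, i.e. n - 1
```

The helper `prob_nonempty` avoids the cancellation that a direct evaluation of 1 - (1 - y)^n would suffer for the very small prefix probabilities met at large depth.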

2 Tools for the Asymptotics of the Expectations

2.1 Mellin analysis and Dirichlet series

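To get asymptotics for the expressions found previously, we first note that they belong to the paradigm of harmonic sums. Their Mellin transforms are given in Table 2, where $\Lambda(s)=\sum_{w\in\Sigma^*}p_w^s$ and

\[ \Lambda_S(s) = -\sum_{w\in\Sigma^*}p_w^s-\sum_{w\in\Sigma^*}p_w^s\sum_{i\in\Sigma}\bigl[(1-p_{[i|w]})^{s}-1\bigr] = (s-1)\Lambda(s)-s\sum_{k\ge2}\frac{(-1)^k}{k!}\Bigl(\prod_{i=2}^{k-1}(s-i)\Bigr)\bigl[(s-1)\Lambda^{[k]}(s)\bigr], \tag{1} \]

\[ \Lambda_L(s) = \sum_{w\in\Sigma^*}p_w^s\sum_{i\in\Sigma}p_{[i|w]}\bigl[(1-p_{[i|w]})^{s-1}-1\bigr] = -\sum_{k\ge2}\frac{(-1)^k}{(k-1)!}\Bigl(\prod_{i=2}^{k-1}(s-i)\Bigr)\bigl[(s-1)\Lambda^{[k]}(s)\bigr], \tag{2} \]

with $\Lambda^{[k]}(s)=\sum_{w\in\Sigma^*}p_w^s\sum_{i\in\Sigma}p_{[i|w]}^k$, for $k\ge1$.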

Size of Tr    $S^*(s)=-\Lambda(-s)\,(s+1)\,\Gamma(s)$

Path Length of Tr    $L^*(s)=-\Lambda(-s)\,\Gamma(s+1)$

Size of PaTr    $S_P^*(s)=\Gamma(s)\,\Lambda_S(-s)$

Path Length of PaTr    $L_P^*(s)=-\Gamma(s+1)\,\bigl(\Lambda(-s)+\Lambda_L(-s)\bigr)$

Table 2: Mellin transforms of expectations.

2.2 Dynamical sources

We have to restrict ourselves to a class of dynamical sources $\mathcal{S}$ (see [4] for more details and [2] for its use in a study of standard tries). Such a source is defined by:

(a) a finite or denumerable alphabet $\Sigma$,
(b) a topological partition of $I:=(0,1)$ with disjoint open intervals $I_a$, for $a\in\Sigma$,
(c) an encoding mapping $\sigma$ which is constant and equal to $a$ on each $I_a$,
(d) a shift mapping $T$ whose restriction to $I_a$ is a real analytic bijection from $I_a$ to $I$.

Besides, $T$ has to satisfy more precise properties. Let $h_a$ be the local inverse of $T$ restricted to $I_a$, assumed to be locally holomorphic, and let $\mathcal{H}=\{h_a \mid a\in\Sigma\}$. We add conditions bounding the first derivatives, among which Rényi's condition, which plays an important role in the study of conditional probabilities. This condition states that there exists a constant $K$ that bounds the ratio $|h_a''(x)/h_a'(x)|$ for all branches $h_a$ and all $x\in[0,1]$. With each $h_a$, which is only defined on $I_a$, we associate its analytic extension $\tilde{h}_a$ to the whole set $I$.

If $M$ maps $x\in[0,1]$ to the word $(\sigma(x),\sigma T(x),\sigma T^2(x),\ldots)\in\Sigma^\infty$, then the operators $T$ and $\sigma$ acting on reals are linked with the previously defined $T$ and $\sigma$ acting on words by $\sigma M\equiv\sigma$ and $TM\equiv MT$.

Figure 2: Memoryless source, Markov chain of order 1, continued fraction source, heteroclinal source.

Figure 2 displays several types of dynamical sources:

Memoryless sources

The branches are affine with slope $1/p_a$ on the intervals $I_a:=(q_a,q_{a+1})$, where $q_a=\sum_{i<a}p_i$.
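To make the encoding concrete, the sketch below iterates the shift of a memoryless source on an exact rational starting point and reads off the emitted symbols. It is an illustration only: symbols are represented by their indices 0, 1, ..., half-open intervals are used so that endpoints are assigned to a symbol (a convention that only affects a set of measure zero), and the function name is hypothetical.

```python
from fractions import Fraction

def memoryless_word(x, probs, length):
    """First `length` symbols of M(x) = (sigma(x), sigma(Tx), ...) for 0 <= x < 1."""
    # q[a] = sum of p_i for i < a: left endpoint of the interval I_a
    q = [Fraction(0)]
    for p in probs:
        q.append(q[-1] + p)
    word = []
    for _ in range(length):
        # sigma(x): index of the interval containing x
        a = next(i for i in range(len(probs)) if q[i] <= x < q[i + 1])
        word.append(a)
        # T on I_a: affine branch of slope 1/p_a mapping I_a onto (0, 1)
        x = (x - q[a]) / probs[a]
    return word

half = Fraction(1, 2)
print(memoryless_word(Fraction(1, 3), [half, half], 6))   # [0, 1, 0, 1, 0, 1]
```

With two equiprobable symbols the emitted word is simply the binary expansion of x, here 1/3 = 0.010101... in base 2.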

Markov chains

Each $I_a$ of a memoryless source is divided into $r$ intervals $I_{a,b}$, $b\in\Sigma$, of length $p_{ab}=p_{[b|a]}\cdot p_a$, on which $T:I_{a,b}\to I_b$ has slope $p_b/p_{ab}=(p_b/p_{[b|a]})\cdot(1/p_a)$. Notice that when the order $d$ of the Markov chain goes to infinity in a certain sense, one obtains at the limit a source with unbounded memory.

Continued fractions

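With $\Sigma=\mathbb{N}$, $I_a:=\bigl(\frac{1}{a+1},\frac{1}{a}\bigr)$, $T(x)=\frac{1}{x}-\lfloor\frac{1}{x}\rfloor$, and $\sigma(x)=\lfloor\frac{1}{x}\rfloor$, corresponding to a continued fraction source, we obtain a source with unbounded memory.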
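For this source the emitted symbols are the partial quotients of the continued fraction expansion of x. A minimal sketch with exact rational arithmetic follows; the function name is illustrative, and the loop stops early when the orbit of a rational point reaches 0, where the shift is no longer defined.

```python
from fractions import Fraction

def continued_fraction_word(x, length):
    """Symbols sigma(x), sigma(Tx), ... for the continued fraction source, 0 < x < 1."""
    word = []
    while x != 0 and len(word) < length:
        inv = 1 / x
        a = inv.numerator // inv.denominator   # sigma(x) = floor(1/x)
        word.append(a)
        x = inv - a                            # T(x) = 1/x - floor(1/x)
    return word

print(continued_fraction_word(Fraction(7, 16), 10))   # [2, 3, 2]
```

Indeed 7/16 = 1/(2 + 1/(3 + 1/2)).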

Heteroclinal sources

A source for which derivatives in different intervals can be of different signs is called heteroclinal. Otherwise the source is homoclinal, like the sources presented before.

2.3 Ruelle operators, multi-secants and prefix probabilities

In the context of dynamical systems, one associates with a transformation $T$ with local inverses $h_a$ a transfer operator,

\[ \mathcal{G}[f](x) := \sum_{a\in\Sigma}|h_a'(x)|\; f\circ h_a(x), \]

whose interest lies in the following property: if X is a random variable with density function $f$, then the density of $T(X)$ is $\mathcal{G}[f]$. The Ruelle operator generalizes it by introducing a complex parameter $s$, interpreted in statistical physics as the temperature:

\[ \mathcal{G}_s[f](x) := \sum_{a\in\Sigma}|h_a'(x)|^{s}\; f\circ h_a(x). \]
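For s = 1 the operator above is the density transformer, so an invariant density of T is a fixed point of it. The sketch below checks this numerically for the continued fraction source, whose branches are h_a(x) = 1/(a+x) and whose classical invariant density is Gauss's density 1/((1+x) log 2). The series is truncated at a finite number of branches, which is an approximation introduced here; the function names are illustrative.

```python
import math

def transfer_gauss(f, x, terms=100_000):
    """Truncated density transformer of the continued fraction source at x."""
    total = 0.0
    for a in range(1, terms + 1):
        h = 1.0 / (a + x)          # inverse branch h_a(x)
        total += h * h * f(h)      # |h_a'(x)| = 1 / (a + x)^2
    return total

def gauss_density(x):
    return 1.0 / (math.log(2) * (1.0 + x))

x = 0.3
print(abs(transfer_gauss(gauss_density, x) - gauss_density(x)) < 1e-4)   # True
```

The remaining discrepancy comes from the truncated tail of the series, which decreases like the inverse of the number of branches kept.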

To deal with probabilities of prefixes of words $p_w$ and hence with fundamental intervals, we have to replace tangents with secants $H[h](x,y):=\left|\frac{h(x)-h(y)}{x-y}\right|$, leading to a first generalization $\mathbf{G}_s$ of the Ruelle operator, acting on functions $L$ of two complex variables:

\[ \mathbf{G}_s[L](x,y) := \sum_{a\in\Sigma}H^{s}[h_a](x,y)\; L\bigl(h_a(x),h_a(y)\bigr). \]

To deal with conditional probabilities, we have to resort to a further generalization $\mathbf{G}_s^{[m]}$ of the Ruelle operator involving multisecants instead of secants:

\[ \mathbf{G}_s^{[m]}[L] := \sum_{a\in\Sigma}H_s^{[m]}[h_a]\; L\circ V[h_a], \]

where the multisecants are defined by $H_s^{[m]}[h](x,y,z,t)=H[h]^{s-m}(x,y)\,H[h]^{m}(z,t)$, and $V$ by $V[h](x,y,z,t)=(h(x),h(y),h(z),h(t))$.

Let $F$ be the distribution associated with the initial density $f$ of a source $(\mathcal{S},f)$. The probability $p_w$ that a word begins with some prefix $w$ is $|F(h_w(0))-F(h_w(1))|$. For the special case $F=\mathrm{Id}$, it will be denoted $p_w^*$. Let $Q:=H[F]$ be the secant of the initial distribution. Then the quasi-inverses of $\mathbf{G}_s$ and $\mathbf{G}_s^{[k]}$ are related to Dirichlet series in the following way:

\[ \Lambda(s)=\sum_{w\in\Sigma^*}p_w^s=(\mathrm{Id}-\mathbf{G}_s)^{-1}[Q^s](0,1); \qquad \Lambda^{[k]}(s)=\sum_{i\in\Sigma}\bigl(\mathrm{Id}-\mathbf{G}_s^{[k]}\bigr)^{-1}\bigl[H_s^{[k]}[F]\bigr]\bigl(0,1,h_i(0),h_i(1)\bigr). \]

Thanks to a theorem similar to the Perron--Frobenius theorem, we have the decomposition

\[ (\mathrm{Id}-\mathbf{G}_s)^{-1} = \frac{\lambda(s)}{1-\lambda(s)}\,\mathbf{P}_s+(\mathrm{Id}-\mathbf{N}_s)^{-1}, \]

where $\lambda(s)$ is the dominant eigenvalue of $\mathbf{G}_s$, and a similar decomposition for the multi-secant operator. We deduce the asymptotics:

\[ \lim_{s\to1}(s-1)(\mathrm{Id}-\mathbf{G}_s)^{-1}[L](x) = \frac{-1}{\lambda'(1)}\,\Psi_1(x)\int_0^1\ell(t)\,dt, \]

where $\Psi_1(x)$ is an eigenfunction associated with the dominant eigenvalue and chosen according to a proper normalization, and $\ell$ is the diagonal mapping of $L$. We get similar results for the $\Lambda^{[m]}$, which also have a pole of order 1 at $s=1$; their respective residues $r_m$ are related to the dominant eigenfunctions $\Psi_1^{[m]}$ of the operators $\mathbf{G}_1^{[m]}$. This allows us to find the singular expansion

\[ \Lambda(s)=\Lambda^{[1]}(s)\sim\frac{-1}{\lambda'(1)\,(s-1)}+C(\mathcal{S}), \]

where $C(\mathcal{S})$ is a constant depending on the source $\mathcal{S}$ and the initial density $f$. Using the equalities (1) and (2) we can then get asymptotics for $\Lambda_S$ and $\Lambda_L$ at $s=1$.
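For a memoryless source with uniform initial density, this expansion can be carried out by hand, which gives a useful sanity check. The following is a worked sketch added for illustration, with $\lambda(s)=\sum_{i\in\Sigma}p_i^s$; it is not taken from the talk.

```latex
\[
\Lambda(s)=\sum_{w\in\Sigma^*}p_w^s=\sum_{k\ge0}\lambda(s)^k=\frac{1}{1-\lambda(s)},
\qquad \lambda(s)=\sum_{i\in\Sigma}p_i^s .
\]
Since $\lambda(1)=1$, we have
$1-\lambda(s)=-\lambda'(1)(s-1)-\tfrac12\lambda''(1)(s-1)^2+O\bigl((s-1)^3\bigr)$, hence
\[
\Lambda(s)=\frac{-1}{\lambda'(1)\,(s-1)}+\frac{\lambda''(1)}{2\,\lambda'(1)^2}+O(s-1),
\]
with $\lambda'(1)=\sum_i p_i\log p_i$ and $\lambda''(1)=\sum_i p_i\log^2 p_i$.
```

In this case the residue at $s=1$ is the inverse of the usual entropy $-\sum_i p_i\log p_i$, and the constant term is $\lambda''(1)/(2\lambda'(1)^2)$.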

3 Results: Asymptotics

3.1 General expressions

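Let $h(\mathcal{S})=-\lambda'(1)=\lim_{l\to\infty}\frac{1}{l}\sum_{w\in\Sigma^l}p_w^*\,|\log p_w^*|$ be the entropy of fundamental intervals and, besides $C(\mathcal{S})$ encountered before, define the constants

\[ C_1(\mathcal{S}) = \sum_{k\ge2}\frac{K^{[k]}(\mathcal{S})}{k(k-1)} = 1-\lim_{l\to\infty}\sum_{w\in\Sigma^l}p_w^*\sum_{i\in\Sigma}\bigl(1-p_{[i|w]}^*\bigr)\,\bigl|\log\bigl(1-p_{[i|w]}^*\bigr)\bigr|, \]

\[ C_2(\mathcal{S}) = \sum_{k\ge1}\frac{K^{[k+1]}(\mathcal{S})}{k} = \lim_{l\to\infty}\sum_{w\in\Sigma^l}p_w^*\sum_{i\in\Sigma}p_{[i|w]}^*\,\bigl|\log\bigl(1-p_{[i|w]}^*\bigr)\bigr|. \]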

For random tries built from n words emitted by a source S, asymptotics of expectations are given in Table 3.

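Size of Tr    $S(n)\approx\frac{1}{h(\mathcal{S})}\,n$

Path Length of Tr    $L(n)\sim\frac{1}{h(\mathcal{S})}\,n\log n+\Bigl(C(\mathcal{S})-\frac{\gamma}{h(\mathcal{S})}\Bigr)n$

Size of PaTr    $S_P(n)\approx\frac{1}{h(\mathcal{S})}\,\bigl(1-C_1(\mathcal{S})\bigr)\,n$

Path Length of PaTr    $L_P(n)\sim\frac{1}{h(\mathcal{S})}\,n\log n+\Bigl(C(\mathcal{S})-\frac{\gamma+C_2(\mathcal{S})}{h(\mathcal{S})}\Bigr)n$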

Table 3: Asymptotics of expectations.

3.2 Example

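For a memoryless source with probabilities $\{p_i\}$:

\[ h(\mathcal{S}) = \sum_{i\in\Sigma}p_i\,|\log p_i|, \qquad C(\mathcal{S}) = \frac{\sum_{i\in\Sigma}p_i\log^2p_i}{2\,\bigl(\sum_{i\in\Sigma}p_i\log p_i\bigr)^2}, \]

\[ C_1(\mathcal{S}) = 1-\sum_{i\in\Sigma}(1-p_i)\,|\log(1-p_i)|, \qquad C_2(\mathcal{S}) = \sum_{i\in\Sigma}p_i\,|\log(1-p_i)|. \]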

Similar formulae are available for Markov chains and continued fraction sources. Simulations are in agreement with theory.
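The size constants can be evaluated directly for a memoryless source. The sketch below assumes the reading h = sum of p_i |log p_i| and C₁ = 1 - sum of (1 - p_i) |log(1 - p_i)| for the memoryless case; this reading is an assumption of the example, chosen because it makes the Patricia size coefficient (1 - C₁)/h equal to 1 for every binary source, in agreement with the fact that a binary Patricia trie on n words has n - 1 internal nodes. The function name is illustrative.

```python
import math

def memoryless_constants(probs):
    """Return (h, C1) for a memoryless source; assumes 0 < p_i < 1 for all i."""
    h = -sum(p * math.log(p) for p in probs)
    # C1 = 1 - sum (1 - p_i) |log(1 - p_i)|, and log(1 - p_i) is negative
    c1 = 1.0 + sum((1.0 - p) * math.log(1.0 - p) for p in probs)
    return h, c1

for probs in ([0.5, 0.5], [0.3, 0.7], [1/3, 1/3, 1/3]):
    h, c1 = memoryless_constants(probs)
    # leading coefficients of the sizes: n/h for tries, (1 - C1) n/h for Patricia tries
    print(probs, 1.0 / h, (1.0 - c1) / h)
```

For the uniform ternary source the Patricia coefficient falls strictly between 1/2 and 1, as expected when internal nodes may have two or three children.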

4 Conclusion and Open Questions

In terms of average size, a Patricia trie turns out to be better than a trie, and Rényi's condition is not needed for this result. For the average path length, the gain is only a correcting term C₂ in the second-order term, and our proofs made use of Rényi's condition. An open question (see [1] for details) is whether this correcting term remains valid for sources for which Rényi's condition does not hold, although all the natural sources we are aware of do satisfy that condition.

References

[1]: Bourdon (Jérémie). -- Size and path length of Patricia tries: dynamical sources context. Random Structures & Algorithms, vol. 19, n°3-4, 2001, pp. 289--315. -- Special issue ``Analysis of Algorithms'' dedicated to Don Knuth.
[2]: Clément (J.), Flajolet (P.), and Vallée (B.). -- Dynamical sources in information theory: a general analysis of trie structures. Algorithmica, vol. 29, n°1-2, 2001, pp. 307--369.
[3]: Jacquet (Philippe) and Szpankowski (Wojciech). -- Analytical de-Poissonization and its applications. Theoretical Computer Science, vol. 201, n°1-2, 1998, pp. 1--62.
[4]: Vallée (Brigitte). -- Dynamical sources in information theory: fundamental intervals and word prefixes. Algorithmica, vol. 29, n°1-2, 2001, pp. 262--306.
