On Tree-Growing Search Strategies
Hosam M. Mahmoud
Algorithms Project, INRIA Rocquencourt
and The George Washington
University
Algorithms Seminar
January 5, 1998
[summary by Michèle Soria]
A properly typeset version of this document is available in
postscript and in
pdf.
If some fonts do not look right on your screen, this might be
fixed by configuring your browser (see the documentation here).
Abstract
A search algorithm that adds a key to a sorted file can be represented
by a deterministic tree whose external nodes are equally likely
targets for insertion. The collection of algorithms that one uses
throughout the stages of insertion sort is called a search
strategy. Using the concept of ``tree-growing'' strategy, we
demonstrate that most practical algorithm have a normal behavior.
We present a sufficient condition for normality of tree-growing
strategies. The sufficient condition specifies a relationship between
the overall variance and the rate of growth in height of the sequence
of trees that the search strategy ``grows''.
Insertion sort is a well-known on-line sorting algorithm: at each stage
of the sorting, the elements obtained so far make up a sorted array; when
reading a new element, the algorithm searches for its proper position in
the array and inserts it. The searching may be done by any method (linear,
binary, etc) and the methods may be different from one stage to another.
At each stage, given a searching
strategy, the positions of probes
(positions for comparisons between the new element
to be inserted and elements of the current array) is represented by a
binary decision tree. The first probe is the root of the
decision tree, the two positions of the second probe, at most one on each side
of the first probe, become the children of the root and all internal nodes
are constructed so forth. The leaves of
the tree correspond to the places where new elements are to be inserted. The
root-to-leaf paths of the tree thus represent the possible probe sequences of
the searching algorithm.
1 Tree-growing and normal strategies
Let Si denote the searching algorithm at stage i, with corresponding
decision tree Ti; the collection of searching algorithms
S = {Si}i=1¥, or equivalently the collection of
corresponding decision trees T = {Ti}i=1¥,
will be called
a search strategy. A search strategy is tree-growing if, for each
positive integer i, the shape of Ti+1 is obtained by replacing a leaf
of Ti by an internal node (with two hanging leaves).
We analyse tree-growing search strategies under the assumption of uniform
distribution of the leaves at each stage. Let the random variable Cn be the
total number of times S compares a new element
to a probe during the sorting
of the first n elements. The class of normal search strategies is
composed of the strategies for which Cn, once normalized,
converges in distribution to the normal law, i.e.
The following lemma gives a sufficient condition for a tree-growing strategy
to be normal. This condition relates the variance of Cn to the height of the
decision trees of the search strategy. We denote by hn
the height of tree Tn,
and by sn2 the variance of Cn.
Lemma 1
If hn=o(sn) then S is a normal strategy.
Proof.
Cn is the sum of n random variables (Xi)i=1,...,n, where Xi
denotes the number of comparisons made by Si. Since each insertion is
performed independently of all others, we assume the Xi's to be independent
random variables. The proof of Lemma 1 is a technical verification of
Lindeberg's condition, which ensures normality:
Practically for a given strategy, the difficulty lies in computing the
variance of Cn, which is the sum of the variances of the Xi's.
The most commonly used strategies, linear search and binary search, satisfy
the condition of Lemma 1. When linear search is used at every stage, it is easy
to show that Cn has average value asymptotic to n2/4, and variance
asymptotic to n3/36, whereas hn equals n-1. For binary repeated search
strategies, one can easily show that hn is asymptotic to log n and
the average value of Cn
is equivalent to nlog2 n, but the computation of the variance is more
intricate and finally leads to sn2=n A(n) + O(log n), where A(n)
is an oscillating function of bounded magnitude.
There exists tree-growing search strategies which are not normal.
In [2], the
authors exhibit a strategy that does not satisfy the condition of Lemma 1,
and can be shown to be non normal by Feller-Lindeberg condition
(see [1, vol. 2, §XV.6]). To ensure normality, some further conditions,
which are presented in the next
section,
are required
on the decision trees of the search strategy.
2 Normality of consistent strategies
This section identifies a subclass of tree-growing strategies, the
consistent strategies, which are proved to be normal.
Let T = {Ti}i=1¥ be the collection of
decision trees corresponding to a search strategy S. For
each Ti, we denote by TLi its left subtree (with size nLi) and TRi its right subtree (with size nRi). The size of the smaller subtree is noted by
g(i)=min(nLi, nRi) .
A search strategy S is said to be self-similar if
for each decision tree, its left and right subtrees belong to T, that is TLi=TnLi and
TRi=TnRi (where trees are considered as equal if they have
the same shape).
And S is said to be well-proportioned if the proportion
of nodes belonging to the smaller subtree approaches a limit as i
tends to infinity, that is limi® ¥ g(i)/i exists.
Finally we call consistent a strategy which is tree-growing,
self-similar and well proportioned.
Many usual strategies, such as linear search (g(i)/i® 0) or
binary search (g(i)/i® 1/2), are
consistent.
Theorem 1
All consistent strategies are normal.
The proof of this theorem relies on three properties of search
strategies decision
trees that
ensure the sufficient
condition for normality stated in Lemma 1.
Property 1
For each positive integer i, the decision tree Ti
has at least one external node on
each unsaturated level.
Property 2
Let mk be the number of decision trees with height k, the
sequence {mk}k=1¥ is nondecreasing in k.
For consistent strategies, these two properties result from
self-similarity and tree-growing of the decision trees.
Property 3
The variance of Cn satisfies sn2=W (n).
This property holds true for any decision tree, the intuition being
that the tree with smallest variance is the complete binary tree (all
levels saturated, except possibly the last one), which is the decision
tree associated with binary search.
The proof of Theorem 1 considers two cases, g(i)/i® 1/2 and
lim g(i)/iÎ [0, 1/2). In the first case the strategy is similar
to binary search, hn=o(sn) and the result follows from
Property 3.
In the second case Properties 1 and 2 are used to show that
hn=o(sn).
3 Relaxed conditions for normality
The sufficient condition for normality stated in Lemma 1 holds true
for some families of tree-growing search strategies that are not
consistent. For example, Fibonaccian search, where Fibonacci numbers
are used to indicate the next probe, is not consistent since lim
g(i)/i does not exist; but it can be shown to be normal with a proof
similar to the one of Theorem 1, since liminfn®¥ g(n)/n
as well as limsupn®¥ g(n)/n stand in (0,1/2).
More generally, one can exhibit different conditions on search
strategies, that lead to normality by showing that the heights of the
decision trees grow at a steady rate. For example
Proposition 1
Any tree-growing search strategy S for which
liminfn®¥ g(n)/n belongs to (0,1/2) is normal.
Proposition 2
If mk=W (k1+e) for some e >0, and
Var[Xn]= W (1), then the corresponding
tree-growing search
strategy is normal.
References
- [1]
-
Feller (William). --
An introduction to probability theory and its applications. --
John Wiley & Sons Inc., New York, 1971, second
edition, vol. II, xxiv+669p.
- [2]
-
Lent (Janice) and Mahmoud (Hosam M.). --
On tree-growing search strategies. The Annals of Applied
Probability, vol. 6, n°4, 1996, pp. 1284--1302.
This document was translated from LATEX by
HEVEA.