Divide-and-conquer
Complexity and recurrences
D&C logo

Philippe Dumas's home page
Divide-and-conquer | Maple
Publications | Address | Version française

hline

Introduction
A fundamental idea
Complexity and recurrences
2-rational sequences
Asymptotic behavior
References

In each example which we have just met (cf. A fundamental idea) a natural concept of size of the data occurs. The size of a list is its length; the size of a polynomial is its degree; the size of a finite set of points is the number of its points. In addition a concept of cost of an algorithm also appears naturally. For a sorting algorithm it will be the number of comparisons or assignments carried out; for an algorithm of multiplication of matrices or polynomials it will be the number of scalar multiplications carried out. This cost depends on the data but it is normal to see it as a simple function of the size of the data; this reduction brings us to distinguish the cost in the worst case, the average cost and sometimes the cost in the best case.

Divide-and-conquer recurrences. Let us consider again mergesort; the data is a list of size n and the cost is the number of comparisons. According to the divide-and-conquer strategy the cost for a data of size n is the sum of the cost for a data of size $\lfloor n/2\rfloor$ , and of the cost for a data of size $\lceil n/2\rceil$, plus the cost of the merge. In the worst case the merge requires n-1 comparisons, as one sees by considering the two sorted lists $(1,2,\ldots,k-1,n)$ and $(k,\ldots,n-1)$ with $k=\lfloor n/2\rfloor$. The cost Tn of mergesort for a data of size n in the worst case thus satisfies the recurrence

\begin{displaymath} T_n=T_{\lfloor n/2\rfloor}+T_{\lceil n/2\rceil}+n-1\quad\text{for $n\geq2$},\qquad T_0=T_1=0. \end{displaymath}

One can show  [16] that the cost Un of mergesort in the average case satisfies

\begin{displaymath}
U_n=U_{\lfloor n/2\rfloor}+U_{\lceil n/2\rceil}+n-
\frac{\lf...
 ...floor n/2\rfloor+1}
\quad\text{for
$n\geq2$},\qquad U_0=U_1=0. \end{displaymath}

This formula is established under the assumption that the keys to be sorted are the integers betweeen 1 and n and that all the permutations of these integers are equally likely.

Recurrences of the preceding type with respect to n and n/2 can be named divide-and-conquer recurrences. However, before going further, it should be noticed that divide-and-conquer algorithms do not necessarily lead to divide-and-conquer recurrences. For example the average cost Cn of quicksort for a list of size n satisfies [25, p. 120], [33, p. 129]

\begin{displaymath}
C_n=(n+1)+\frac{2}{n}\sum_{0\leq k<n}C_k\quad\text{for $n\gt 1$},\qquad
C_0=C_1=0.\end{displaymath}

In the sequel, we concentrate on divide-and-conquer recurrences.

Generating series. To formalize divide-and-conquer sequences it is more pleasant to consider the ordinary generating series

\begin{displaymath}
f(z)=\sum_{n\geq0}f_nz^n\end{displaymath}

of such a sequence (fn). Moreover it is convenient to make the following convention: a sequence (fn) is extended to all the real numbers by the formula

fx=fn if x is a natural integer n, fx=0 otherwise.

Two operations on sequences are the shift and the homothety

\begin{displaymath}
(f_n)\longmapsto (f_{n-1}),\qquad
(f_n)\longmapsto (f_{n/2})\end{displaymath}.

They are translated into the linear operations on the associated formal power series

\begin{displaymath}
f(z)\longmapsto zf(z),\qquad
f(z)\longmapsto f(z^2).\end{displaymath}

A formal series f(z) is called Mahlerian if it satisfies an equation

\begin{displaymath}
c_0(z)f(z)+c_1(z)f(z^2)+\dotsb+c_N(z)f(z^{2^N})=0\end{displaymath}

in which the ck(z) are polynomials, not all zero. The translation in terms of sequences provides the definition of Mahlerian sequences; such a sequence (fn) satisfies a recurrence of the form

\begin{displaymath}
\sum_{\ell}c_{0,\ell}f_{n-\ell}+\sum_{\ell}c_{1,\ell}f_{(n-\ell)/2}+\dotsb
+\sum_{\ell}c_{N,\ell}f_{(n-\ell)/2^{\ell}}=0,\end{displaymath}

In this formula the sums comprise only a finite number of terms and the coefficients $c_{k,\ell}$ are not all zero. Mahlerian series are thus named after Kurt Mahler, who studied the number bn of binary partitions from the asymptotic point of view [29]. A partition of the integer n is binary if all its summands are powers of 2; for example

27=1+1+1+2+2+4+8+8

is a binary partition of 27. The sequence (bn) satisfies the recurrence

\begin{displaymath}
b_n=b_{n-1}+b_{n/2},\qquad b_0=1,\end{displaymath}

or

\begin{displaymath}
b_{2k}=b_{2k-1}+b_k,\qquad b_{2k+1}=b_{2k},\qquad b_0=1.\end{displaymath}

In an equivalent way its generating series

\begin{displaymath}
b(z)=\prod_{k\geq0}\frac{1}{1-z^{2^k}}\qquad\text{satisfies}\qquad
(1-z)b(z)=b(z^2).\end{displaymath}

Mahler also studied transcendency problems for series like

\begin{displaymath}
f(z)=\sum_{k\geq0}z^{2^k}\qquad\text{which satisfies}\qquad
f(z)=z+f(z^2).\end{displaymath}

One could generalize the definition by considering equations relating f(z) and f(zB), where B is an integer at least equal to 2 but the qualitative results are of the same type as for B= 2 and the case B= 2 is by far the most common in the applications.

Thus a class of series is defined, which satisfies the following closure properties:

--
the rational series are Mahlerian;
--
if a series satisfies an equation with a right member

\begin{displaymath}
c_0(z)f(z)+c_1(z)f(z^2)+\dotsb+c_N(z)f(z^{2^N})=b(z)\end{displaymath}

in which b(z) is Mahlerian, it is itself Mahlerian;
--
the sum of two Mahlerian series is Mahlerian;
--
the Cauchy product of two Mahlerian series is Mahlerian;
--
the derivative of a Mahlerian series is Mahlerian.
Also a Mahlerian series has a strictly positive radius of convergence and thus defines an analytic function.

A series f(z) is of the divide-and-conquer type if it satisfies an equation with a right member

\begin{displaymath}
c_0(z)f(z)+c_1(z)f(z^2)+\dotsb+c_N(z)f(z^{2^N})=b(z)\end{displaymath}

in which b(z) is a formal series and ck(z) are polynomials not all zero. A sequence is of the divide-and-conquer type if its generating series is of the divide-and-conquer type.

Applicability. Divide-and-conquer series appear in various fields. The example of binary partitions shows that they occur in combinatorics. In the same orbit one notices examples from combinatorics on words: the correlation of two words u and v is a word on the alphabet $\{0,1\}$ obtained while gliding the second word along the first, and by noting coincidence in the common part; in particular the autocorrelation of a word indicates its repetitions. This concept occurs in the study of the number of occurrences of a pattern in a text or, on the contrary, in the study of words with prohibited patterns [23]. One shows that the generating series for the enumeration of words with a given autocorrelation is Mahlerian [22].

These series also occur in the problems involved in the binary expansion of the integers. A traditional example is the sequence (sn) which gives the number of 1s in the binary expansion of the integers between 1 and n [11].

Of course the divide-and-conquer strategy provides a lot of examples. For example the multiplication of two polynomials of degree n uses three multiplications of polynomials with degrees half of n and roughly 3n additions of coefficients. Besause the cost is an increasing function of n, a sequence which gives an upper bound for the cost of the multiplication of two polynomials of degree n is the sequence (Pn) defined by [26, p. 278]

\begin{displaymath}
P_n=3P_{\lfloor n/2\rfloor}+cn,\qquad P_0=1,\end{displaymath}

where c is a constant which translates the cost of the addition of two numbers. The generating series P(z) satisfies

\begin{displaymath}
P(z)=3(1+z)P(z^2)+\frac{cz}{(1-z)^2}.\end{displaymath}

Such an equation can be solved by iteration.

hline

Previous page: A fundamental idea
Next page: 2-rational sequences