Relax But Don't Be Too Lazy
Joris van der Hoeven
Laboratoire de Mathématique, Université Paris-Sud
January 24, 2000
[summary by Paul Zimmermann]
Joris van der Hoeven's talk presents novel algorithms operating on
formal power series. These new algorithms are based on fast
multiplication methods (Karatsuba, Toom--Cook, FFT), and improve the
best asymptotic complexities known, for example those obtained by
Brent and Kung [1], while staying very efficient in the
medium range (Karatsuba).
Most of the algorithms work in space linear in the input size n; some
of them require space O(n log n). The basic idea of these new
algorithms is what Joris van der Hoeven calls ``the relaxed
approach,'' intermediate between the zealous approach and the lazy
approach. This relaxed approach was invented in 1997, with the
presentation of two relaxed algorithms for the multiplication of
formal power series at the ISSAC'97 conference [8].
The report [9] details these algorithms and their
implementation, presents some other multiplication algorithms, shows how
the relaxed approach extends naturally to other operations on formal
power series, and finally offers several experimental comparisons
between classical and relaxed algorithms.
1 The Zealous Approach
Let us consider the product of two formal power series, f = f0 +
... + fn z^n and g = g0 + ... + gn z^n. The zealous
approach consists in using all the data f0, ..., fn, g0, ..., gn
at once to calculate the product h = f · g
= h0 + ... + hn z^n + O(z^(n+1)). It thus corresponds to the
classical or ``off-line'' approach. Several algorithms of different
complexities implement this approach: the naïve multiplication in
O(n^2), Karatsuba's algorithm in O(n^(log2 3)) [6],
and the multiplication by FFT in O(n log n loglog n). The
following table summarizes the time and space complexities of the
best known zealous algorithms for different operations on formal power
series (to ease reading, the O(·) are omitted):
    Operation                               Time                       Space
    Multiplication                          M(n) = n log n loglog n    n
    Composition with polynomials            M(n) log n                 n
    Composition                             M(n) (n log n)^(1/2)       n log n
    Composition in finite characteristic    M(n) log n                 n
Newton's method reduces several operations to elementary computations.
For example, the logarithm of a formal power series f (with f0 = 1)
is written
    log f = ∫ f'/f,
and thus reduces to a division (f'/f) and an integration of linear
cost, whence a cost in O(M(n)). Exponentiation
reduces to logarithm by Newton's method: if g is such that log
g - f = O(z^(n/2)), i.e., g is an approximation to order n/2 of
exp f, then g~ = g - g (log g - f) will be an approximation to
order n, whence again an algorithm in O(M(n)).
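These Newton iterations can be sketched in a few lines (our own illustration, assuming exact rational coefficients; the helpers mul, inv, log and exp below are not from the report, and the naïve truncated product stands in for a fast one):

```python
# Sketch of zealous log/exp via Newton iteration on truncated power series,
# represented as lists of Fractions (coefficient of z^i at index i).
from fractions import Fraction

def mul(a, b, n):
    # naive truncated product mod z^n (a fast product would be used in practice)
    h = [Fraction(0)] * n
    for i in range(min(n, len(a))):
        for j in range(min(n - i, len(b))):
            h[i + j] += a[i] * b[j]
    return h

def inv(f, n):
    # reciprocal 1/f mod z^n by Newton iteration g <- g*(2 - f*g), f[0] != 0
    g = [Fraction(1) / f[0]]
    m = 1
    while m < n:
        m = min(2 * m, n)
        e = mul(f[:m], g, m)
        g = g + [Fraction(0)] * (m - len(g))
        corr = mul(g, e, m)
        g = [2 * g[i] - corr[i] for i in range(m)]
    return g

def log(f, n):
    # log f = integral of f'/f, assuming f[0] == 1
    d = [i * f[i] for i in range(1, min(n, len(f)))]   # coefficients of f'
    q = mul(d, inv(f, n), n - 1)                       # f'/f mod z^(n-1)
    return [Fraction(0)] + [q[i] / (i + 1) for i in range(n - 1)]

def exp(f, n):
    # Newton: g <- g - g*(log g - f) doubles the precision, assuming f[0] == 0
    g = [Fraction(1)]
    m = 1
    while m < n:
        m = min(2 * m, n)
        g = g + [Fraction(0)] * (m - len(g))
        lg = log(g, m)
        err = [lg[i] - (f[i] if i < len(f) else 0) for i in range(m)]
        prod = mul(g, err, m)
        g = [g[i] - prod[i] for i in range(m)]
    return g
```

For instance, exp([0, 1], 5) returns the first five coefficients 1, 1, 1/2, 1/6, 1/24 of exp(z). With a fast product in place of mul, each doubling step costs O(M(n)), and the geometric growth of the step sizes keeps the total in O(M(n)).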
Functional inversion (given a series f, find g such that f ∘
g = z) reduces to composition by Newton's iteration
    g~ = g - (f ∘ g - z)/(f' ∘ g),
and so the complexity of inversion is that of composition.
The problem is as follows: given a polynomial f of degree p, a
polynomial g with zero constant coefficient and of fixed degree q,
and an integer n ≥ p, compute h = f ∘ g to order n. The
divide-and-conquer algorithm consists in writing
    f ∘ g = (f_lo + z^(p/2) f_hi) ∘ g = f_lo ∘ g + g^(p/2) (f_hi ∘ g),
and so on with p/4, p/8, ..., the powers of g being
precomputed. It gives a complexity of O((pq/n) M(n) log p).
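This splitting can be sketched as follows (our own illustration with a naïve truncated product; the names mul_trunc, pow_trunc and compose are not from the report):

```python
# Divide-and-conquer right composition h = (f o g) mod z^n,
# for a polynomial f and a series g with g(0) = 0 (coefficient lists).
def mul_trunc(a, b, n):
    # naive truncated product mod z^n
    h = [0] * min(n, max(len(a) + len(b) - 1, 1))
    for i in range(min(n, len(a))):
        for j in range(min(n - i, len(b))):
            h[i + j] += a[i] * b[j]
    return h

def pow_trunc(g, k, n):
    # g^k mod z^n (in the real algorithm these powers are precomputed)
    r = [1]
    for _ in range(k):
        r = mul_trunc(r, g, n)
    return r

def compose(f, g, n):
    # f o g = f_lo o g + g^(p/2) * (f_hi o g), recursively
    if len(f) <= 1:
        return [f[0] if f else 0] + [0] * (n - 1)
    m = len(f) // 2
    lo = compose(f[:m], g, n)                  # f_lo o g
    hi = compose(f[m:], g, n)                  # f_hi o g
    shifted = mul_trunc(pow_trunc(g, m, n), hi, n)
    return [lo[i] + (shifted[i] if i < len(shifted) else 0) for i in range(n)]
```

For instance, compose([1, 1, 1], [0, 1, 1], 4) computes (1 + g + g^2) mod z^4 for g = z + z^2, giving [1, 1, 2, 2].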
Given two formal power series
f = f0 + ... + fn z^n and g = g1 z + ... + gn z^n,
we want to compute h = f ∘ g = h0 + ... + hn z^n + O(z^(n+1)).
Brent and Kung's algorithm [1]
splits g into two parts g = g_lo + g_hi:
    g_lo = g1 z + ... + gq z^q,
    g_hi = g_{q+1} z^(q+1) + ... + gn z^n,
then writes the Taylor expansion of f ∘ (g_lo + s) at s = 0:
    f ∘ g = f ∘ g_lo + (f' ∘ g_lo) g_hi + (1/2!) (f'' ∘ g_lo) g_hi^2 + ···.
The computation of the f^(i) ∘ g_lo can be done by direct iteration:
    f^(i) ∘ g_lo = (f^(i-1) ∘ g_lo)' / g_lo',
or by inverse iteration:
    f^(i-1) ∘ g_lo = f_{i-1} + ∫ (f^(i) ∘ g_lo) g_lo'.
2 The Lazy Approach
Here, we regard the formal power series not as a list of coefficients
given once and for all, but as a stream of coefficients; this
corresponds to ``on-line'' computation. The lazy approach consists
in calculating the coefficients one by one, performing at each stage
only the strictly necessary computations.
Let us consider for example the equation for the generating
function f(z) of binary trees counted according to their internal
nodes:
    f = 1 + z f^2.
Here the zealous or ``off-line'' approach does not apply, because the
coefficient fn of order n ≥ 1 of f depends on f0, ..., f_{n-1}:
    fn = f0 f_{n-1} + f1 f_{n-2} + ... + f_{n-2} f1 + f_{n-1} f0.
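Run lazily, this recurrence gives the coefficients one by one (a small illustration of ours; the fn are the Catalan numbers):

```python
# Lazy resolution of f = 1 + z*f^2: the coefficient f_n only needs
# f_0, ..., f_{n-1}, at a total cost of O(n^2) operations.
def binary_trees(n):
    f = [1]                                   # f_0 = 1
    for m in range(1, n + 1):
        # coefficient of z^m in z*f^2 is the convolution of order m-1
        f.append(sum(f[k] * f[m - 1 - k] for k in range(m)))
    return f
```

For instance, binary_trees(5) returns [1, 1, 2, 5, 14, 42].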
Thus, for the multiplication to order 3 of f = f0 + f1 x + f2 x^2
+ O(x^3) by g = g0 + g1 x + g2 x^2 + O(x^3), giving h = f ·
g = h0 + h1 x + h2 x^2 + O(x^3), the lazy approach consists in
calculating the value h0 = f0 g0 at stage 0, then h1 = f0 g1
+ f1 g0 at stage 1, and h2 = f0 g2 + f1 g1 + f2 g0 at
stage 2, for a total of 6 multiplications. It is also possible to
represent this computation graphically by a table where the value k
at the intersection of row gi and column fj means that the value
fj gi is obtained at stage k.
The major disadvantage of this approach is that the computation of all
the coefficients up to order n costs O(n^2): we cannot use fast
multiplication algorithms to reduce the complexity.1
Another example is the computation of the exponential g = exp f of
a formal power series. By differentiation, we obtain g' = g ·
f', which reduces the exponentiation to a multiplication
(differentiation and integration having linear complexity):
    g = ∫ f' g.
However, here again the series g appears on both sides of the
equation; with the lazy approach, we can calculate the product f' g
only one term at a time, here again giving a quadratic complexity.
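Extracting the coefficient of z^m on both sides of g = ∫ f' g gives m g_m = f1 g_{m-1} + 2 f2 g_{m-2} + ... + m fm g0, which a lazy implementation can follow literally (our own sketch, with exact rational coefficients):

```python
# Lazy computation of g = exp f as g = integral(f'*g): each g_m uses only
# g_0, ..., g_{m-1}, in O(n^2) operations overall.
from fractions import Fraction

def lazy_exp(f, n):
    g = [Fraction(1)]                         # g_0 = exp(f_0), with f_0 = 0
    for m in range(1, n + 1):
        s = sum(k * (f[k] if k < len(f) else 0) * g[m - k]
                for k in range(1, m + 1))
        g.append(Fraction(s) / m)             # m*g_m = sum_k k*f_k*g_{m-k}
    return g
```

For instance, lazy_exp([0, 1], 4) returns [1, 1, 1/2, 1/6, 1/24], the expansion of exp(z).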
The article [10] by Stephen Watt describes an implementation
in Scratchpad II (the former name of Axiom) of this approach, based on a
lazy implementation of formal power series.
In conclusion, the lazy approach has the advantage over the zealous
approach that it applies to the case of implicit equations; in return, it
does not allow the use of fast multiplication algorithms, and therefore
yields higher asymptotic complexities. It is precisely this drawback
that the relaxed approach overcomes.
3 The Relaxed Approach
The relaxed approach tries to use the fast algorithms of the zealous
approach in cases where that approach is not applicable, i.e., when
``off-line'' computation is not possible, as for example for the
computation of the coefficients of the generating function of binary
trees f = 1 + z f^2, or of the exponential of a series g = ∫ f' g.
The basic idea is the following: instead of performing the minimal
computations at each stage as in the lazy approach, one performs a few
more calculations at certain stages, which allows the use of fast
algorithms and, in the end, yields a global gain. As all the operations
considered ultimately reduce to multiplications, it is enough to
detail the relaxed approach for the multiplication of formal power
series.
Let us recall the above-mentioned example of the product of f = f0 +
f1 x + f2 x^2 + O(x^3) by g = g0 + g1 x + g2 x^2 + O(x^3).
The relaxed algorithm operates in the following way: at stage 1,
instead of calculating h1 as f0 g1 + f1 g0, we obtain it by
Karatsuba's formula (f0 + f1)(g0 + g1) - f0 g0 - f1 g1,
thus with two multiplications as well, because f0 g0 = h0 has
already been calculated. However, this already made it possible to
compute part of h2, namely f1 g1. Stage 2 then has to compute
only f0 g2 and f2 g0, a gain of one multiplication compared
to the lazy approach. The corresponding table is the following:
where the square formed by the `0' and the three `1's is obtained in three
multiplications instead of four, thanks to Karatsuba's algorithm.
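In code, the three stages just described read as follows (a toy check of ours on arbitrary numeric coefficients; five products instead of the lazy approach's six):

```python
# Relaxed multiplication at order 3: Karatsuba's formula at stage 1
# also yields f1*g1, which is saved and reused at stage 2.
f0, f1, f2 = 3, 5, 7
g0, g1, g2 = 2, 11, 13

h0 = f0 * g0                          # stage 0: product 1
kara = (f0 + f1) * (g0 + g1)          # stage 1: product 2
f1g1 = f1 * g1                        # stage 1: product 3, saved for later
h1 = kara - h0 - f1g1                 # = f0*g1 + f1*g0, no new product
h2 = f1g1 + f0 * g2 + f2 * g0         # stage 2: products 4 and 5 only

assert h1 == f0 * g1 + f1 * g0
assert h2 == f0 * g2 + f1 * g1 + f2 * g0
```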
Seen differently, we cut the triangle of side 3 into two
1×1 squares and one 2×2 square, for which we used a fast
algorithm. More generally, any relaxed algorithm for the
multiplication of formal power series of order n consists of a
tiling of the triangle of side n by a set of squares. With each
tiling corresponds a new algorithm. Each square is numbered by an
integer from 0 to n, indicating the stage at which it is calculated;
at stage n, only the coefficients of order less than or equal to n
can be used.
The example above illustrates two significant points of relaxed
algorithms:
- at the end of stage 1, it is necessary to save the value
of f1 g1, which was computed in advance, for later use at stage 2.
Relaxed algorithms may thus require more memory than zealous
algorithms; in most cases the memory used remains linear,
but it can be in O(n log n);
- if we want to continue the calculation of h = f g to a higher
order, say order 4, the adopted strategy is not necessarily the best.
Indeed, at stage 2 we could have calculated (f0 + f2)(g0 + g2) - f0
g0 - f2 g2, which would also give f0 g2 + f2
g0 in two multiplications, but would in addition yield the
term f2 g2 of h4.
Thus we can distinguish two cases:
(i) the case where the maximum order n of the calculations is
known in advance, in which case it is a question of optimizing the total
number of operations up to this order n;
(ii) the case where the maximum order is not known a
priori, in which case one wants to optimize the ``average'' number of
operations of the relaxed algorithm.
Joris van der Hoeven also shows that Karatsuba's algorithm for the
multiplication of polynomials (we are no longer speaking of formal
power series here) is essentially
relaxed, i.e., the formula giving the term hk of the product
depends only on f0, ..., fk and g0, ..., gk. Consequently,
Karatsuba's algorithm can directly be used for the relaxed
multiplication. The table corresponding to the product of two
polynomials of degree 3 is the following:2
The major disadvantage of the relaxed variant of Karatsuba's
algorithm is however its memory usage: on the one hand, the required
memory is in O(n log n); on the other hand, the memory management
is extremely complex, since at each stage it is necessary to know
which values must be calculated, which should be reused (and among
those, which can be destroyed), and finally which have to be saved for
later stages.
Another algorithm proposed by Joris van der Hoeven consists in tiling
the square n× n by a sequence of `L' shapes of increasing
width. This leads to a relaxed multiplication in O(M(n) log n).
Several other variants are proposed
in [9], both for complete products (polynomials) and
truncated products (formal power series). The other operations
(division, composition) are also ``essentially relaxed.'' Finally we
obtain the following complexities for the relaxed variants of the
operations on formal power series:
    Operation                               Time                       Space
    Multiplication via Karatsuba            n^(log2 3)                 n log n
    Multiplication via FFT                  D(n) = M(n) log n          n
    Composition with polynomials            D(n) log n                 n
    Composition                             D(n) (n log n)^(1/2)       n^(3/2) log n
    Composition in finite characteristic    D(n) log n                 n log n
The time complexities are the same as for the zealous approach,
with M(n) replaced by D(n). The memory complexity is
identical, except when Karatsuba's multiplication algorithm is used
(there is however a variant, slower by a constant factor, which runs
in space O(n)), or for the composition (general or in finite
characteristic).
Joris van der Hoeven gives in his report [9] many
experimental results for these new algorithms. The timings below
correspond to an AMD processor at 200 MHz with 64 MB of main memory.
Van der Hoeven's program calculates 500 terms of the Taylor
expansion of exp(z exp z) in 342 seconds, against 1086 seconds for
the zealous approach; it calculates the number of alcohols
C_n H_{2n+1} OH for n = 5000 in approximately 2300 seconds, whereas
the naïve method does not allow this calculation in reasonable
time and space; and it calculates the expansion in 1/x of a
function f(x) to order 2000 in 1572 seconds.
Joris van der Hoeven presented us with a whole panoply of algorithms
that reduce the calculation of the first n coefficients of most
formal power series defined by algebraic equations,
differential equations, or difference equations to a quasi-linear
complexity, whereas the best algorithms known before were almost
quadratic (in the implicit case, i.e., where the zealous approach does
not apply).
It would be nice if these algorithms were implemented in enumerative
combinatorics software like combstruct or CS [3]. More generally, all
computer algebra systems worthy of the name should implement these new
algorithms, for formal power series, polynomials, and integers alike.
Indeed, one of the by-products of Joris van der Hoeven's report is a
division algorithm with remainder in K(n) operations, whereas the best
previously known algorithms were in 2K(n) [7, 2, 5].
For truncated division and square root, new algorithms based on
Karatsuba's multiplication are detailed in the report [4].
People who don't read French may thank Gina Pierrelée-Grisvard who
helped to translate this summary.
[1] Brent (Richard P.) and Kung (H. T.). -- O((n log n)^(3/2)) algorithms
for composition and reversion of power series. In Traub (J. F.) (editor),
Analytic Computational Complexity (Proc. Sympos., Carnegie-Mellon Univ.,
Pittsburgh, Pa., 1975), pp. 217--225. -- Academic Press, New York, 1976.
[2] Burnikel (Christoph) and Ziegler (Joachim). -- Fast recursive
division. -- Research Report n° MPI-I-98-1-022, Max-Planck-Institut für
Informatik, Saarbrücken, Germany, October 1998.
[3] Denise (Alain), Dutour (Isabelle), and Zimmermann (Paul). -- CS: a
MuPAD package for counting and randomly generating combinatorial
structures. In Formal Power Series and Algebraic Combinatorics,
pp. 195--204. -- 1998. Proceedings of FPSAC'98, June 1998, Toronto.
[4] Hanrot (Guillaume), Quercia (Michel), and Zimmermann (Paul). --
Speeding up the division and square root of power series. -- Research
Report n° 3973, Institut National de Recherche en Informatique et en
Automatique, July 2000.
[5] Jebelean (Tudor). -- Practical integer division with Karatsuba
complexity. In Küchlin (Wolfgang W.) (editor), ISSAC'97 (July 21--23,
1997, Maui, Hawaii, USA), pp. 339--341. -- ACM Press, New York, 1997.
Conference proceedings.
[6] Karatsuba (A. A.) and Ofman (Yu. P.). -- Multiplication of multidigit
numbers by automata. Physics Doklady, vol. 7, 1963, pp. 595--596. --
Translated from Doklady Akad. Nauk, vol. 145, n° 2, 1962.
[7] Moenck (R.) and Borodin (A.). -- Fast modular transforms via
division. In Proceedings of the 13th Annual IEEE Symposium on Switching
and Automata Theory, pp. 90--96. -- 1972.
[8] van der Hoeven (Joris). -- Lazy multiplication of formal power
series. In Küchlin (Wolfgang W.) (editor), ISSAC'97 (July 21--23, 1997,
Maui, Hawaii, USA), pp. 17--20. -- ACM Press, New York, 1997. Conference
proceedings.
[9] van der Hoeven (Joris). -- Relax, but don't be too lazy. -- Technical
Report n° 78, Université de Paris-Sud, Mathématiques, Bâtiment 425,
F-91405 Orsay, 1999. Submitted to the Journal of Symbolic Computation.
[10] Watt (Stephen M.). -- A fixed point method for power series
computation. In Gianni (Patrizia M.) (editor), Symbolic and Algebraic
Computation (International Symposium ISSAC'88, Rome, Italy, July 4--8,
1988), Lecture Notes in Computer Science, vol. 358, pp. 206--217. --
Springer-Verlag, 1989. Conference proceedings.
1. By the way, this method is precisely the one used in the combstruct
library for the enumeration of combinatorial structures.
2. Exercise: find the operations carried out at each stage from 0 to 6
and check that one indeed performs 9 multiplications.
Hint: 9 = 1+2+2+3+1+0+0.