Variations on Computing Reciprocals of Power Series
Arnold Schönhage
Institut für Informatik, Universität Bonn (Germany)
Algorithms Seminar
February 5, 2001
[summary by Ludovic Meunier]
Abstract
Fast algorithms for polynomial division with remainder are key tools
in computer algebra. The power series domain defines a suitable
framework where such algorithms can be efficiently constructed. While
revisiting Kung's article [5], Arnold Schönhage discusses
algebraic complexity bounds for the computation of reciprocals of
power series and describes a new algorithm for this task involving
Graeffe's root squaring steps.
1 Introduction
By means of Newton's iteration, reciprocals of power series modulo
x^{n+1} can be computed with complexity O(M(n)), where
M(n) denotes the complexity of multiplication (see, e.g.,
[6] for a survey). However, the Bachmann-Landau
O-notation hides a multiplicative constant, which needs to be
investigated, for instance in order to determine crossover points
when a collection of algorithms is available.
Section 2 sets the required background by
recalling a few definitions from algebraic complexity. Section
3 presents an algorithm for computing
reciprocals of power series, while discussing complexity
bounds. Section 4 describes a new algorithm and
its implementation over Z.
2 Algebraic Complexity
Let F be a field and let A(x) = ∑_{i ≥ 0} a_{i} x^{i} ∈
F[[x]] denote a formal power series in the indeterminate x. Here,
formal means that convergence matters are out of concern. Let
D = F(a_{0},a_{1},...) define a domain where the a_{i}'s are
regarded as indeterminates. If D is endowed with the four
arithmetic operations (+, −, *, /) and a scalar
multiplication, then an algorithm that inputs the power series A(x)
consists of a finite sequence of operations in D. Counting these
operations defines the algebraic complexity, which is an intuitive way
of reflecting the performance of the algorithm. Two models of complexity
are worth considering. The arithmetic complexity, denoted by
L,^{1} charges one unit of cost for each operation in
D, while the nonscalar complexity, denoted by C, only
counts nonscalar multiplications and divisions.
3 Kung's Algorithm Revisited
The underlying algorithm used for the accurate cost calculation is
based on Newton's iteration for reciprocals, as discussed by Kung
in [5].
3.1 Kung's algorithm
Let R(x) be the reciprocal of the unit A(x) with respect to the
field D[[x]]. Define the function f from the subdomain of
D[[x]] whose elements have nonzero constant term to D[[x]]
by f(s) = s^{−1} − A(x). Thus R(x) is just the zero of f.
Newton's iteration is a second-order iteration^{2} and consists of a linear approximation of f.
Newton's iteration function N is given by:
N(s) = s − f(s)/f'(s) = s (2 − A(x)s),
(1)
where f' denotes the derivative of f, which is defined
algebraically (see [8]). Let n be a power of two
and

A_{2n}(x) = A(x) mod x^{2n+1},
(2)
R_{n}(x) = 1 / A(x) mod x^{n+1}.
(3)
Newton's iteration features quadratic convergence
(see [3, Chap. 4]): the number of accurate terms doubles
at each iteration. This may be expressed by
R_{2n}(x) = N(R_{n}(x)) mod x^{2n+1}.
(4)
From (2) and (3), there exists a
polynomial P of degree at most n−1 such that
R_{n}(x) A_{2n}(x) = 1 + x^{n+1} P(x) mod x^{2n+1}.
(5)
Combining (1), (5), and
the expansion (4) leads to a recursive formula
that computes the reciprocal of A(x) modulo x^{2n+1}:
R_{2n}(x) = R_{n}(x) (1 − x^{n+1} P(x)) mod x^{2n+1}.
(6)
Equations (5) and (6) both charge
M(n)+O(n) units of cost. Therefore, the overall arithmetic
complexity of Kung's algorithm is bounded by
L(2n) ≤ L(n) + 2 M(n) + O(n).
(7)
Unfolding this recurrence leads to L(n) = O(M(n)) for all
known multiplication algorithms.
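As an illustration, the doubling scheme (6) can be sketched in a few lines of Python. The schoolbook `mul_mod` helper below is a stand-in for a fast M(n) multiplication routine, and exact rational arithmetic via `Fraction` is used for clarity; both choices are illustrative, not part of the original algorithm.

```python
# Sketch of Kung's Newton iteration for power series reciprocals.
# mul_mod is schoolbook multiplication; an FFT-based routine would
# replace it in a real implementation to reach O(M(n)) complexity.
from fractions import Fraction

def mul_mod(a, b, m):
    """Multiply two coefficient lists and truncate mod x^m."""
    out = [Fraction(0)] * m
    for i, ai in enumerate(a[:m]):
        for j, bj in enumerate(b[: m - i]):
            out[i + j] += ai * bj
    return out

def reciprocal(a, n):
    """Return R with R(x) * A(x) = 1 mod x^{n+1}; a[0] must be invertible."""
    r = [Fraction(1, 1) / a[0]]            # R_0 = 1 / a_0
    prec = 1
    while prec < n + 1:
        prec = min(2 * prec, n + 1)        # precision doubles each step
        ar = mul_mod(a[:prec], r, prec)    # A * R = 1 + x^{k+1} P  (cf. (5))
        corr = [-c for c in ar]
        corr[0] += 2                       # 2 - A * R
        r = mul_mod(r, corr, prec)         # R <- R * (2 - A * R)  (cf. (6))
    return r
```

For example, with A(x) = 1 − x the routine returns the truncation of 1 + x + x^{2} + ..., as expected.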
The derivation of the exact arithmetic complexity from
(7) depends on a specific algorithm for
multiplication of polynomials. The next section describes a
multiplication algorithm involving the fast Fourier transform
(FFT). Originally, Kung derived (7) for
nonscalar complexity, where M(n) = 2n+1, and found C(n) < 4n.
Actually, the lowest upper bound presently known for the nonscalar
complexity is C(n) < 3.75 n. Kalorkoti derived this latter result from
Kung's third-order iteration [4], taking advantage of the fact
that squaring modulo x^{n+1} is less expensive than multiplying
modulo x^{n+1} (see [2, Chap. 2]).
3.2 FFT and fast multiplication
The N-point FFT defines a ring isomorphism from the quotient
F[x]/(x^{N} − 1) to F^{N}. It is an evaluation-interpolation map
where the evaluation points, also called Fourier points, are the N-th
roots of unity. Actually, the FFT is the evaluation-interpolation map
whose implementation yields the lowest known complexity. Indeed, the
symmetry properties of the N-th roots of unity allow a
divide-and-conquer implementation
[3, Chap. 4]. The arithmetic complexity of the N-point FFT is
bounded by L(N) ≤ (3/2) N log N − N + 1
(see [2, Chap. 2]).
The FFT performs fast back and forth conversions from an evaluated
form to its interpolated form. Thus, low complexity algorithms can be
achieved by taking advantage of each representation. In particular,
fast multiplication consists in converting both operands into their
evaluation forms with two FFTs, performing a coefficient-wise
multiplication, and delivering the result with one backward
FFT. Schönhage shows that multiplication of polynomials of degree n
(some restrictions on n are needed and discussed later) according to
this method has algebraic complexity
M(n) = (9 + o(1)) n log n.
(8)
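The two-forward/one-backward transform pattern can be sketched as follows. This toy version works over the complex numbers with a textbook recursive radix-2 FFT; it only illustrates the structure of the method, not Schönhage's exact scheme or its constant, and it assumes small integer coefficients so that rounding recovers the exact product.

```python
# Evaluation-interpolation multiplication: two forward FFTs,
# a coefficient-wise product, and one backward FFT.
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # N-th root of unity
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_mul(a, b):
    """Multiply integer polynomials: evaluate, multiply pointwise, interpolate."""
    size = 1
    while size < len(a) + len(b) - 1:      # smallest power of two >= deg + 1
        size *= 2
    fa = fft(a + [0] * (size - len(a)))    # forward FFT of first operand
    fb = fft(b + [0] * (size - len(b)))    # forward FFT of second operand
    prod = [x * y for x, y in zip(fa, fb)] # coefficient-wise multiplication
    res = fft(prod, invert=True)           # one backward FFT
    return [round((c / size).real) for c in res[: len(a) + len(b) - 1]]
```

The backward transform is unnormalized, hence the final division by `size`.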
3.3 Kung's algorithm revisited
Direct substitution of (8) into
(7) leads to
L(n) ≤ (18 + o(1)) n log n.

However, Schönhage obtains a lower multiplicative constant by
deferring the last backward FFT. R_{n} and A_{2n} are first
converted into their evaluation forms, requiring two direct Npoint
FFTs, which cost 2 L(N). Then, steps (5) and
(6) compute the evaluation form of R_{n}P,
involving two coefficientwise multiplications and two subtractions,
which add 4N units of cost. One ultimate backward Npoint FFT
interpolates R_{n}P with L(N) operations. Therefore,
(7) becomes
L(2n) £ L(n) + 3 L(N) + 4N.
A typical value for N is the lowest power of two that is greater
than d = deg(R_{n}(x) A_{2n}(x)) = 3n. However, a
significant overhead is expected when d is slightly greater than the
nearest power of two. In this case, the arithmetic complexity for the
N-point FFT is L(N) < 3 d log(2d). Thus, Schönhage suggests
for N a scaled power of two of the form N = c 2^{m}, where
m = ⌈log(d)⌉ − ⌊loglog(d+1)⌋ and c
= ⌈d / 2^{m}⌉. This latter choice for N yields the sharper
bound
L(N) ≤ d (3/2 log(d) + 13/5 loglog(d+1) + O(1)).

This precise count yields the arithmetic complexity for reciprocals:
L(n) ≤ (27/2 + o(1)) n log n.

Surprisingly, Newton's third-order iteration does not yield a better
bound for arithmetic complexity, as opposed to the case of nonscalar
complexity (see Section 3.1).
4 A New Algorithm over Z
Algorithms for division of polynomials reduce the division task to
multiplications. However, while featuring an attractive asymptotic
complexity, such reductions may involve detours and tricks whose
implementations lead to tremendous multiplicative constants. Indeed,
earlier algorithms for division of polynomials shared this
drawback. Therefore, Schönhage suggests a new fast algorithm based on
Graeffe's root squaring, which has a low constant and lends itself to
immediate implementation due to its extreme simplicity.
4.1 Graeffe's root squaring method
Graeffe's squaring method originates in numerical analysis for solving
polynomial equations [1]. This method proceeds from any
polynomial A(x) in F[x] to the even polynomial G(x^{2}) =
A(x)A(−x).
In F[[x]] the reciprocal of A(x) modulo x^{n+1} may be written
as
1 / A(x) = A(−x) / (A(x)A(−x)) = A(−x) / G(x^{2}) mod x^{n+1}.
(9)
In equation (9), the denominator of the right-hand
side contains at most n+1 terms, but only half of them are
significant when computing modulo x^{n+1}. Therefore, Graeffe's rule
reduces the task of inverting n+1 terms to a half-sized
problem. Thus, the corresponding algorithm works recursively as
follows (notations are those of (2) and
(3)). With k = ⌊n/2⌋, Graeffe's
step computes
G_{k}(x^{2}) = A_{n}(x)A_{n}(−x) mod x^{n+1},
charging at most n+1 nonscalar units of cost. Indeed, typically,
nonscalar complexity for such a multiplication is C(n) = 2n+1 (see
[2, Chap. 2]). However, the polynomial A_{n} may be
rewritten as
A_{n}(x) = A_{n}^{(even)}(x^{2}) + x A_{n}^{(odd)}(x^{2}),
which shows that both A_{n}(x_{0}) and A_{n}(−x_{0}), for any
x_{0} lying in the ground field, can be computed together as follows:
A_{n}(± x_{0}) = A_{n}^{(even)}(x_{0}^{2}) ± x_{0} A_{n}^{(odd)}(x_{0}^{2}).
Therefore, Graeffe's step requires at most n+1 essential
multiplications, by evaluation of A_{n} for n+1 distinct
squares. The reciprocal of G_{k}(x) modulo x^{k+1}, denoted by
H_{k}(x), is determined by recursive calls. An ultimate
multiplication
R_{n}(x) = A_{n}(−x)H_{k}(x^{2}) mod x^{n+1}
delivers the result, charging an extra n+2k+1 units of nonscalar
cost. Then, the nonscalar complexity is bounded by C(n) ≤ 6n + 2
log(n/2), which is slightly weaker than Kalorkoti's bound (see Section
3.1), but the implementation of Graeffe's
approach is straightforward.
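With the notations above, one Graeffe recursion can be sketched in Python as follows. The `mul_mod` helper is an illustrative schoolbook stand-in for truncated multiplication (a fast routine would be substituted in practice), and exact rational arithmetic is used for clarity.

```python
# Sketch of the Graeffe-based recursion: reduce inversion of A mod x^{n+1}
# to the half-sized inversion of G_k, where G_k(x^2) = A_n(x) A_n(-x).
from fractions import Fraction

def mul_mod(a, b, m):
    """Truncated product mod x^m (schoolbook stand-in for fast multiplication)."""
    out = [Fraction(0)] * m
    for i, ai in enumerate(a[:m]):
        for j, bj in enumerate(b[: m - i]):
            out[i + j] += ai * bj
    return out

def graeffe_reciprocal(a, n):
    """1/A(x) mod x^{n+1}, assuming a[0] != 0; returns n+1 coefficients."""
    a = (a + [Fraction(0)] * (n + 1))[: n + 1]
    if n == 0:
        return [Fraction(1, 1) / a[0]]
    k = n // 2
    # A_n(-x): flip the sign of odd coefficients (even/odd split evaluation).
    neg = [(-1) ** i * c for i, c in enumerate(a)]
    g2 = mul_mod(a, neg, n + 1)          # G_k(x^2); odd slots vanish
    g = g2[0::2]                         # extract G_k(x), k+1 coefficients
    h = graeffe_reciprocal(g, k)         # H_k = 1/G_k mod x^{k+1}, recursively
    h2 = [Fraction(0)] * (2 * k + 1)     # re-expand H_k(x) into H_k(x^2)
    h2[0::2] = h
    return mul_mod(neg, h2, n + 1)       # R_n = A_n(-x) H_k(x^2) mod x^{n+1}
```

For A(x) = 1 − x and n = 4 this returns the truncation of 1 + x + x^{2} + x^{3} + x^{4}.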
4.2 Application to reciprocals over Z
This section deals with units of the ring Z[[x]] of the form A(x)
= 1 + ∑_{i>0} a_{i} x^{i}. This form naturally arises with
divisions by monic polynomials computed via the substitution x
→ 1/x.
Basically, the implementation of Graeffe's method consists in mapping
polynomials to integers expressed in some radix r_{0} notation, so
that multiplication of integers can be used. This idea is based on
Kronecker's trick of encoding polynomials with bounded coefficients in
a single integer. Let φ_{r_0} be a ring morphism from
Z_{n}[x] (i.e., polynomials of Z[x] of degree less than n)
to Z that evaluates polynomials at r_{0} ∈ N. If there
exists a constant b such that |a_{i}| < b^{i} holds for
each i > 0, then the bit size of the coefficients of R and G can
be bounded. Thus, under this assumption, r_{0} ∈ N can be chosen
such that the evaluation map φ_{r_0} is a bijection and N can
be optimally determined. The arithmetic complexity can easily be
derived:
L(n) = 6 M(t n^{2}),
where t = log(3 b) and where the Schönhage-Strassen
algorithm for multiplication of integers, which features the lowest
known complexity M(m) = O(m log(m) loglog(m))
[7], is likely to be used.
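A minimal sketch of the packing idea, assuming nonnegative coefficients and a radix r_{0} chosen large enough that digits of the product never overlap (Python's built-in arbitrary-precision integer multiplication stands in for Schönhage-Strassen):

```python
# Kronecker's trick: pack polynomial coefficients into one integer in
# radix r0, multiply the integers, and unpack the radix-r0 digits.

def pack(poly, r0):
    """Evaluate the polynomial at x = r0, i.e., the map phi_{r0}."""
    return sum(c * r0**i for i, c in enumerate(poly))

def unpack(value, r0, length):
    """Recover `length` radix-r0 digits as coefficients."""
    coeffs = []
    for _ in range(length):
        value, digit = divmod(value, r0)
        coeffs.append(digit)
    return coeffs

def kronecker_mul(a, b, r0):
    """Multiply polynomials (nonnegative coefficients) via one integer product.
    r0 must exceed every coefficient that can appear in the product."""
    return unpack(pack(a, r0) * pack(b, r0), r0, len(a) + len(b) - 1)
```

For instance, with r_{0} = 1000 the product (1 + 2x)(3 + 4x) is read off from the single integer 2001 * 4003.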
References
 [1]

Bareiss (Erwin H.). 
Resultant procedure and the mechanization of the Graeffe process.
Journal of the ACM, vol. 7, 1960, pp. 346–386.
 [2]

Bürgisser (Peter), Clausen (Michael), and Shokrollahi (M. Amin). 
Algebraic complexity theory. 
Springer-Verlag, Berlin, 1997, xxiv+618p. With the
collaboration of Thomas Lickteig.
 [3]

Geddes (K. O.), Czapor (S. R.), and Labahn (G.). 
Algorithms for computer algebra. 
Kluwer Academic Publishers, Boston, MA, 1992,
xxii+585p.
 [4]

Kalorkoti (K.). 
Inverting polynomials and formal power series. SIAM Journal on
Computing, vol. 22, n°3, 1993, pp. 552–559.
 [5]

Kung (H. T.). 
On computing reciprocals of power series. Numerische
Mathematik, vol. 22, 1974, pp. 341–348.
 [6]

Salvy (Bruno). 
Asymptotique automatique. 
Research Report n°3707, Institut National de Recherche en
Informatique et en Automatique, 1999. 20 pages.
 [7]

Schönhage (A.) and Strassen (V.). 
Schnelle Multiplikation grosser Zahlen. Computing (Arch.
Elektron. Rechnen), vol. 7, 1971, pp. 281–292.
 [8]

van der Waerden (B. L.). 
Modern algebra. 
Frederick Ungar Publishing Co., New York, N. Y., 1949,
vol. I, xii+264p.
 1
 For notational convenience, arithmetic complexity is
also denoted by M (resp. L) for multiplication (resp. fast
Fourier transform).
 2
Third-order
iteration is mentioned later and consists of a parabolic
approximation.