Variations on Computing Reciprocals of Power Series

Arnold Schönhage

Institut für Informatik, Universität Bonn (Germany)

Algorithms Seminar

February 5, 2001

[summary by Ludovic Meunier]

Abstract
Fast algorithms for polynomial division with remainder are key tools in computer algebra. The power series domain defines a suitable framework where such algorithms can be efficiently constructed. While revisiting Kung's article [5], Arnold Schönhage discusses algebraic complexity bounds for the computation of reciprocals of power series and describes a new algorithm for this task involving Graeffe's root squaring steps.



1  Introduction

By means of Newton's iteration, reciprocals of power series modulo $x^{n+1}$ can be computed with complexity $O(M(n))$, where $M(n)$ denotes the complexity of multiplication (see, e.g., [6] for a survey). However, the Bachmann--Landau $O$-notation hides a multiplicative constant, which needs to be investigated, for instance in order to determine cross-over points when a collection of algorithms is available.

Section 2 sets the required background by recalling a few definitions from algebraic complexity. Section 3 presents an algorithm for computing reciprocals of power series, while discussing complexity bounds. Section 4 describes a new algorithm and its implementation over Z.

2  Algebraic Complexity

Let $F$ be a field and let $A(x) = \sum_{i \ge 0} a_i x^i \in F[[x]]$ denote a formal power series in the indeterminate $x$. Here, formal means that convergence matters are of no concern. Let $D = F(a_0, a_1, \ldots)$ define a domain where the $a_i$'s are regarded as indeterminates. If $D$ is endowed with the four arithmetic operations ($+$, $-$, $\times$, $/$) and a scalar multiplication, then an algorithm that inputs the power series $A(x)$ consists of a finite sequence of operations in $D$. Counting these operations defines the algebraic complexity, which is an intuitive way of reflecting the performance of the algorithm. Two models of complexity are worth considering. The arithmetic complexity, denoted by $L$ (footnote 1), charges one unit of cost for each operation in $D$, while the nonscalar complexity, denoted by $C$, only counts nonscalar multiplications and divisions.

3  Kung's Algorithm Revisited

The underlying algorithm used for the accurate cost calculation is based on Newton's iteration for reciprocals, as discussed by Kung in [5].

3.1  Kung's algorithm

Let $R(x)$ be the reciprocal of the unit $A(x)$ in the ring $D[[x]]$. Define the function $f$, from the subdomain of $D[[x]]$ whose elements have a nonzero constant term to $D[[x]]$, by $f(s) = s^{-1} - A(x)$. Thus $R(x)$ is just the zero of $f$.

Newton's iteration is a second-order iteration (footnote 2) and consists of a linear approximation of $f$. Newton's iteration function $N$ is given by
$$N(s) = s - \frac{f(s)}{f'(s)} = s\,\bigl(2 - A(x)\,s\bigr), \qquad (1)$$
where $f'$ denotes the derivative of $f$, which is defined algebraically (see [8]). Let $n$ be a power of two and
$$A_{2n}(x) = A(x) \bmod x^{2n+1}, \qquad (2)$$
$$R_n(x) = 1/A(x) \bmod x^{n+1}. \qquad (3)$$

Newton's iteration features quadratic convergence (see [3, Chap. 4]): the number of accurate terms doubles at each iteration. This may be expressed by
$$R_{2n}(x) = N\bigl(R_n(x)\bigr) \bmod x^{2n+1}. \qquad (4)$$
From (2) and (3), there exists a polynomial $P$ of degree at most $n-1$ such that
$$R_n(x)\,A_{2n}(x) = 1 + x^{n+1} P(x) \bmod x^{2n+1}. \qquad (5)$$
Combining (1), (5), and the expansion (4) leads to a recursive formula that computes the reciprocal of $A(x)$ modulo $x^{2n+1}$:
$$\frac{1}{A(x)} = R_{2n}(x) = R_n(x)\,\bigl(1 - x^{n+1} P(x)\bigr) \bmod x^{2n+1}. \qquad (6)$$

Equations (5) and (6) each charge $M(n) + O(n)$ units of cost. Therefore, the overall arithmetic complexity of Kung's algorithm is bounded by
$$L(2n) \le L(n) + 2\,M(n) + O(n). \qquad (7)$$
Unfolding this recurrence leads to $L(n) = O(M(n))$ for all known multiplication algorithms.
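As an illustration, here is a minimal Python sketch of the recursion (6), over exact rationals; the function names and the quadratic school multiplication are illustrative choices, not part of the talk.

from fractions import Fraction

def poly_mul_mod(a, b, m):
    # School multiplication of coefficient lists a and b, truncated mod x^m.
    out = [Fraction(0)] * m
    for i, ai in enumerate(a[:m]):
        for j, bj in enumerate(b[:m - i]):
            out[i + j] += ai * bj
    return out

def reciprocal(a, m):
    # First m coefficients of 1/A(x), assuming a[0] != 0.
    r = [Fraction(1) / a[0]]                  # R_1 = 1/a_0, correct mod x
    k = 1
    while k < m:
        k = min(2 * k, m)
        ar = poly_mul_mod(a, r, k)            # A(x) R_n(x) mod x^k
        two_minus_ar = [-c for c in ar]
        two_minus_ar[0] += 2
        r = poly_mul_mod(r, two_minus_ar, k)  # Newton step (1), equivalent to (6)
    return r

# 1/(1 - x) = 1 + x + x^2 + ...: prints eight coefficients equal to 1
print(reciprocal([Fraction(1), Fraction(-1)], 8))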

The derivation of the exact arithmetic complexity from (7) depends on the specific algorithm chosen for polynomial multiplication. The next section describes a multiplication algorithm based on the fast Fourier transform (FFT). Originally, Kung derived (7) for nonscalar complexity, where $M(n) = 2n+1$, and found $C(n) < 4n$. Actually, the lowest upper bound presently known for the nonscalar complexity is $C(n) < 3.75\,n$. Kalorkoti derived this result from Kung's third-order iteration [4], taking advantage of the fact that squaring modulo $x^{n+1}$ is less expensive than multiplying modulo $x^{n+1}$ (see [2, Chap. 2]).

3.2  FFT and fast multiplication

The $N$-point FFT defines a ring isomorphism from the quotient $F[x]/(x^N - 1)$ to $F^N$. It is an evaluation-interpolation map whose evaluation points, also called Fourier points, are the $N$th roots of unity. Actually, the FFT is the evaluation-interpolation map whose implementation yields the lowest known complexity: the symmetry properties of the $N$th roots of unity allow a divide-and-conquer implementation [3, Chap. 4]. The arithmetic complexity of the $N$-point FFT is bounded by $L(N) \le \frac{3}{2} N \log N - N + 1$ (see [2, Chap. 2]).

The FFT performs fast back-and-forth conversions between coefficient form and evaluated form. Thus, low-complexity algorithms can be designed by taking advantage of each representation. In particular, fast multiplication consists in converting both operands into their evaluation forms with two FFTs, performing a coefficient-wise multiplication, and recovering the result with one backward FFT. Schönhage shows that multiplication of polynomials of degree $n$ (some restrictions on $n$ are needed and discussed later) by this method has algebraic complexity
$$M(n) = (9 + o(1))\,n \log n. \qquad (8)$$
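As a sketch of this scheme, the fragment below multiplies two integer polynomials with two forward FFTs, a coefficient-wise product, and one backward FFT; numpy's floating-point FFT stands in here for an exact transform, so the final rounding is only valid for small integer inputs.

import numpy as np

def fft_mul(a, b):
    # Two forward FFTs, one pointwise product, one backward FFT.
    d = len(a) + len(b) - 1                # number of product coefficients
    N = 1 << (d - 1).bit_length()          # next power of two >= d
    fa = np.fft.fft(a, N)                  # evaluation at the N-th roots of unity
    fb = np.fft.fft(b, N)
    c = np.fft.ifft(fa * fb)               # interpolation of the product
    return [round(v.real) for v in c[:d]]

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(fft_mul([1, 2, 3], [4, 5]))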

3.3  Kung's algorithm revisited

Direct substitution of (8) into (7) leads to
$$L(n) \le (18 + o(1))\,n \log n.$$
However, Schönhage obtains a lower multiplicative constant by deferring the last backward FFT. $R_n$ and $A_{2n}$ are first converted into their evaluation forms, requiring two forward $N$-point FFTs, which cost $2L(N)$. Then, steps (5) and (6) compute the evaluation form of $R_n P$, involving two coefficient-wise multiplications and two subtractions, which add $4N$ units of cost. One final backward $N$-point FFT interpolates $R_n P$ with $L(N)$ operations. Therefore, (7) becomes
$$L(2n) \le L(n) + 3L(N) + 4N.$$
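The following Python fragment sketches one such doubling step with three $N$-point FFTs instead of four, along the lines just described; the layout (numpy arrays, rounding to integers at the end) is an illustrative assumption, and the rounding is exact only for integer examples.

import numpy as np

def newton_step_3fft(a, r, n):
    # One step R_n -> R_2n with two forward FFTs and one backward FFT:
    # the interpolation of R_n * A_2n is deferred; its values are reused.
    N = 1 << (3 * n).bit_length()            # power of two > 3n = deg(R_n A_2n)
    fa = np.fft.fft(a[:2 * n + 1], N)        # evaluation form of A_2n
    fr = np.fft.fft(r, N)                    # evaluation form of R_n
    w = np.exp(2j * np.pi * np.arange(N) / N)
    fp = (fr * fa - 1) / w ** (n + 1)        # values of P, read off from (5)
    rp = np.fft.ifft(fr * fp)                # backward FFT interpolates R_n P
    # R_2n = R_n - x^{n+1} R_n P mod x^{2n+1}, cf. (6)
    return list(r) + [-round(v.real) for v in rp[:n]]

# doubling from R_2 = 1 + x + x^2 (for A = 1 - x) to R_4 = 1 + x + ... + x^4
print(newton_step_3fft([1, -1], [1, 1, 1], 2))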

A typical value for $N$ is the lowest power of two greater than $d = \deg(R_n(x) A_{2n}(x)) = 3n$. However, a significant overhead is incurred when $d$ is only slightly greater than a power of two; in this case, the arithmetic complexity of the $N$-point FFT is only bounded by $L(N) < 3d \log(2d)$. Thus, Schönhage suggests for $N$ a scaled power of two of the form $N = c \cdot 2^\nu$, where $\nu = \lceil \log d \rceil - \lfloor \log\log(d+1) \rfloor$ and $c = \lceil d / 2^\nu \rceil$. This latter choice for $N$ yields the sharper bound
$$L(N) \le d\left(\tfrac{3}{2}\log d + \tfrac{13}{5}\log\log(d+1) + O(1)\right).$$
This precise count yields the arithmetic complexity for reciprocals
$$L(n) \le \left(\tfrac{27}{2} + o(1)\right) n \log n.$$
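A small numeric illustration of this choice of $N$ (with $\nu$ as reconstructed above; the sample values of $n$ are arbitrary):

import math

def scaled_fft_length(d):
    # N = c * 2^nu, nu = ceil(log d) - floor(log log (d+1)), c = ceil(d / 2^nu)
    nu = math.ceil(math.log2(d)) - math.floor(math.log2(math.log2(d + 1)))
    return math.ceil(d / 2 ** nu) * 2 ** nu

for n in (1000, 1025):
    d = 3 * n                                # d = deg(R_n A_2n) = 3n
    plain = 1 << (d - 1).bit_length()        # plain next power of two
    print(d, plain, scaled_fft_length(d))    # 3000: 4096 vs 3072; 3075: 4096 vs 3584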

Surprisingly, Newton's third-order iteration does not yield a better bound for arithmetic complexity, as opposed to the case of nonscalar complexity (see Section 3.1).

4  A New Algorithm over Z

Algorithms for division of polynomials reduce the division task to multiplications. However, while featuring attractive asymptotic complexity, such reductions may involve detours and tricks whose implementations lead to tremendous multiplicative constants; earlier algorithms for division of polynomials shared this drawback. Therefore, Schönhage suggests a new fast algorithm based on Graeffe's root squaring, which has a low constant and, thanks to its extreme simplicity, is ready for immediate implementation.

4.1  Graeffe's root squaring method

Graeffe's root squaring method originates in numerical analysis, for the solution of polynomial equations [1]. The method passes from a polynomial $A(x)$ in $F[x]$ to the even polynomial $G(x^2) = A(x)A(-x)$, whose roots are the squares of the roots of $A$.

In $F[[x]]$ the reciprocal of $A(x)$ modulo $x^{n+1}$ may be written as
$$\frac{1}{A(x)} = \frac{A(-x)}{A(-x)A(x)} \bmod x^{n+1}. \qquad (9)$$
In equation (9), the denominator of the right-hand side contains at most $n+1$ terms, but only half of them are significant when computing modulo $x^{n+1}$. Therefore, Graeffe's rule reduces the task of inverting $n+1$ terms to a half-sized problem, and the corresponding algorithm works recursively as follows (notations are those of (2) and (3)). With $k = \lfloor n/2 \rfloor$, Graeffe's step computes
$$G_k(x^2) = A_n(x)\,A_n(-x) \bmod x^{n+1},$$
charging at most $n+1$ nonscalar units of cost. Indeed, the typical nonscalar complexity of such a multiplication is $C(n) = 2n+1$ (see [2, Chap. 2]). However, the polynomial $A_n$ may be rewritten as
$$A_n(x) = A_n^{(\mathrm{even})}(x^2) + x\,A_n^{(\mathrm{odd})}(x^2),$$
which shows that both $A_n(x_0)$ and $A_n(-x_0)$, for any $x_0$ in the ground field, can be computed together as
$$A_n(\pm x_0) = A_n^{(\mathrm{even})}(x_0^2) \pm x_0\,A_n^{(\mathrm{odd})}(x_0^2).$$
Therefore, Graeffe's step requires at most $n+1$ essential multiplications, by evaluating $A_n$ at $n+1$ distinct squares. The reciprocal of $G_k(x)$ modulo $x^{k+1}$, denoted by $H_k(x)$, is determined by recursive calls. A final multiplication
$$R_n(x) = A_n(-x)\,H_k(x^2) \bmod x^{n+1}$$
delivers the result, charging an extra $n + 2k + 1$ units of nonscalar cost. The nonscalar complexity is then bounded by $C(n) \le 6n + 2\log(n/2)$, which is slightly weaker than Kalorkoti's bound (see Section 3.1), but the implementation of Graeffe's approach is straightforward.
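A minimal recursive sketch of this algorithm in Python, again over exact rationals with school multiplication (the even/odd evaluation trick that lowers the nonscalar count is not visible at this coefficient level):

from fractions import Fraction

def mul_mod(a, b, m):
    # School multiplication of coefficient lists, truncated mod x^m.
    out = [Fraction(0)] * m
    for i, ai in enumerate(a[:m]):
        for j, bj in enumerate(b[:m - i]):
            out[i + j] += ai * bj
    return out

def graeffe_reciprocal(a, n):
    # Coefficients of 1/A(x) mod x^{n+1} (length n+1), assuming a[0] != 0.
    if n == 0:
        return [Fraction(1) / a[0]]
    a_minus = [(-1) ** i * c for i, c in enumerate(a[:n + 1])]  # A_n(-x)
    g2 = mul_mod(a, a_minus, n + 1)           # A_n(x) A_n(-x): even mod x^{n+1}
    h = graeffe_reciprocal(g2[0::2], n // 2)  # recursive call: H_k = 1/G_k
    h2 = [Fraction(0)] * (n + 1)
    h2[0::2] = h                              # H_k(x^2)
    return mul_mod(a_minus, h2, n + 1)        # R_n = A_n(-x) H_k(x^2)

# 1/(1 - x) mod x^5: prints five coefficients equal to 1
print(graeffe_reciprocal([Fraction(1), Fraction(-1)], 4))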

4.2  Application to reciprocals over Z

This section deals with units of the ring $\mathbb{Z}[[x]]$ of the form $A(x) = 1 + \sum_{i>0} a_i x^i$. This form naturally arises in divisions by monic polynomials, computed via the substitution $x \mapsto 1/x$.

Basically, the implementation of Graeffe's method consists in mapping polynomials to integers written in some radix-$r_0$ notation, so that multiplication of integers can be used. This idea is based on Kronecker's trick of encoding a polynomial with bounded coefficients into a single integer. Let $f_{r_0}$ be the ring morphism from $\mathbb{Z}_n[x]$ (i.e., polynomials of $\mathbb{Z}[x]$ of degree less than $n$) to $\mathbb{Z}$ that evaluates polynomials at $r_0 \in \mathbb{N}$. If there exists a constant $b$ such that $|a_i| < b^i$ holds for each $i > 0$, then the bit size of the coefficients of $R$ and $G$ can be bounded. Thus, under this assumption, $r_0 \in \mathbb{N}$ can be chosen such that the evaluation map $f_{r_0}$ is a bijection, and $N$ can be optimally determined. The arithmetic complexity is then easily derived:
$$L(n) = 6\,M(t n^2),$$
where $t = \log(3b)$ and where the Schönhage--Strassen algorithm for multiplication of integers, which features the lowest known complexity $M(m) = O(m \log m \log\log m)$ [7], is likely to be used.
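As an illustration of the Kronecker-style encoding, the sketch below packs the coefficients of two polynomials into integers in radix $r_0$, performs a single big-integer multiplication, and reads the product's coefficients back off; the crude bound used to pick $r_0$, and the restriction to nonnegative coefficients, are simplifying assumptions of this sketch, not Schönhage's tuning.

def kronecker_mul(a, b):
    # Multiply polynomials with nonnegative integer coefficients via one
    # big-integer multiplication (Kronecker's trick).
    bound = max(a) * max(b) * min(len(a), len(b)) + 1
    r0 = 1 << bound.bit_length()             # radix large enough to avoid carries
    ia = sum(c * r0 ** i for i, c in enumerate(a))  # the evaluation map f_{r0}
    ib = sum(c * r0 ** i for i, c in enumerate(b))
    ic = ia * ib                             # Python's built-in big integers
    out = []
    for _ in range(len(a) + len(b) - 1):
        ic, digit = divmod(ic, r0)           # read one radix-r0 digit
        out.append(digit)
    return out

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(kronecker_mul([1, 2, 3], [4, 5]))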

References

[1]
Bareiss (Erwin H.). -- Resultant procedure and the mechanization of the Graeffe process. Journal of the ACM, vol. 7, 1960, pp. 346--386.

[2]
Bürgisser (Peter), Clausen (Michael), and Shokrollahi (M. Amin). -- Algebraic complexity theory. -- Springer-Verlag, Berlin, 1997, xxiv+618p. With the collaboration of Thomas Lickteig.

[3]
Geddes (K. O.), Czapor (S. R.), and Labahn (G.). -- Algorithms for computer algebra. -- Kluwer Academic Publishers, Boston, MA, 1992, xxii+585p.

[4]
Kalorkoti (K.). -- Inverting polynomials and formal power series. SIAM Journal on Computing, vol. 22, n°3, 1993, pp. 552--559.

[5]
Kung (H. T.). -- On computing reciprocals of power series. Numerische Mathematik, vol. 22, 1974, pp. 341--348.

[6]
Salvy (Bruno). -- Asymptotique automatique. -- Research Report n°3707, Institut National de Recherche en Informatique et en Automatique, 1999. 20 pages.

[7]
Schönhage (A.) and Strassen (V.). -- Schnelle Multiplikation grosser Zahlen. Computing (Arch. Elektron. Rechnen), vol. 7, 1971, pp. 281--292.

[8]
van der Waerden (B. L.). -- Modern algebra. -- Frederick Ungar Publishing Co., New York, N. Y., 1949, vol. I, xii+264p.

Footnotes

1. For notational convenience, the arithmetic complexity of multiplication (resp. of the fast Fourier transform) is also denoted by M (resp. L).

2. The third-order iteration, mentioned later, consists of a parabolic approximation.
