Thirty Years of Integer Factorization

François Morain

LIX, École polytechnique (France)

Algorithms Seminar

February 5, 2001

[summary by Marianne Durand]

Abstract
Factoring integers is quite an old challenge. Thirty years ago, two researchers factored the mythical number F7 = 2^{2^7}+1. A few years later public-key cryptography was born, and with it the famous RSA algorithm. Even though the security of RSA is not known to be equivalent to integer factorization, factoring the RSA modulus is the simplest way to decode everything, so many people have tried to factor. In 1990, the ninth Fermat number F9 = 2^{2^9}+1 was factored, with the help of hundreds of computers. In August 1999, it was the turn of the first ordinary 512-bit integer. What follows is a survey of thirty years of factorization, describing the different methods used and the technical problems met.



1  Introduction

Factoring is of great interest since it makes it possible to use the properties of prime numbers in arithmetic. It is the keystone of RSA, the most widely used encryption algorithm. RSA is an asymmetric public-key algorithm based on the fact that the product of two very large prime numbers cannot easily be factored, whereas checking whether a number is prime can be done quickly. Testing the primality of an integer lies in the complexity class NP ∩ co-NP. Factoring a number is in NP, but can be done in polynomial time on a quantum computer!

  Method                          Complexity
  sieve                           p
  rho (ρ)                         p^{1/2}
  elliptic curve method (ECM)     L_p[1/2, c]
  quadratic sieve (QS)            L_N[1/2, c]
  number field sieve (NFS)        L_N[1/3, c]



Table 1: Complexity of factorization methods (N is the integer to be factored, p its smallest factor)


A lot of different methods exist to factor a number, starting from the linear sieve up to the algebraic sieve, including methods based on elliptic curves. Their complexity can be expressed in terms of the function
L_x[a,c] = exp( c (log x)^a (log log x)^{1-a} ).
Some complexities are given in Table 1. The smallest factor p of N is usually of order N^{1/2}. The letter c stands for a constant and is not specified, as it depends on the algorithm and its implementation. These methods are detailed in the next sections.
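The function L_x[a,c] can be transcribed directly; the short sketch below (parameter values are illustrative, not from the talk) shows how the parameter a separates the L[1/2] methods from the L[1/3] ones:

```python
import math

def L(x, a, c):
    # L_x[a, c] = exp(c * (log x)^a * (log log x)^(1 - a))
    lx = math.log(x)
    return math.exp(c * lx ** a * math.log(lx) ** (1 - a))

# For a 512-bit number, an L[1/3] algorithm beats an L[1/2] one
# (same constant c taken here only for illustration):
n = 2 ** 512
print(L(n, 1 / 2, 1.0) > L(n, 1 / 3, 1.0))
```

Note that a = 1 gives exp(c log x), i.e. fully exponential time, while a = 0 gives (log x)^c, i.e. polynomial time; the subexponential methods of Table 1 sit in between.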

2  Combination of Congruences

The method of combination of congruences is an extension of Kraitchik's method. The latter aims at finding an integer x such that x^2 ≡ 1 (mod N) and x ≢ ±1 (mod N), and then at testing whether gcd(x-1, N) is non-trivial. If so, it is a factor of N. The quadratic-congruence approach refines the way the square root of 1 is found. The first step consists in finding pairs of integers (u_i, v_i), i ∈ I, such that u_i^2 ≡ v_i (mod N) and u_i^2 ≠ v_i. The second step is to find a subset J ⊆ I such that ∏_{j∈J} v_j is a square, denoted V_J^2; this step is detailed later. If we write U_J = ∏_{j∈J} u_j, then step 2 implies U_J^2 ≡ V_J^2 (mod N). As we may also assume that V_J and N are coprime (otherwise we already have a factor of N), x = U_J/V_J mod N is well defined and is a square root of 1. With probability greater than 1/2 it gives a non-trivial factorization of N. This extension is interesting because, in order to find the pairs (u_i, v_i), we can use an algorithm that possibly rejects or ignores some valid pairs, in order to go faster. One solution is Dixon's method. The idea is to restrict the search to integers v_i that can be factored over a small set of given small primes P_k = (p_1, ..., p_k). To find pairs (u_i, v_i) according to Dixon's method, we choose an integer u_i and try to factor u_i^2 mod N over the set P_k. If we succeed, we keep the pair (u_i, u_i^2 mod N). The integer u_i has to be greater than N^{1/2}, so as to give a non-trivial pair.
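Dixon's relation search can be sketched in a few lines of Python; the modulus, basis, and helper names below are illustrative choices of mine, not from the talk:

```python
import math

def smooth_exponents(v, base):
    """Trial-divide v over the prime basis; return the exponent vector,
    or None if v does not factor completely over the basis."""
    exps = [0] * len(base)
    for i, p in enumerate(base):
        while v % p == 0:
            v //= p
            exps[i] += 1
    return exps if v == 1 else None

def dixon_relations(N, base, count):
    """Collect pairs (u, exponents) with v = u^2 mod N smooth over the basis,
    starting from u just above sqrt(N) so the pairs are non-trivial."""
    relations = []
    u = math.isqrt(N) + 1
    while len(relations) < count:
        v = (u * u) % N
        if v:  # skip the degenerate case u^2 ≡ 0 (mod N)
            exps = smooth_exponents(v, base)
            if exps is not None:
                relations.append((u, exps))
        u += 1
    return relations
```

Each returned pair satisfies u^2 ≡ ∏ p_i^{e_i} (mod N) by construction; candidates whose v is not smooth are simply discarded, which is exactly the "reject some valid pairs to go faster" trade-off described above.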

Once the pairs (u_i, v_i) are found, the second step is to find a subset J such that ∏_{j∈J} v_j is a square. As the factorization of each v_i is already known, this can be seen as a linear algebra problem. Assume that k+1 valid pairs are available. Consider the matrix M of size k × (k+1), with coefficients 0 and 1 viewed in the field Z/2Z, such that M[i,j] equals the exponent of p_i in the factorization of v_j, reduced mod 2. This matrix has rank at most k, so there exists a linear combination of the columns equal to 0. The subset J corresponds to the non-zero coefficients in the linear combination, and we can check that ∏_{j∈J} v_j is a square, because all its prime factors appear with even exponent. Exhibiting a concrete linear combination equal to zero is made easier by the sparsity of the matrix M: the techniques of Wiedemann or of Lanczos have complexity O(k^{2+ε}) on sparse matrices, whereas Gaussian elimination has complexity O(k^3). We then obtain V_J easily, and a square root of 1 that may give a factorization of N. This algorithm has complexity L_N[1/2, c], where c is a constant that depends on the algorithm.
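The search for J can be illustrated with a small GF(2) elimination; the bitmask "xor basis" below is my own simplification for exposition, not the Wiedemann or Lanczos methods mentioned above:

```python
def find_square_subset(exponent_vectors):
    """Return indices j whose exponent vectors sum to zero mod 2 (so the
    product of the corresponding v_j is a perfect square), or None if the
    vectors are linearly independent over GF(2)."""
    basis = {}  # pivot bit -> (reduced vector as bitmask, bitmask of contributing j's)
    for j, exps in enumerate(exponent_vectors):
        mask = sum(1 << i for i, e in enumerate(exps) if e % 2)
        combo = 1 << j
        while mask:
            b = mask.bit_length() - 1        # current highest set bit
            if b not in basis:
                basis[b] = (mask, combo)     # new pivot: vector is independent
                break
            pm, pc = basis[b]
            mask ^= pm                       # cancel the pivot bit
            combo ^= pc
        else:
            # mask reduced to zero: the recorded combination is a dependency
            return [i for i in range(len(exponent_vectors)) if combo >> i & 1]
    return None
```

For instance, the vectors (1,0,1), (0,1,0), (1,1,1) sum to zero mod 2, so the subset {0, 1, 2} is returned: the product of the three corresponding v_j has only even exponents.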

3  Sieves

A sieve algorithm searches among many candidates for those satisfying a certain property: it systematically applies some tests to all candidates and, at the end, keeps the ones that have passed every test. One of the first sieves concerning primality and factorization is the sieve of Eratosthenes. In factorization, the sieve technique is useful for finding the set of pairs (u, v) such that u^2 ≡ v (mod N).
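For reference, here is the sieve of Eratosthenes in a few lines (a textbook version, included only to fix the pattern that the factoring sieves below reuse):

```python
def eratosthenes(L):
    """Return all primes up to L: cross out every multiple of each prime p,
    starting at p*p, and keep the positions never crossed out."""
    is_prime = [True] * (L + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(L ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, L + 1, p):
                is_prime[m] = False
    return [p for p in range(2, L + 1) if is_prime[p]]
```

The factoring sieves of this section follow the same scheme: walk an array in arithmetic progressions i(p), i(p)+p, i(p)+2p, ... instead of testing each entry individually.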

The basic quadratic sieve, invented by Pomerance in 1981, is an extension of the combination of congruences, with a specific algorithm for choosing the pairs (u_i, v_i). The idea is to choose u_i = i + ⌊N^{1/2}⌋, which implies
v_i = ( i + ⌊N^{1/2}⌋ )^2 - N.     (1)
The advantage is that v_i is close to 2i N^{1/2}, and thus v_i ≪ N; this increases the probability that the prime factors of v_i are small. To check that these factors lie in the prime basis P_k we use a sieve algorithm, which can be described as follows. First fill an array S such that S[i] = v_i for i from 1 to a bound L; then, for every p in the prime basis P_k and for each root i(p) of the equation (i + ⌊N^{1/2}⌋)^2 ≡ N (mod p), set i ← i(p) and, while i < L, do S[i] ← S[i]/p and i ← i + p. This is justified by the equivalence p | v_i ⟺ (i + ⌊N^{1/2}⌋)^2 ≡ N (mod p). At the end of the loops, for every i such that S[i] = 1, v_i is factored over P_k. The complexity of this algorithm is L_N[1/2, 3/√8], and the cost in memory is L_N[1/2, 1/√8]. The algorithm can be optimized in many ways, for example the large-prime or double-large-prime variations that we detail in the next paragraphs.
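The sieving loop just described can be sketched as follows. For simplicity the roots i(p) are found by brute force and full prime powers are divided out, so that S[i] = 1 exactly characterizes the v_i factored over the basis (the modulus, basis, and bound in the usage are illustrative):

```python
import math

def qs_sieve(N, base, L):
    """Quadratic-sieve pass: fill S[i] = (i + floor(sqrt N))^2 - N for
    i = 1..L, divide out each basis prime along its arithmetic progressions,
    and return the indices i whose v_i factor completely over the basis."""
    s = math.isqrt(N)
    S = [(i + s) ** 2 - N for i in range(L + 1)]   # S[0] is unused
    for p in base:
        # roots of (i + s)^2 ≡ N (mod p), found by brute force for small p
        for r in range(p):
            if ((r + s) ** 2 - N) % p == 0:
                i = r if r >= 1 else r + p
                while i <= L:
                    while S[i] % p == 0:           # divide out the full power
                        S[i] //= p
                    i += p
    return [i for i in range(1, L + 1) if S[i] == 1]
```

Note that for each prime p the inner loop only touches one position in p, which is where the L_N[1/2, ·] running time comes from: most array entries are never trial-divided at all.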

The large-prime variation owes its name to the use of large primes, not in the prime factor basis, and smaller than the square of the largest prime in the basis P_k. The sieving stage of the algorithm can easily be modified to find new relations v_i = q ∏ p^{a_p}, where q is a large prime. Two relations using the same large prime q, say v_1 = q ∏ p^{a_p} and v_2 = q ∏ p^{b_p}, can then be combined: v_1 v_2 / q^2 is factored over P_k. This large-prime technique allows us to search for more ``good'' pairs (u_i, v_i) and so to get more candidates to factor N. In practice it means a speed-up by a factor of approximately 2.5 [5]. The double-large-prime variation is quite similar, the difference being that two large primes are allowed in the factorization of the integers v_i. For example, if v_1 = q_1 q_2 ∏ p^*, v_2 = q_2 q_3 ∏ p^*, and v_3 = q_1 q_3 ∏ p^* (p^* stands for any power of p), then v_1 v_2 v_3 / (q_1 q_2 q_3)^2 is factored over the prime basis. The choice of relations whose product can be factored over the prime basis P_k modulo squares of large primes can be modelled as a graph problem. Let G be the graph whose vertices are the large primes q_i, with an edge between q_i and q_j for each relation v_k divisible by q_i q_j, labelled by v_k. A useful combination corresponds to a cycle in the graph G. This technique was used for the sieving step of a 138-digit number in 1990, as the non-optimized sieve was too big to be handled [5] (see also [4]).
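The single-large-prime combination step can be sketched as follows; the representation of a partial relation as a triple (u, v, q) is my own simplification for exposition:

```python
from collections import defaultdict

def combine_partials(partials):
    """partials: list of (u, v, q) where v = q * (part smooth over the basis)
    and q is the large prime.  Two relations sharing the same q are merged:
    v1 * v2 / q^2 is then smooth.  Returns a list of ((u1, u2), v1*v2 // q**2)."""
    by_q = defaultdict(list)
    for u, v, q in partials:
        by_q[q].append((u, v))
    combined = []
    for q, rels in by_q.items():
        # chain the relations sharing q; each adjacent pair yields one relation
        for (u1, v1), (u2, v2) in zip(rels, rels[1:]):
            combined.append(((u1, u2), (v1 * v2) // (q * q)))
    return combined
```

The double-large-prime case replaces this grouping by the cycle search in the graph G described above: a cycle is precisely a set of relations in which every large prime occurs an even number of times.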

The algebraic sieve [2], or number field sieve (NFS), is based on factorization in a number field. Given a polynomial P ∈ Z[X] irreducible over Q, we work in the number field Q[X]/(P(X)) = Q(θ), where θ is a root of P. In the ring Z[θ] we can talk about the primality or the prime decomposition of an element, and the norm of the number a - bθ is ∏_i (a - bθ_i), where the θ_i are all the roots of the polynomial P. In particular the norm does not depend on the particular choice of θ. The description of the algorithm requires the following notation. First let m be an integer such that P(m) ≡ 0 (mod N); then consider the ring homomorphism φ that maps Z[θ] onto Z/NZ and satisfies φ(θ) = m. We are now looking for a set A of pairs (a, b) such that ∏_{(a,b)∈A} (a - bθ) = (A - Bθ)^2 is a square in Z[θ] and ∏_{(a,b)∈A} (a - bm) = Z^2 is a square in Z. These properties give φ((A - Bθ)^2) ≡ (A - Bm)^2 ≡ Z^2 (mod N). Then (A - Bm)/Z is a square root of 1, which provides a candidate to factor N. The choice of the polynomial P plays a large part in the efficiency of the algorithm [6]. If the degree of P is O((log N)^{1/3} (log log N)^{2/3}), then the complexity is L_N[1/3, c], where c is a constant.

The way the factorization is done in Z[θ] needs to be explained, as it is a non-trivial part of the algorithm. The idea is first to factor the norm of a - bθ: Norm(a - bθ) = ∏_p p^{a_p(a,b)}. This helps because the factorization of a - bθ follows the factorization of its norm. If p is a factor of Norm(a - bθ) and p does not divide b (the contrary being a pathological case), then there exists an integer r such that a - br ≡ 0 (mod p) and P(r) ≡ 0 (mod p). We denote by [p, r] the ideal of Z[θ] whose elements x - yθ satisfy Norm(x - yθ) ≡ 0 (mod p) and x - yr ≡ 0 (mod p). This family of ideals is very interesting because (a - bθ) = ∏ [p, r]^{a_p(a,b)}, where (a - bθ) is the ideal generated by a - bθ.
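A small numerical sketch of these norms and ideals, using the toy polynomial P(X) = X^2 + 1 (so θ = i); the function names are mine:

```python
def norm(P_coeffs, a, b):
    """Norm(a - b*theta) for a monic P = sum_k P_coeffs[k] * X^k of degree d,
    computed via the homogenization sum_k P_coeffs[k] * a^k * b^(d - k)."""
    d = len(P_coeffs) - 1
    return sum(P_coeffs[k] * a ** k * b ** (d - k) for k in range(d + 1))

def degree_one_ideals(P_coeffs, p):
    """Roots r of P mod p (brute force for small p): each root gives a
    degree-one prime ideal [p, r] of Z[theta]."""
    def P_mod(x):
        return sum(c * pow(x, k, p) for k, c in enumerate(P_coeffs)) % p
    return [r for r in range(p) if P_mod(r) == 0]
```

For P(X) = X^2 + 1 we get Norm(a - bθ) = a^2 + b^2; for instance Norm(3 - θ) = 10 = 2·5, and the factor 5 corresponds to the ideal [5, 3], since P(3) ≡ 0 (mod 5) and 3 - 1·3 ≡ 0 (mod 5).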

Now that we know how to factor a number in Z[θ], we apply the sieve algorithm to the pairs (a, b). The factorization algorithm can be optimized by a good choice of the polynomial P [1]. The variant SNFS, the special number field sieve, targets the numbers b^n ± 1 by the choice of P. The general NFS algorithm becomes better than the quadratic sieve with large-prime optimizations for numbers of around 130 digits.

4  Records and Conclusion

Figure 1 shows the evolution of the factorization records. For each specific algorithm, the progress follows Moore's law, which states that the speed of computers doubles every 18 months. At each change of algorithm there is a jump. Note that the SNFS algorithm factors numbers of a special form, which are thus larger than those handled by GNFS, which factors general numbers [3]. The linear algebra is often the limiting factor and, unless there is a new idea on the subject, RSA can still be used for some time, provided the key is big enough.

References

[1]
Bernstein (Daniel J.) and Lenstra (A. K.). -- A general number field sieve implementation. In Lenstra (A.) and Lenstra (H.) (editors), The development of the number field sieve, pp. 103--126. -- Springer, Berlin, 1993.

[2]
Buhler (J. P.), Lenstra, Jr. (H. W.), and Pomerance (Carl). -- Factoring integers with the number field sieve. In Lenstra (A.) and Lenstra (H.) (editors), The development of the number field sieve, pp. 50--94. -- Springer, Berlin, 1993.

[3]
Cavallar (Stefania), Dodson (Bruce), Lenstra (Arjen K.), Lioen (Walter M.), Montgomery (Peter L.), Murphy (Brian), te Riele (Herman), Aardal (Karen), Gilchrist (Jeff), Guillerm (Gérard), Leyland (Paul C.), Marchand (Joël), Morain (François), Muffett (Alec), Putnam (Chris), Putnam (Craig), and Zimmermann (Paul). -- Factorization of a 512-bit RSA modulus. In Preneel (B.) (editor), Advances in cryptology---EUROCRYPT'00 (Bruges, 2000), pp. 1--18. -- Springer, Berlin, 2000.

[4]
Lenstra (A. K.) and Manasse (M. S.). -- Factoring with two large primes. Mathematics of Computation, vol. 63, n°208, 1994, pp. 785--798.

[5]
Lenstra (Arjen K.) and Manasse (Mark S.). -- Factoring with two large primes (extended abstract). In Damgård (I. B.) (editor), Advances in cryptology---EUROCRYPT '90 (Aarhus, 1990), pp. 72--82. -- Springer, Berlin, 1991.

[6]
Murphy (Brian). -- Modelling the yield of number field sieve polynomials. In Buhler (J. P.) (editor), Algorithmic number theory (Portland, OR, 1998), pp. 137--150. -- Springer, Berlin, 1998. Proceedings of the Third International Symposium ANTS-III.
