Thirty Years of Integer Factorization
, École polytechnique (France)
February 5, 2001
[summary by Marianne Durand]
Factoring integers is quite an old challenge. Thirty years ago, two
researchers factored the mythical number F7 = 2^(2^7)+1. A few years
later public-key cryptography was born, and with it the famous RSA
algorithm. Even if the security of RSA is not equivalent to integer
factorization, factoring the RSA key is the simplest way to decode
everything, so many people have tried to factor. In 1990, F9 =
2^(2^9)+1, the ninth Fermat number, was factored with the help of
hundreds of computers. In August 1999, it was the turn of the first
ordinary 512-bit integer. What follows is a survey of thirty years of
factorization, describing the different methods used and the technical
challenges they raise.
1 Introduction
Factoring is of great interest since it gives access to the properties
of prime numbers in arithmetic. It is the keystone of the RSA
algorithm, the most widely used encryption algorithm. RSA is an
asymmetric public-key algorithm based on the fact that the product of
two very large prime numbers cannot easily be factored, whereas
checking whether a number is prime can be done quickly. Testing the
primality of an integer lies in the complexity class
NP ∩ co-NP. Factoring a number is in
NP, but can be done in polynomial time on a quantum computer.
Many different methods exist to factor a number, from the
linear sieve up to the algebraic sieve, including methods based on
elliptic curves. Their complexity can be expressed in terms of the
function L_N[a, c] = exp(c (log N)^a (log log N)^(1-a)).
|elliptic curve method    | L_p[1/2, c]
|quadratic sieve (QS)     | L_N[1/2, c]
|number field sieve (NFS) | L_N[1/3, c]
Table 1: Complexity of factorization methods (N is the integer to be
factored, p its smallest factor)
Some complexities are given in Table 1. The smallest factor
p of N is usually of order √N. The letter c stands for a
constant and is not specified, as it depends on the algorithm and its
implementation. These methods are detailed in the next sections.
2 Combination of Congruences
The method of combination of congruences is an extension of Kraitchik's method.
The latter aims at finding an integer x such that
x² ≡ 1 mod N and
x ≢ ±1 mod N, then at testing whether
gcd(x-1, N) is non-trivial. If so, it is a factor of
N. The quadratic congruence approach
refines the way the square root of 1 is found.
The first step consists in finding pairs of integers
(u_i, v_i), i ∈ I, such
that u_i² ≡ v_i mod N and
u_i² ≠ ±v_i. The second step is to find a subset J ⊂ I
such that ∏_{j∈J} v_j is a square, denoted V_J². This step is
detailed later. If we write
∏_{j∈J} u_j = U_J, then step 2 implies
U_J² ≡ V_J² mod N. As we also assume that
V_J and N are coprime (otherwise we have a factor of N),
x = U_J / V_J mod N is well defined and is a square
root of 1. With probability greater than 1/2 it gives a
non-trivial factorization of N.
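Kraitchik's principle is easy to state in code. The following is a minimal sketch (the helper name and the toy modulus 91 are illustrative, not from the talk):

```python
from math import gcd

def split_with_sqrt_of_one(x, N):
    """Given x with x^2 = 1 (mod N) and x != +-1 (mod N),
    gcd(x - 1, N) is a non-trivial factor of N."""
    assert (x * x) % N == 1
    d = gcd(x - 1, N)
    return d if 1 < d < N else None

# Toy example: N = 91 = 7 * 13, and x = 27 satisfies
# 27^2 = 729 = 8 * 91 + 1, with 27 != +-1 (mod 91).
print(split_with_sqrt_of_one(27, 91))  # -> 13
```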
This extension is interesting because, in order to find the pairs
(u_i, v_i), we can use an algorithm that rejects or ignores
some valid pairs in order to go faster. One solution is Dixon's method.
The idea is to restrict the search to integers v_i that can be
factored over a
small set of given small prime integers P_k = (p_1, ..., p_k). To
find pairs (u_i, v_i) according to Dixon's method, we choose an
integer u_i and try to factor u_i² mod N over the set P_k. If we
succeed, we keep the pair (u_i, u_i² mod N). The integer u_i has
to be greater than √N, so as to give a non-trivial pair.
Once the pairs (u_i, v_i) are found, the second step is to find a
subset J such that ∏_{j∈J} v_j is a square. As the
factorization of each v_i is already known, this can be seen as a
linear algebra problem. Assume that there are k+1 valid pairs
available. Consider the matrix M of size (k, k+1), with coefficients
0 and 1 viewed in the field Z/2Z, such that
M[i,j] is equal to the exponent of p_i in the factorization of
v_j. This matrix has rank at most k, so there exists a
linear combination of the columns equal to 0. The subset J
corresponds to the non-zero coefficients in this linear combination,
and we can check that ∏_{j∈J} v_j is a square, because all its
prime factors appear with even exponent. Exhibiting a concrete linear
combination equal to zero is made easier by the sparsity of the matrix M: as a
matter of fact, the techniques of Wiedemann or of Lanczos have
complexity O(k^(2+ε)) on sparse matrices, whereas Gaussian
elimination has complexity O(k³). We then obtain the expression of V_J
easily, and a square root of 1 that may yield a factorization of
N. This algorithm has complexity L_N[1/2, c], where c is a
constant that depends on the algorithm.
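The two steps above (collecting smooth relations, then finding an even-exponent combination) can be sketched as follows. This is a toy illustration, not the talk's implementation: it searches for relations by random trial division instead of sieving, and finds the dependency by brute force over subsets instead of Wiedemann or Lanczos, which is only viable for a tiny factor basis.

```python
from itertools import combinations
from math import gcd, isqrt
from random import randrange

def factor_over_base(n, base):
    """Exponent vector of n over `base`, or None if n is not smooth."""
    exps = []
    for p in base:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
    return exps if n == 1 else None

def dixon(N, base=(2, 3, 5, 7, 11, 13)):
    """Toy sketch of Dixon's method on a small N."""
    k = len(base)
    rels = []                       # pairs (u_i, exponent vector of v_i)
    while len(rels) < k + 1:        # k+1 relations force a dependency
        u = randrange(isqrt(N) + 1, N)
        exps = factor_over_base(u * u % N, base)
        if exps is not None:
            rels.append((u, exps))
    for r in range(1, len(rels) + 1):
        for J in combinations(rels, r):
            sums = [sum(e) for e in zip(*(exps for _, exps in J))]
            if all(s % 2 == 0 for s in sums):   # product of the v_j is a square
                U, V = 1, 1
                for u, _ in J:
                    U = U * u % N
                for p, s in zip(base, sums):
                    V = V * pow(p, s // 2, N) % N
                d = gcd(U - V, N)
                if 1 < d < N:
                    return d
    return None                     # unlucky run: every combination was trivial
```

A run of `dixon(91)` usually returns 7 or 13; since each square root of 1 is non-trivial only with probability greater than 1/2, an unlucky run may return None and must be retried.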
3 Sieve Algorithms
A sieve algorithm searches through many candidates satisfying a certain
property: it applies some tests systematically to all candidates,
and at the end keeps the ones that have passed all the tests
successfully. One of the first sieves concerning primality and
factorization is the sieve of Eratosthenes. In factorization, the sieve
technique is useful for the search of the set of pairs (u, v) such that
u² ≡ v mod N.
The basic quadratic sieve, introduced by Pomerance in 1981, is an extension
of the combination of congruences, with a specific choice algorithm for
(u_i, v_i). The idea is to choose u_i = i + ⌊√N⌋ and v_i = u_i² - N.
The advantage is that
v_i is close to 2i√N, and thus v_i ≪ N; this increases the
probability that the prime factors of v_i are small. To
check that these factors are in the prime number basis P_k we use
a sieve algorithm, which
can be described as follows. First fill an array S such
that S[i] = v_i for i from 1 to a bound L; then for every p in
the prime number basis P_k, and for each root i_0 of
(i + ⌊√N⌋)² ≡ N mod p,
set i ← i_0 and while i < L do S[i] ← S[i]/p and i ← i + p.
This algorithm is justified by the equivalence
p | v_i ⇔ (i + ⌊√N⌋)² ≡ N mod p.
At the end of the loops, for every i such that S[i] = 1, v_i is
factored over P_k. The complexity of this algorithm
is L_N[1/2, 3/√8], and the cost in memory space is
L_N[1/2, 1/√8]. The algorithm can be optimized in many ways,
for example the large prime or double large prime variations that we
detail in the next paragraphs.
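The sieving loop just described can be sketched as follows. This is a toy version: the roots modulo p are found by brute force rather than by a modular square-root algorithm, and prime powers are divided out fully (the description above divides by p once per visit).

```python
from math import isqrt

def qs_sieve(N, base, L):
    """Sieving stage of the basic quadratic sieve:
    S[i] = v_i = (i + floor(sqrt(N)))^2 - N for i = 1..L; for each
    prime p in the base, walk the arithmetic progressions given by
    the roots of (i + floor(sqrt(N)))^2 = N (mod p) and divide.
    Returns the indices i whose v_i factor completely over the base."""
    m = isqrt(N)
    S = [(i + m) ** 2 - N for i in range(1, L + 1)]
    for p in base:
        # roots of (r + m)^2 = N (mod p), by brute force for clarity
        roots = [r for r in range(p) if (r + m) ** 2 % p == N % p]
        for r in roots:
            i = (r - 1) % p          # first 0-based index with i+1 = r (mod p)
            while i < L:
                while S[i] % p == 0:  # divide out all powers of p
                    S[i] //= p
                i += p
    return [i + 1 for i in range(L) if S[i] == 1]

# Toy run with N = 91: v_1 = 9, v_2 = 30, v_5 = 105 are smooth
# over {2, 3, 5, 7}, while v_3 = 53 and v_4 = 78 are not.
print(qs_sieve(91, [2, 3, 5, 7], 5))  # -> [1, 2, 5]
```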
The large prime variation owes its name to the use of large primes,
not in the prime factor basis, but smaller than the square of the
largest prime in the basis P_k. The sieving stage of the algorithm
can easily be modified to find new relations v_i = q ∏ p^{a_p},
where q is a large prime. We can then combine two
relations using the same large prime q, namely v_1 = q ∏ p^{a_p}
and v_2 = q ∏ p^{b_p}, and see that v_1 v_2 / q²
factors over P_k. This large prime technique allows us to search
for more "good" pairs (u_i, v_i) and so to get more candidates to
factor N. In practice it means a speed-up by a factor of
approximately 2.5. The double large prime variation is
quite similar, the difference being that two large primes are allowed in
the factorization of the integers v_i. For example if
v_1 = q_1 q_2 ∏ p^*, v_2 = q_2 q_3 ∏ p^*, and
v_3 = q_1 q_3 ∏ p^* (p^* stands for any power of p), then
v_1 v_2 v_3 / (q_1 q_2 q_3)² factors over the prime basis. The choice
of v_i, v_j, and v_k such that their product factors
over the prime basis P_k modulo squares of large primes can be
modelled as a graph problem. Let G be the graph whose vertices are
the large primes q_i, with an edge between q_i and q_j labelled by
each v_k that is a multiple of q_i q_j.
A useful relation corresponds to a cycle in the graph G.
This technique was used for the sieving step of a 138-digit number in
1990, as the non-optimized sieve would have been too big to be handled.
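Detecting the cycles of G can be done with a union-find structure: each partial relation is an edge between its two large primes, and an edge whose endpoints already lie in the same component closes a cycle. A minimal sketch (the relation labels are illustrative):

```python
def find_cycles(partial_relations):
    """Union-find on the large-prime graph G.  Each partial relation
    v_k = q_i * q_j * (base-smooth part) is an edge {q_i, q_j}; an
    edge whose endpoints are already connected closes a cycle, i.e.
    yields a combination factorable over the base up to squares."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    cycles = []
    for label, (q1, q2) in partial_relations:
        r1, r2 = find(q1), find(q2)
        if r1 == r2:
            cycles.append(label)   # this edge completes a cycle
        else:
            parent[r1] = r2
    return cycles

# The three relations from the text: edges {q1,q2}, {q2,q3}, {q1,q3};
# the third edge closes the cycle giving v1*v2*v3 / (q1*q2*q3)^2.
print(find_cycles([("v1", ("q1", "q2")),
                   ("v2", ("q2", "q3")),
                   ("v3", ("q1", "q3"))]))  # -> ['v3']
```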
The algebraic sieve, or number field sieve (NFS),
algorithm is based on factorization in a number field. Given a
polynomial P ∈ Z[X] irreducible over Q, we
work in the number
field Q[X]/(P(X)) = Q(θ), where θ is a root
of P. In the ring Z[θ] we can talk about the
primality or the
prime decomposition of an element, and the norm of the number
a - bθ is ∏(a - bθ_i), where the θ_i are all the
roots of the polynomial P. In particular the norm does not depend on
the particular choice of θ.
The description of the algorithm requires the following
notation. First let m be an integer such that
P(m) ≡ 0 mod N, then consider the ring
homomorphism φ that maps Z[θ] onto
Z/NZ and that satisfies φ(θ) = m. We are now
looking for a set A of pairs (a, b) such that
∏_A (a - bθ) is a square in Z[θ], say (A - Bθ)², and
∏_A (a - bm) is a square in Z, say Z². These properties give
φ((A - Bθ)²) ≡ (A - Bm)² ≡ Z² mod
N. Then (A - Bm)/Z is a square root of 1 that provides a candidate
to factor N.
The choice of the polynomial P plays a large part in the efficiency of
the algorithm. If the degree of P is of order
((log N)/(log log N))^(1/3), then the complexity is
L_N[1/3, c], where c is a constant.
The way factorization is done in Z[θ] needs to be
explained, as it is a non-trivial part of the algorithm. The idea is to
first factor the norm of a - bθ, which
helps because the factorization of a - bθ follows from the
factorization of its norm.
If p is a factor of Norm(a - bθ), and p does not divide b
(this being a pathological case), then there exists an integer r
such that a - br ≡ 0 mod p and
P(r) ≡ 0 mod p. We denote by [p, r] the ideal
of Z[θ] such that any element x - yθ of [p, r]
satisfies Norm(x - yθ) ≡ 0 mod p
and x - yr ≡ 0 mod p.
This family of ideals is very
interesting because (a - bθ) = ∏ [p, r]^{a_{p,r}(a,b)}, where
(a - bθ) is the ideal generated by a - bθ.
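For a monic P of degree d with coefficients c_0, ..., c_{d-1}, 1, the norm that gets factored has the closed form Norm(a - bθ) = b^d P(a/b) = Σ_i c_i a^i b^(d-i). A minimal sketch under that assumption (the polynomial X² + 1 is just an example, not one from the talk):

```python
def norm(a, b, P):
    """Norm of a - b*theta in Z[theta], where theta is a root of the
    monic polynomial with coefficient list P = [c_0, c_1, ..., 1]
    (by increasing degree): Norm(a - b*theta) = sum c_i a^i b^(d-i)."""
    d = len(P) - 1
    return sum(c * a**i * b**(d - i) for i, c in enumerate(P))

# Example with P(X) = X^2 + 1, i.e. theta = i:
# Norm(a - b*i) = (a - b*i)(a + b*i) = a^2 + b^2.
print(norm(3, 2, [1, 0, 1]))  # -> 13
```

Factoring this ordinary integer then yields the primes p, and hence the exponents of the ideals [p, r], in the decomposition of a - bθ.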
Now that we know how to factor a number in Z[θ], we
apply the sieve algorithm over the pairs (a, b). The factorization
algorithm can be optimized by a good choice of the polynomial
P. The
variant SNFS, the Special Number Field Sieve, targets the numbers
b^n ± 1 by the choice of P. The general NFS algorithm becomes
better than the quadratic sieve with large-prime optimizations for
numbers of size around 130 digits.
4 Records and Conclusion
Figure 1 shows the evolution of the factorization
records. For each
specific algorithm, the progress follows Moore's law, which states that
the speed of computers doubles every 18 months. Then, for each change of
algorithm, there is a jump. Note that the SNFS algorithm factors
specific numbers, which are thus larger than those factored by GNFS,
which handles general numbers.
Linear algebra is often the limiting factor and, unless there is a
new idea on the subject, RSA can still be used for some time provided
the keys are big enough.
Bernstein (Daniel J.) and Lenstra (A. K.). --
A general number field sieve implementation. In Lenstra (A.) and
Lenstra (H.) (editors), The development of the number field sieve,
pp. 103--126. --
Springer, Berlin, 1993.
Buhler (J. P.), Lenstra, Jr. (H. W.), and Pomerance (Carl). --
Factoring integers with the number field sieve. In Lenstra (A.) and
Lenstra (H.) (editors), The development of the number field sieve,
pp. 50--94. --
Springer, Berlin, 1993.
Cavallar (Stefania), Dodson (Bruce), Lenstra (Arjen K.), Lioen (Walter M.),
Montgomery (Peter L.), Murphy (Brian), te Riele (Herman), Aardal (Karen),
Gilchrist (Jeff), Guillerm (Gerard), Leyland (Paul C.), Marchand (Joël),
Morain (François), Muffett (Alec), Putnam (Chris), Putnam (Craig), and
Zimmermann (Paul). --
Factorization of a 512-bit RSA modulus. In Preneel (B.) (editor),
Advances in cryptology---EUROCRYPT'00 (Bruges, 2000), pp. 1--18. --
Springer, Berlin, 2000.
Lenstra (A. K.) and Manasse (M. S.). --
Factoring with two large primes. Mathematics of Computation,
vol. 63, n°208, 1994, pp. 785--798.
Lenstra (Arjen K.) and Manasse (Mark S.). --
Factoring with two large primes (extended abstract). In Damgård
(I. B.) (editor), Advances in cryptology---EUROCRYPT '90 (Aarhus,
1990), pp. 72--82. --
Springer, Berlin, 1991.
Murphy (Brian). --
Modelling the yield of number field sieve polynomials. In Buhler
(J. P.) (editor), Algorithmic number theory (Portland, OR, 1998),
pp. 137--150. --
Springer, Berlin, 1998. Proceedings of the Third
International Symposium ANTS-III.