pollard.html

POLLARD'S RHO ALGORITHM

Bruno Salvy

(Version of January 27, 1997)

Pollard's -method is an efficient technique used to find factors of integers. It is both very efficient and very simple. We show in this worksheet how combstruct and gfun can be used to analyze a realistic combinatorial model of the algorithm and thus derive a probabilistic complexity analysis of this algorithm and variants of it.

> with(combstruct): with(gfun):

Algorithm and combinatorial model

The algorithm

If is the integer of which a factor is sought, the basic version of the algorithm is as follows:

Pick up an arbitrary integer mod , set

Select at random a number mod , set

Iterate:

until .

This is directly translated into the following rough Maple procedure which returns a factor and the number of iterations performed to find this factor:

> pollard:=proc(N)
local rnd, a, f, x, y, i, g;
rnd:=rand(N); a:=rnd(); x:=rnd(); y:=x;
for i do
x:=x^2+a mod N; y:=(y^2+a mod N)^2+a mod N; g:=igcd(y-x,N);
if g<>1 then RETURN(g,i) fi
od
end:

Here are a few examples (Maple's nextprime routine returns the smallest prime number following its argument).

> nextprime(900)*nextprime(20000);

> pollard(");

> pollard("");

> pollard(""");

> nextprime(10^5)*nextprime(10^6);

> pollard(");

> pollard("");

> pollard(""");

> nextprime(10^6)*nextprime(10^7);

> pollard(");

> pollard("");

> pollard(""");

The combinatorial model

The algorithm relies on the structure of the functional graph of the function , where is the smallest prime factor of . In this graph, the vertices are the integers mod and the directed edges link each vertex to its image by . Since the number of points is finite, it is not difficult to see that the graph has the structure of a union of connected components, each of these components being constituted of a cycle to which converge trees. Since the polynomial has degree 2 and is prime, all the vertices (except ) have in-degree 0 or 2, while they have out-degree 1. The prime factor is assumed to be large, therefore the special case of the vertex which has in-degree 1 can be discarded as a first approximation. The combinatorial model is thus expressed by the following combstruct grammar :

> G:={fungraph=Set(connected_component),
connected_component=Cycle(Prod(Z,bintree)),
bintree=Union(Z,Prod(Z,Set(bintree,card=2)))}:

The execution of the algorithm is interpreted on this graph as follows: a random point of the graph is selected. Then two sequences and of iterates with initial value are computed, one of them proceeding one step at a time, while the other one proceeds by steps of length 2. Starting from a node of the graph, these two sequences eventually reach a cycle, where they are bound to intersect. This is where the algorithm stops.
Under this model, the average number of steps required by the algorithm is therefore related to two parameters: (i) the average distance from a point to its cycle; (ii) the average length of the cycles.

Path length in planar binary trees

We start with question (i) above: the determination of the average distance from a point to its cycle. We first concentrate on a similar but simpler problem, the computation of the average distance from a node to the root in a planar binary tree. This is related to a classical combinatorial parameter: the internal path length of the tree, which is the sum of the distances from each of the nodes to the root.
Binary trees are described by the following grammar:

> bin:={bintree=Union(Epsilon,Prod(Z,bintree,bintree))}:

The counting sequence given by combstruct[count] is constituted by the well-known Catalan numbers:

> seq(count([bintree,bin,unlabelled],size=i),i=0..15);

The following Maple procedure takes as input a binary tree as produced by combstruct using the specification above and returns its internal path length:

> ipl:=proc(t)
if type(t,name) then 0 # the tree is of the form Z or Epsilon
else # the tree is of the form Prod(Z,t1,t2)
ipl(op(2,t))+ipl(op(3,t))+size(t)-1
fi
end:
size:=proc(t) eval(subs([Prod=`+`,Z=1,Epsilon=0],t)) end:

The internal path length is computed using a simple bijection: each of the nodes on a branch from the root to a leaf are counted once for each of the nodes below it. Here is an example on a tree of size 5 generated by combstruct[draw] from the specification above:

> T:=draw([bintree,bin,unlabelled],size=5); ipl(T);

We shall compute the average internal path length by first computing the total internal path length, i.e. the sum of the internal path lengths of all the binary trees of size and then dividing this number by the th Catalan number. We first start by computing experimentally the first values of these numbers, generating all the trees of fixed sizes for small sizes with combstruct[allstructs] , and computing the sum of the internal path lengths of all these trees with the procedure ipl above.

> for i to 5 do sum_ipl[i]:=`+`(op(map(ipl,allstructs([bintree,bin,unlabelled],size=i)))) od;

Combinatorial model for path lengths

The numbers computed above can actually be derived more efficiently, together with many other results related to path lengths using combstruct's ability to deal with marks (atoms of size 0). The idea consists in writing a combinatorial specification such that the number of objects of size is exactly the sum of the internal path lengths of all the binary trees of size . The combinatorial technique used here extends to other combinatorial structures and leads to a combstruct-based analysis of Pollard's -algorithm.
The grammar for binary trees is modified to take into account "decorated" binary trees. A binary tree is decorated by putting two marks (An for Ancestor and De for Descendant) on two nodes belonging to the same branch. The number of ways of decorating a binary tree is then exactly its internal path length. Couting the number of decorated trees therefore corresponds to summing the internal path lengths of all binary trees.

> bin2:={bintree2=Prod(Z,
Union(leftright2,Prod(An,leftright1))),
bintree1=Prod(Z,
Union(leftright1,Prod(De,bintree,bintree))),
leftright2=Union(Prod(bintree2,bintree),
Prod(bintree,bintree2)),
leftright1=Union(Prod(bintree1,bintree),
Prod(bintree,bintree1)),
An=Epsilon, De=Epsilon} union bin:

The sequence of cumulated internal path lengths is now derived in less than a second:

> seq(count([bintree2,bin2,unlabelled],size=i),i=1..10);

Here are the 14 decorated trees of size 3:

> allstructs([bintree2,bin2,unlabelled],size=3);

Generating functions

From the grammars describing binary trees and decorated binary trees, the average internal path length can be computed via generating functions. Using combstruct[gfsolve] , we first derive the generating functions for the Catalan numbers and for the cumulated path lengths:

> F:=subs(gfsolve(bin,unlabelled,z),bintree(z));

> S:=series(F,z,30);

> Fpl:=subs(gfsolve(bin2,unlabelled,z),bintree2(z));

> Spl:=series(Fpl,z,30);

The average path length is simply the ratio of these coefficients:

> seq(coeff(Spl,z,i)/coeff(S,z,i),i=0..28);

> evalf(["]);

Empirically, these numbers grow slightly faster than linearly with the size of the tree. A closed-form for the average path length can be established rigorously using gfun . The generating functions being algebraic, they satisfy linear differential equations. These can be derived by gfun[holexprtodiffeq] . From these differential equations, a linear recurrence satisfied by the Taylor coefficients is obtained by gfun[diffeqtorec] . It turns out that these recurrences fall into a class for which Maple's rsolve can find a solution.
Here are the differential equations:

> deqF:=holexprtodiffeq(F,y(z));

> deqFpl:=holexprtodiffeq(Fpl,y(z));

From there, the recurrences follow:

> recF:=diffeqtorec(deqF,y(z),u(n));

> recFpl:=diffeqtorec(deqFpl,y(z),u(n));

The recurrence satisfied by the Catalan numbers being linear of order 1, it is obvious that Maple will be able to solve it:

> Cat:=rsolve(recF,u(n));

It is however more surprising that rsolve can compute the solution of the 3rd order recurrence satisfied by the cumulated path lengths. This is due to the implementation of the recent algorithm by M. Petkovsek for hypergeometric solutions of linear recurrences.

> Pl:=rsolve(recFpl,u(n));

After some cleaning up, we have thus proved the following.

Proposition . The average internal path length in a random planar unlabelled binary tree of size under the uniform model is
.

> seq(4^i*(i+1)/binomial(2*i,i)-3*i-1,i=1..10);

From there, the asymptotic behaviour is well within the reach of Maple's asympt command:

> map(combine,asympt(Pl/Cat,n),exp);

Average distance from a point to a cycle

We now come back to the analysis of Pollard's -algorithm. The analysis of the average distance from a node to its cycle proceeds exactly as in the case of binary trees by decorating the corresponding trees. The grammar is derived from the grammar G above:

> G;

The process of decoration consists in isolating the path between ancestor and descendant in the combinatorial structure. For instance, non-planar binary trees are described by

> npbin:={bintree=subs(G,bintree)};

Here, Set indicates that the respective position of the two subtrees does not count. In the labelled case, we can therefore decide that the decorated branch will always be the leftmost one (instead of considering all the paths leftright1 and leftright2 as in the case of planar binary trees). Thus the decorated grammar in this case is

> bin3:={bintree2=Prod(Z,Union(Prod(bintree2,bintree),
Prod(An,bintree1,bintree))),
bintree1=
Prod(Z,Union(Prod(bintree1,bintree),
Prod(De,Set(bintree,card=2)),De)),
An=Epsilon,De=Epsilon} union npbin:

Again, we can check that the first values coincide with the result produced by computing the internal path length on all the non-planar binary trees:

> ipl:=proc(t) if type(t,name) then 0 else size(t)-1+ipl(op([2,1],t))+ipl(op([2,2],t)) fi end:
size:=proc(t) local i;eval(subs([Prod=`+`,Set=`+`,seq(i=1,i=indets(t,name))],t)) end:

> for i to 6 do i,`+`(op(map(ipl,allstructs([bintree,npbin,labelled],size=i)))) od;

> seq(count([bintree2,bin3,labelled],size=i),i=1..11);

The same process readily extends to the functional graphs involved in Pollard's algorithm by a decomposition of sets and cycles which isolates the marked part.

> G1:={fungraph=Prod(connected_component1,
Set(connected_component)),
connected_component1=Prod(Z,
Union(Prod(An,bintree1),bintree2),
Sequence(Prod(Z,bintree))),
connected_component=Cycle(Prod(Z,bintree))}
union bin3:

We first write procedures to compute the distance from nodes to their cycles in the non-decorated graphs so that we can check on the first few values that our grammar for decorated graphs is consistent with the non-decorated one.

> iplfg:=proc(g) `+`(op(map(iplcc,g))) end:
iplcc:=proc(cc) `+`(op(map(iplbt,cc))) end:
iplbt:=proc(bt) if type(bt,name) then 0 else size(op(2,bt))+iplt(op(2,bt)) fi end:
iplt:=proc(t) if type(t,name) then 0 else size(t)-1+iplt(op([2,1],t))+iplt(op([2,2],t)) fi end:
size:=proc(t) local i; eval(subs([Set=`+`,Prod=`+`,seq(i=1,i=indets(t,name))],t)) end:

Here are a few tests:

> to 4 do t:=draw([fungraph,G,labelled],size=6); print(t,iplfg(t)) od:

The number of functional graphs of fixed size grows quite fast:

> seq(count([fungraph,G,labelled],size=i),i=1..10);

We can therefore only check the values for sizes 2 and 4 in a reasonable amount of time

> map(iplfg,allstructs([fungraph,G,labelled],size=2));

> sort(map(iplfg,allstructs([fungraph,G,labelled],size=4)));

> convert(",`+`);

On the other hand, here is the counting sequence for decorated graphs:

> seq(count([fungraph,G1,labelled],size=i),i=1..10);

It takes 11 minutes to check that 10440 is also the value we get by generating all the binary functional graphs of size 6.

We now compute the generating functions with combstruct[gfsolve] :

> F:=subs(gfsolve(G,labelled,z),fungraph(z));

> Fpl:=subs(gfsolve(G1,labelled,z),fungraph(z));

Again, we obtain closed-forms from these generating functions by applying gfun to find the differential equation they satisfy and from there the recurrence satisfied by their coefficients:

> deqF:=holexprtodiffeq(F,y(z));

> deqFpl:=holexprtodiffeq(Fpl,y(z));

> recF:=diffeqtorec(deqF,y(z),u(n));

> recFpl:=diffeqtorec(deqFpl,y(z),u(n));

Unfortunately, due to a weakness in Maple's current version of rsolve, the solutions to these recurrences are not found

> rsolve(recF,u(n)),rsolve(recFpl,u(n));

However, Maple can find the solution if we help it by taking into account the parity of the generating functions:

> deqF2:=holexprtodiffeq(subs(z=z^(1/2),F),y(z)):

> deqFpl2:=holexprtodiffeq(subs(z=z^(1/2),normal(Fpl)),y(z)):

> recF2:=diffeqtorec(deqF2,y(z),u(n));

> recFpl2:=diffeqtorec(deqFpl2,y(z),u(n));

> nb_fg:=subs(n=n/2,rsolve(recF2,u(n)));

> tot_pl:=map(simplify,subs(n=n/2,rsolve(recFpl2,u(n))));

This is summarized by:

Proposition : The average distance from a point to a cycle in a random binary non-planar functional graph of size is given by .

From there a direct asymptotic expansion follows:

> map(simplify,asympt(subs(tot_pl/nb_fg/n),n));

> as_cost_1:=op(1,"):

Thus under our probabilistic model, the first stage of Pollard's -algorithm takes asymptotically steps, where is the smallest prime factor of the number to be factored and .

Average length of the cycles

After both sequences ( and ) have reached the cycle, the number of steps before the end of Pollard's algorithm is bounded by the length of this cycle. Since this length might be correlated to the number of steps before, it is not a priori sufficient to compute the average length of the cycles in a random graph obeying our grammar. Instead, we modify the decorated graphs so that the Ancestor is now an element of the cycle. The number of decorated graphs of size is thus the sum for all the nodes of the length of their cycle, summed over all graphs of size . Here is the new grammar:

> G2:=remove(type,G1,identical(connected_component1)=anything)
union {connected_component1=
Prod(Z,
Union(bintree1,Prod(De,bintree)),
Sequence(Prod(Z,bintree)),
Prod(Z,An,bintree),
Sequence(Prod(Z,bintree)))}:

The corresponding generating function is again algebraic:

> Fplcycle:=factor(normal(subs(gfsolve(G2,labelled,z),fungraph(z))));

> series(Fplcycle,z,11);

Again, a closed form follows for the coefficients:

> holexprtodiffeq(subs(z=z^(1/2),Fplcycle)/z,y(z));

> diffeqtorec(",y(z),u(n));

> rsolve(",u(n));

> T1:=map(simplify,subs(n=n/2-1,")+tot_pl);

We thus get the following conclusion:

Theorem . The average number of steps of Pollard's -algorithm under the probabilistic model is bounded between and ,
where is the smallest prime factor of the integer whose factorization is sought.
Asymptotically, this number of steps therefore behaves like , with in .

Here is the verification of the second part:

> map(simplify,asympt(T1/nb_fg/n,n,3));

> as_cost_2:=op(1,"):

> evalf("");

Since the smallest prime factor of a number is of order , it follows that the (arithmetic) complexity of Pollard's algorithm under this model grows in .

Extension: polynomials of degree

When the factor to be found is known a priori to satisfy for some known in advance, Pollard conjectured that the number of steps in his algorithm could be reduced by a factor by using the polynomial instead of . Brent and Pollard used this idea to factor the 8th Fermat number, whose factors are known to be . We know consider Pollard's under the probabilistic model for .

Here is the grammar for functional graphs with as a parameter:

> Gm:={fg=Set(cc),cc=Cycle(Prod(Z,Set(tree,card=m-1))),
tree=Union(Z,Prod(Z,Set(tree,card=m)))}:

Here is the corresponding decorated grammar for the distances to the cycles:

> Gm2:={fg1=Prod(cc1,Set(cc)),
cc1=Prod(Z,Union(Prod(An,tree1),tree2),
Set(tree,card=m-2),
Sequence(Prod(Z,Set(tree,card=m-1)))),
tree1=Prod(Z,Union(Prod(tree1,Set(tree,card=m-1)),
Prod(De,Set(tree,card=m)),De)),
tree2=Prod(Z,Union(tree2,Prod(An,tree1)),
Set(tree,card=m-1)),
An=Epsilon,De=Epsilon} union Gm:

And the grammar for the distances to the cycles plus the length of the cycle:

> Gm3:={fg2=Prod(cc2,Set(cc)),
cc2=Prod(Z,Union(Prod(tree1,Set(tree,card=m-2)),
Prod(De,Set(tree,card=m-1))),
Sequence(Prod(Z,Set(tree,card=m-1))),
Prod(Z,An,Set(tree,card=m-1)),
Sequence(Prod(Z,Set(tree,card=m-1))))}
union Gm2:

We first check on the case that we recover the generating functions we had before.

> sol:=gfsolve(subs(m=2,Gm3),labelled,z);

> normal(subs(sol,fg(z))-F);

> normal(subs(sol,fg1(z))-Fpl);

> normal(subs(sol,fg2(z))-Fplcycle);

The grammar becomes

> G6:=subs(m=6,Gm3);

From there we compute the generating functions.

> sol:=gfsolve(G6,labelled,z):

Here is the generating function for the functional graphs:

> F:=subs(sol,fg(z));

Again, the coefficients admit a closed-form which can be obtained using gfun (it could also be derived by Lagrange's inversion formula). We rather compute directly the asymptotic expansion of the number of functional graphs of size whose nodes have an in-degree of 0 or 6 by singularity analysis. Unfortunately, Maple's series command is not yet able to deal with algebraic functions like the RootOf above. We therefore start from the equation in the system which is at the origin of the singularity:

> T:=subs(sol,tree(z));

> tree(z)-subs(gfeqns(G6,labelled,z),tree(z));

> eq:=subs(tree(z)=y,Z(z)=z,");

> diff(eq,y);

> solve({"","},{y,z});

It follows that the dominant singularity is at , where . The singular expansion of can be computed at that point by power series reversion (we set where is the singularity, which helps series a little) :

> sing:=5/6*12^(1/3):

> series(subs(z=sing*(1-theta^2),eq),y=12^(1/3));

> {solve(",y)};

> sery:=op(select(proc(s,u) evalb(signum(coeff(s,u,1))=-1) end,",theta));

> series(subs(T=sery,z=sing*(1-theta^2),F),theta);

Hence the number of functional graphs with in-degree in {0,6} is asymptotically:

> as_fg:=coeff(",theta,-1)/sqrt(Pi)*n^(-1/2)*sing^(-n);

We next consider the distance to the cycle

> Fpl:=subs(sol,fg1(z));

> series(subs(T=sery,z=sing*(1-theta^2),Fpl),theta);

And the length of the cycle:

> Fpl2:=subs(sol,fg2(z));

> series(subs(T=sery,z=sing*(1-theta^2),Fpl2),theta);

Thus the distance to the cycle and the length of this cycle are both asymptotic to:

> asympt(coeff(",theta,-4)*n*sing^(-n)/as_fg/n,n);

Thus the ratio between this variant of Pollard's algorithm and the original one is in both cases (number of steps to the cycle and length of the cycle):

> ratio:=simplify(map(asympt,[as_cost_1/",as_cost_2/"/2],n));

Conclusion

This worksheet shows that some algorithms whose complexity analysis does not reduce to counting the number of subcomponents in some kind of recursive combinatorial structure can sometimes be treated by combstruct using several marks. This is in particular true of many recursive searching algorithms whose complexity is related to the path length of an underlying structure.