POLLARD'S RHO ALGORITHM

Bruno Salvy

(Version of January 27, 1997)

Pollard's [Maple Math] -method is an efficient technique used to find factors of integers. It is both very efficient and very simple. We show in this worksheet how combstruct and gfun can be used to analyze a realistic combinatorial model of the algorithm and thus derive a probabilistic complexity analysis of this algorithm and variants of it.

> with(combstruct): with(gfun):

Algorithm and combinatorial model

The algorithm

If [Maple Math] is the integer of which a factor is sought, the basic version of the algorithm is as follows:

Pick up an arbitrary integer [Maple Math] mod [Maple Math] , set [Maple Math]

Select at random a number [Maple Math] mod [Maple Math] , set [Maple Math]

Iterate:

[Maple Math]

until [Maple Math] .

This is directly translated into the following rough Maple procedure which returns a factor and the number of iterations performed to find this factor:

> pollard:=proc(N)
local rnd, a, f, x, y, i, g;
rnd:=rand(N); a:=rnd(); x:=rnd(); y:=x;
for i do
x:=x^2+a mod N; y:=(y^2+a mod N)^2+a mod N; g:=igcd(y-x,N);
if g<>1 then RETURN(g,i) fi
od
end:

Here are a few examples (Maple's nextprime routine returns the smallest prime number following its argument).

> nextprime(900)*nextprime(20000);

[Maple Math]

> pollard(");

[Maple Math]

> pollard("");

[Maple Math]

> pollard(""");

[Maple Math]

> nextprime(10^5)*nextprime(10^6);

[Maple Math]

> pollard(");

[Maple Math]

> pollard("");

[Maple Math]

> pollard(""");

[Maple Math]

> nextprime(10^6)*nextprime(10^7);

[Maple Math]

> pollard(");

[Maple Math]

> pollard("");

[Maple Math]

> pollard(""");

[Maple Math]

The combinatorial model

The algorithm relies on the structure of the functional graph of the function [Maple Math] , where [Maple Math] is the smallest prime factor of [Maple Math] . In this graph, the vertices are the integers [Maple Math] mod [Maple Math] and the directed edges link each vertex to its image by [Maple Math] . Since the number of points is finite, it is not difficult to see that the graph has the structure of a union of connected components, each of these components being constituted of a cycle to which converge trees. Since the polynomial [Maple Math] has degree 2 and [Maple Math] is prime, all the vertices (except [Maple Math] ) have in-degree 0 or 2, while they have out-degree 1. The prime factor [Maple Math] is assumed to be large, therefore the special case of the vertex [Maple Math] which has in-degree 1 can be discarded as a first approximation. The combinatorial model is thus expressed by the following combstruct grammar :

> G:={fungraph=Set(connected_component),
connected_component=Cycle(Prod(Z,bintree)),
bintree=Union(Z,Prod(Z,Set(bintree,card=2)))}:

The execution of the algorithm is interpreted on this graph as follows: a random point [Maple Math] of the graph is selected. Then two sequences [Maple Math] and [Maple Math] of iterates with initial value [Maple Math] are computed, one of them proceeding one step at a time, while the other one proceeds by steps of length 2. Starting from a node of the graph, these two sequences eventually reach a cycle, where they are bound to intersect. This is where the algorithm stops.
Under this model, the average number of steps required by the algorithm is therefore related to two parameters:
(i) the average distance from a point to its cycle; (ii) the average length of the cycles.

Path length in planar binary trees

We start with question (i) above: the determination of the average distance from a point to its cycle. We first concentrate on a similar but simpler problem, the computation of the average distance from a node to the root in a planar binary tree. This is related to a classical combinatorial parameter: the internal path length of the tree, which is the sum of the distances from each of the nodes to the root.
Binary trees are described by the following grammar:

> bin:={bintree=Union(Epsilon,Prod(Z,bintree,bintree))}:

The counting sequence given by combstruct[count] is constituted by the well-known Catalan numbers:

> seq(count([bintree,bin,unlabelled],size=i),i=0..15);

[Maple Math]

The following Maple procedure takes as input a binary tree as produced by combstruct using the specification above and returns its internal path length:

> ipl:=proc(t)
if type(t,name) then 0 # the tree is of the form Z or Epsilon
else # the tree is of the form Prod(Z,t1,t2)
ipl(op(2,t))+ipl(op(3,t))+size(t)-1
fi
end:
size:=proc(t) eval(subs([Prod=`+`,Z=1,Epsilon=0],t)) end:

The internal path length is computed using a simple bijection: each of the nodes on a branch from the root to a leaf are counted once for each of the nodes below it. Here is an example on a tree of size 5 generated by combstruct[draw] from the specification above:

> T:=draw([bintree,bin,unlabelled],size=5); ipl(T);

[Maple Math]

[Maple Math]

We shall compute the average internal path length by first computing the total internal path length, i.e. the sum of the internal path lengths of all the binary trees of size [Maple Math] and then dividing this number by the [Maple Math] th Catalan number. We first start by computing experimentally the first values of these numbers, generating all the trees of fixed sizes for small sizes with combstruct[allstructs] , and computing the sum of the internal path lengths of all these trees with the procedure ipl above.

> for i to 5 do sum_ipl[i]:=`+`(op(map(ipl,allstructs([bintree,bin,unlabelled],size=i)))) od;

[Maple Math]

[Maple Math]

[Maple Math]

[Maple Math]

[Maple Math]

Combinatorial model for path lengths

The numbers computed above can actually be derived more efficiently, together with many other results related to path lengths using combstruct's ability to deal with marks (atoms of size 0). The idea consists in writing a combinatorial specification such that the number of objects of size [Maple Math] is exactly the sum of the internal path lengths of all the binary trees of size [Maple Math] . The combinatorial technique used here extends to other combinatorial structures and leads to a combstruct-based analysis of Pollard's [Maple Math] -algorithm.
The grammar for binary trees is modified to take into account "decorated" binary trees. A binary tree is decorated by putting two marks (An for Ancestor and De for Descendant) on two nodes belonging to the same branch. The number of ways of decorating a binary tree is then exactly its internal path length. Couting the number of decorated trees therefore corresponds to summing the internal path lengths of all binary trees.

> bin2:={bintree2=Prod(Z,
Union(leftright2,Prod(An,leftright1))),
bintree1=Prod(Z,
Union(leftright1,Prod(De,bintree,bintree))),
leftright2=Union(Prod(bintree2,bintree),
Prod(bintree,bintree2)),
leftright1=Union(Prod(bintree1,bintree),
Prod(bintree,bintree1)),
An=Epsilon, De=Epsilon} union bin:

The sequence of cumulated internal path lengths is now derived in less than a second:

> seq(count([bintree2,bin2,unlabelled],size=i),i=1..10);

[Maple Math]

Here are the 14 decorated trees of size 3:

> allstructs([bintree2,bin2,unlabelled],size=3);

[Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math]

Generating functions

From the grammars describing binary trees and decorated binary trees, the average internal path length can be computed via generating functions. Using combstruct[gfsolve] , we first derive the generating functions for the Catalan numbers and for the cumulated path lengths:

> F:=subs(gfsolve(bin,unlabelled,z),bintree(z));

[Maple Math]

> S:=series(F,z,30);

[Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math]

> Fpl:=subs(gfsolve(bin2,unlabelled,z),bintree2(z));

[Maple Math]

> Spl:=series(Fpl,z,30);

[Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math]

The average path length is simply the ratio of these coefficients:

> seq(coeff(Spl,z,i)/coeff(S,z,i),i=0..28);

[Maple Math] [Maple Math] [Maple Math] [Maple Math]

> evalf(["]);

[Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math]

Empirically, these numbers grow slightly faster than linearly with the size of the tree. A closed-form for the average path length can be established rigorously using gfun . The generating functions being algebraic, they satisfy linear differential equations. These can be derived by gfun[holexprtodiffeq] . From these differential equations, a linear recurrence satisfied by the Taylor coefficients is obtained by gfun[diffeqtorec] . It turns out that these recurrences fall into a class for which Maple's rsolve can find a solution.
Here are the differential equations:

> deqF:=holexprtodiffeq(F,y(z));

[Maple Math]

> deqFpl:=holexprtodiffeq(Fpl,y(z));

[Maple Math] [Maple Math]

From there, the recurrences follow:

> recF:=diffeqtorec(deqF,y(z),u(n));

[Maple Math]

> recFpl:=diffeqtorec(deqFpl,y(z),u(n));

[Maple Math] [Maple Math] [Maple Math]

The recurrence satisfied by the Catalan numbers being linear of order 1, it is obvious that Maple will be able to solve it:

> Cat:=rsolve(recF,u(n));

[Maple Math]

It is however more surprising that rsolve can compute the solution of the 3rd order recurrence satisfied by the cumulated path lengths. This is due to the implementation of the recent algorithm by M. Petkovsek for hypergeometric solutions of linear recurrences.

> Pl:=rsolve(recFpl,u(n));

[Maple Math]

After some cleaning up, we have thus proved the following.

Proposition . The average internal path length in a random planar unlabelled binary tree of size [Maple Math] under the uniform model is
[Maple Math] .

> seq(4^i*(i+1)/binomial(2*i,i)-3*i-1,i=1..10);

[Maple Math]

From there, the asymptotic behaviour is well within the reach of Maple's asympt command:

> map(combine,asympt(Pl/Cat,n),exp);

[Maple Math]

Average distance from a point to a cycle

We now come back to the analysis of Pollard's [Maple Math] -algorithm. The analysis of the average distance from a node to its cycle proceeds exactly as in the case of binary trees by decorating the corresponding trees. The grammar is derived from the grammar G above:

> G;

[Maple Math] [Maple Math] [Maple Math]

The process of decoration consists in isolating the path between ancestor and descendant in the combinatorial structure. For instance, non-planar binary trees are described by

> npbin:={bintree=subs(G,bintree)};

[Maple Math]

Here, Set indicates that the respective position of the two subtrees does not count. In the labelled case, we can therefore decide that the decorated branch will always be the leftmost one (instead of considering all the paths leftright1 and leftright2 as in the case of planar binary trees). Thus the decorated grammar in this case is

> bin3:={bintree2=Prod(Z,Union(Prod(bintree2,bintree),
Prod(An,bintree1,bintree))),
bintree1=
Prod(Z,Union(Prod(bintree1,bintree),
Prod(De,Set(bintree,card=2)),De)),
An=Epsilon,De=Epsilon} union npbin:

Again, we can check that the first values coincide with the result produced by computing the internal path length on all the non-planar binary trees:

> ipl:=proc(t) if type(t,name) then 0 else size(t)-1+ipl(op([2,1],t))+ipl(op([2,2],t)) fi end:
size:=proc(t) local i;eval(subs([Prod=`+`,Set=`+`,seq(i=1,i=indets(t,name))],t)) end:

> for i to 6 do i,`+`(op(map(ipl,allstructs([bintree,npbin,labelled],size=i)))) od;

[Maple Math]

[Maple Math]

[Maple Math]

[Maple Math]

[Maple Math]

[Maple Math]

> seq(count([bintree2,bin3,labelled],size=i),i=1..11);

[Maple Math]

The same process readily extends to the functional graphs involved in Pollard's algorithm by a decomposition of sets and cycles which isolates the marked part.

> G1:={fungraph=Prod(connected_component1,
Set(connected_component)),
connected_component1=Prod(Z,
Union(Prod(An,bintree1),bintree2),
Sequence(Prod(Z,bintree))),
connected_component=Cycle(Prod(Z,bintree))}
union bin3:

We first write procedures to compute the distance from nodes to their cycles in the non-decorated graphs so that we can check on the first few values that our grammar for decorated graphs is consistent with the non-decorated one.

> iplfg:=proc(g) `+`(op(map(iplcc,g))) end:
iplcc:=proc(cc) `+`(op(map(iplbt,cc))) end:
iplbt:=proc(bt) if type(bt,name) then 0 else size(op(2,bt))+iplt(op(2,bt)) fi end:
iplt:=proc(t) if type(t,name) then 0 else size(t)-1+iplt(op([2,1],t))+iplt(op([2,2],t)) fi end:
size:=proc(t) local i; eval(subs([Set=`+`,Prod=`+`,seq(i=1,i=indets(t,name))],t)) end:

Here are a few tests:

> to 4 do t:=draw([fungraph,G,labelled],size=6); print(t,iplfg(t)) od:

[Maple Math]

[Maple Math]

[Maple Math]

[Maple Math]

The number of functional graphs of fixed size grows quite fast:

> seq(count([fungraph,G,labelled],size=i),i=1..10);

[Maple Math]

We can therefore only check the values for sizes 2 and 4 in a reasonable amount of time

> map(iplfg,allstructs([fungraph,G,labelled],size=2));

[Maple Math]

> sort(map(iplfg,allstructs([fungraph,G,labelled],size=4)));

[Maple Math]

> convert(",`+`);

[Maple Math]

On the other hand, here is the counting sequence for decorated graphs:

> seq(count([fungraph,G1,labelled],size=i),i=1..10);

[Maple Math]

It takes 11 minutes to check that 10440 is also the value we get by generating all the binary functional graphs of size 6.

We now compute the generating functions with combstruct[gfsolve] :

> F:=subs(gfsolve(G,labelled,z),fungraph(z));

[Maple Math]

> Fpl:=subs(gfsolve(G1,labelled,z),fungraph(z));

[Maple Math]

Again, we obtain closed-forms from these generating functions by applying gfun to find the differential equation they satisfy and from there the recurrence satisfied by their coefficients:

> deqF:=holexprtodiffeq(F,y(z));

[Maple Math]

> deqFpl:=holexprtodiffeq(Fpl,y(z));

[Maple Math] [Maple Math]

> recF:=diffeqtorec(deqF,y(z),u(n));

[Maple Math]

> recFpl:=diffeqtorec(deqFpl,y(z),u(n));

[Maple Math] [Maple Math]

Unfortunately, due to a weakness in Maple's current version of rsolve, the solutions to these recurrences are not found

> rsolve(recF,u(n)),rsolve(recFpl,u(n));

[Maple Math] [Maple Math] [Maple Math] [Maple Math]

However, Maple can find the solution if we help it by taking into account the parity of the generating functions:

> deqF2:=holexprtodiffeq(subs(z=z^(1/2),F),y(z)):

> deqFpl2:=holexprtodiffeq(subs(z=z^(1/2),normal(Fpl)),y(z)):

> recF2:=diffeqtorec(deqF2,y(z),u(n));

[Maple Math]

> recFpl2:=diffeqtorec(deqFpl2,y(z),u(n));

[Maple Math] [Maple Math] [Maple Math]

> nb_fg:=subs(n=n/2,rsolve(recF2,u(n)));

[Maple Math]

> tot_pl:=map(simplify,subs(n=n/2,rsolve(recFpl2,u(n))));

[Maple Math]

This is summarized by:

Proposition : The average distance from a point to a cycle in a random binary non-planar functional graph of size [Maple Math] is given by [Maple Math] .

From there a direct asymptotic expansion follows:

> map(simplify,asympt(subs(tot_pl/nb_fg/n),n));

[Maple Math] [Maple Math]

> as_cost_1:=op(1,"):

Thus under our probabilistic model, the first stage of Pollard's [Maple Math] -algorithm takes asymptotically [Maple Math] steps, where [Maple Math] is the smallest prime factor of the number to be factored and [Maple Math] .

Average length of the cycles

After both sequences ( [Maple Math] and [Maple Math] ) have reached the cycle, the number of steps before the end of Pollard's algorithm is bounded by the length of this cycle. Since this length might be correlated to the number of steps before, it is not a priori sufficient to compute the average length of the cycles in a random graph obeying our grammar. Instead, we modify the decorated graphs so that the Ancestor is now an element of the cycle. The number of decorated graphs of size [Maple Math] is thus the sum for all the nodes of the length of their cycle, summed over all graphs of size [Maple Math] . Here is the new grammar:

> G2:=remove(type,G1,identical(connected_component1)=anything)
union {connected_component1=
Prod(Z,
Union(bintree1,Prod(De,bintree)),
Sequence(Prod(Z,bintree)),
Prod(Z,An,bintree),
Sequence(Prod(Z,bintree)))}:

The corresponding generating function is again algebraic:

> Fplcycle:=factor(normal(subs(gfsolve(G2,labelled,z),fungraph(z))));

[Maple Math]

> series(Fplcycle,z,11);

[Maple Math]

Again, a closed form follows for the coefficients:

> holexprtodiffeq(subs(z=z^(1/2),Fplcycle)/z,y(z));

[Maple Math]

> diffeqtorec(",y(z),u(n));

[Maple Math] [Maple Math]

> rsolve(",u(n));

[Maple Math]

> T1:=map(simplify,subs(n=n/2-1,")+tot_pl);

[Maple Math] [Maple Math]

We thus get the following conclusion:

Theorem . The average number of steps of Pollard's [Maple Math] -algorithm under the probabilistic model is bounded between [Maple Math] and [Maple Math] ,
where [Maple Math] is the smallest prime factor of the integer whose factorization is sought.
Asymptotically, this number of steps therefore behaves like
[Maple Math] , with [Maple Math] in [Maple Math] .

Here is the verification of the second part:

> map(simplify,asympt(T1/nb_fg/n,n,3));

[Maple Math]

> as_cost_2:=op(1,"):

> evalf("");

[Maple Math]

Since the smallest prime factor of a number [Maple Math] is of order [Maple Math] , it follows that the (arithmetic) complexity of Pollard's algorithm under this model grows in [Maple Math] .

Extension: polynomials of degree [Maple Math]

When the factor [Maple Math] to be found is known a priori to satisfy [Maple Math] for some [Maple Math] known in advance, Pollard conjectured that the number of steps in his algorithm could be reduced by a factor [Maple Math] by using the polynomial [Maple Math] instead of [Maple Math] . Brent and Pollard used this idea to factor the 8th Fermat number, whose factors are known to be [Maple Math] . We know consider Pollard's under the probabilistic model for [Maple Math] .

Here is the grammar for functional graphs with [Maple Math] as a parameter:

> Gm:={fg=Set(cc),cc=Cycle(Prod(Z,Set(tree,card=m-1))),
tree=Union(Z,Prod(Z,Set(tree,card=m)))}:

Here is the corresponding decorated grammar for the distances to the cycles:

> Gm2:={fg1=Prod(cc1,Set(cc)),
cc1=Prod(Z,Union(Prod(An,tree1),tree2),
Set(tree,card=m-2),
Sequence(Prod(Z,Set(tree,card=m-1)))),
tree1=Prod(Z,Union(Prod(tree1,Set(tree,card=m-1)),
Prod(De,Set(tree,card=m)),De)),
tree2=Prod(Z,Union(tree2,Prod(An,tree1)),
Set(tree,card=m-1)),
An=Epsilon,De=Epsilon} union Gm:

And the grammar for the distances to the cycles plus the length of the cycle:

> Gm3:={fg2=Prod(cc2,Set(cc)),
cc2=Prod(Z,Union(Prod(tree1,Set(tree,card=m-2)),
Prod(De,Set(tree,card=m-1))),
Sequence(Prod(Z,Set(tree,card=m-1))),
Prod(Z,An,Set(tree,card=m-1)),
Sequence(Prod(Z,Set(tree,card=m-1))))}
union Gm2:

[Maple Math]

We first check on the case [Maple Math] that we recover the generating functions we had before.

> sol:=gfsolve(subs(m=2,Gm3),labelled,z);

[Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math]

> normal(subs(sol,fg(z))-F);

[Maple Math]

> normal(subs(sol,fg1(z))-Fpl);

[Maple Math]

> normal(subs(sol,fg2(z))-Fplcycle);

[Maple Math]

[Maple Math]

The grammar becomes

> G6:=subs(m=6,Gm3);

[Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math]

From there we compute the generating functions.

> sol:=gfsolve(G6,labelled,z):

Here is the generating function for the functional graphs:

> F:=subs(sol,fg(z));

[Maple Math]

Again, the coefficients admit a closed-form which can be obtained using gfun (it could also be derived by Lagrange's inversion formula). We rather compute directly the asymptotic expansion of the number of functional graphs of size [Maple Math] whose nodes have an in-degree of 0 or 6 by singularity analysis. Unfortunately, Maple's series command is not yet able to deal with algebraic functions like the RootOf above. We therefore start from the equation in the system which is at the origin of the singularity:

> T:=subs(sol,tree(z));

[Maple Math]

> tree(z)-subs(gfeqns(G6,labelled,z),tree(z));

[Maple Math]

> eq:=subs(tree(z)=y,Z(z)=z,");

[Maple Math]

> diff(eq,y);

[Maple Math]

> solve({"","},{y,z});

[Maple Math] [Maple Math]

It follows that the dominant singularity is at [Maple Math] , where [Maple Math] . The singular expansion of [Maple Math] can be computed at that point by power series reversion (we set [Maple Math] where [Maple Math] is the singularity, which helps series a little) :

> sing:=5/6*12^(1/3):

> series(subs(z=sing*(1-theta^2),eq),y=12^(1/3));

[Maple Math] [Maple Math] [Maple Math]

> {solve(",y)};

[Maple Math] [Maple Math] [Maple Math]

> sery:=op(select(proc(s,u) evalb(signum(coeff(s,u,1))=-1) end,",theta));

[Maple Math] [Maple Math]

> series(subs(T=sery,z=sing*(1-theta^2),F),theta);

[Maple Math]

Hence the number of functional graphs with in-degree in {0,6} is asymptotically:

> as_fg:=coeff(",theta,-1)/sqrt(Pi)*n^(-1/2)*sing^(-n);

[Maple Math]

We next consider the distance to the cycle

> Fpl:=subs(sol,fg1(z));

[Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math]

> series(subs(T=sery,z=sing*(1-theta^2),Fpl),theta);

[Maple Math]

And the length of the cycle:

> Fpl2:=subs(sol,fg2(z));

[Maple Math] [Maple Math] [Maple Math] [Maple Math] [Maple Math]

> series(subs(T=sery,z=sing*(1-theta^2),Fpl2),theta);

[Maple Math]

Thus the distance to the cycle and the length of this cycle are both asymptotic to:

> asympt(coeff(",theta,-4)*n*sing^(-n)/as_fg/n,n);

[Maple Math]

Thus the ratio between this variant of Pollard's algorithm and the original one is in both cases (number of steps to the cycle and length of the cycle):

> ratio:=simplify(map(asympt,[as_cost_1/",as_cost_2/"/2],n));

[Maple Math]

Conclusion

This worksheet shows that some algorithms whose complexity analysis does not reduce to counting the number of subcomponents in some kind of recursive combinatorial structure can sometimes be treated by combstruct using several marks. This is in particular true of many recursive searching algorithms whose complexity is related to the path length of an underlying structure.