Statistical Physics of the Random Graph Model

Rémi Monasson

Laboratoire de Physique Théorique, École Normale Supérieure

Algorithms Seminar

April 6, 1998

[summary by Philippe Flajolet]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).

Abstract
This talk addresses the random graph model originally introduced by Erdös et Rényi in 1959. This model gives rise to a large number of threshold phenomena that are evocative of phase transitions in statistical physics. The talk illustrates the way several results on random graphs can be reexamined in a new perspective provided by a simple model of statistical physics, the Potts model. The problem addressed is principally that of the size of the giant component for which quantitative estimates are derived.

More generally, the talk is motivated by a desire to understand what statistical physics models may bring to the realm of threshold problems, not only in random graphs but also in the satisfiability of random boolean formulæ.



1   The Random Graph Models

The most natural random graph models have been introduced by Erdös and Rényi in a series of eventually famous papers that starts with [5, 6]. They are denoted by Gn,p and G^n,e and are defined as follow: The first model is of the Bernoulli type (there are N trials, each with independent probability p of success), the second one is more ``combinatorial''. Given the fact that the Bernoulli distribution B(N,p) is narrowly centered around its mean Np, we expect the following fact.
The characteristics of Gn,p resemble those of G^n,e provided e» Np.
We refer globally to Bollobás's book [4] for a discussion of these rich models and for precise conditions that make the assertion above into a valid mathematical statement. (The transfer from G^n,e to Gn,p is an Abelian one, whereas the converse transfer has a Tauberian flavour.)

Imagine a graph as evolving in time from totally disconnected to complete, through successive additions of edges that are reflected by increasing values of p from 0 to 1. What is characteristic of Gn,p (and thus, of the companion G^n,e model) is the presence of sharp thresholds. A threshold phenomenon for a property P means that there is a function p0(n) such that, with (very) high probability (as n®¥), P does not hold when p« p0(n) while for p» p0(n), P holds. (Of course, one may look for all sorts of detailed informations near the threshold p0(n).)

Here is a simplified picture of what goes on in Gn,p, expressed in terms of the mean number of edges, m=Np. Only isolated vertices and edges will be present when m« n1/2; but trees of size 3 will start appearing at m» n1/2, trees of size 4 at m» n2/3, etc. There is (almost surely) no cycle when m« n. Later when m=l n/2 and l <1 there is at most one cycle in each component and the largest component almost surely has size Q(log n). A dramatic phase transition occurs near m=n/2 when one or several large components of size n2/3 appear. Still later, when m=l n/2 and l>1, we find a single ``giant'' component of size Q(n). However, we'll have to wait a little longer, namely till m» 1/2 nlog n, to attain full connectedness, at which point the graph ceases to be interesting for the problems under discussion here.

There are various approaches to these problems. Most of them, following Erdös and Rényi's original papers [5, 6], are probabilistic and well explained in [4]. Roughly, one has to cope in this framework with random variables satisfying intricate dependencies; moment methods, tail inequalities, or probabilistic inequalities are then essential. The literature in this direction is immense and Bollobás's book already includes more than 700 references. The best results relative to connectedness that are available at this time (formulated in terms of G^n,e) are probably those of Bender, Canfield, and McKay; see [1, 2, 3]. In contrast, only a handful of papers starting with Knuth, Pittel, and collaborators resort to analytic methods1; see [7, 9]. Even fewer papers rely on methods from statistical physics. The work under discussion here is a pioneering attack on this range of hard problems; see [10, 11] for applications of related ideas to the random k-satisfiability problem.

2   The Potts Model

The Potts model of statistical physics considers particles or sites whose states (sometimes referred to as ``colours'') may assume any of q values. In the particular case of random graphs, it instantiates as follows. Consider n sites that we may imagine as regularly spaced on a circle. Each site may be in a certain state that is an integer of {0,1,...,q-1}. The integer q is a parameter of the model and when q=2, one can think of the states as ``spins'' representing the orientation of some vector, e.g., a magnetic moment. A configuration S=(s1,...,sn) is an assignment of states to each site, so that there are qn possible configurations. The energy of a configuration is defined as
E(S)=-
g
n
 
å
i<j
1
 
si,sj
,
where 1x,y is the indicator of x=y that has value 1 if x=y holds and 0 otherwise. There g is a parameter; the fact that one takes all the N=n(n-1)/2 combinations i<j corresponds to a model with complete interactions, that is, the underlying graph is the complete graph. (Models of statistical physics often consider instead an underlying graph constrained to be a regular lattice in dimension 1, 2, or 3.)

An essential object of statistical physics is the partition function defined here as
Z(g,n)=
 
å
S
e
-E(S)
 
,
which is thus a sum of qn terms. There are two main points in the talk: (i) the function Z provides information on the random graph model; (ii) it is possible to estimate analytically various characteristics of Z.

3   The Partition Function and Random Graphs

First of all, the partition function is (almost) a counting generating function in disguise. One has
Z(g,n) =
 
å
S
 
Õ
i<j
exp æ
ç
ç
è
g
n
1
 
si,sj
ö
÷
÷
ø
=
 
å
S
 
Õ
i<j
æ
è
1+1
 
si,sj
(e
g/n
 
-1) ö
ø
,
as results from the identity 12=1. Next,
Z(g,n)
»
 
å
S
 
Õ
i<j
æ
ç
ç
è
1+
g
n
1
 
si,sj
) ö
÷
÷
ø
    (1)
 
=
 
å
G
æ
ç
ç
è
g
n
ö
÷
÷
ø
e(G)



 
qc(G).
    (2)
The first line (1) is a natural approximation for n large. In the second line (2), e(G) is the number of edges of the graph G, c(G) the number of its connected components, and the sum is over all graphs G with n vertices. The reason why (2) is true is that the general term of the sum involves a product over all possible edges, and a product like us1,s2us2,s3 has value 1 only if s1,s2,s3 are of the same colour, in which case there are altogether q degrees of freedom.

Now, a graph G with n vertices and e edges has probability
P
 
g
(G)= æ
ç
ç
è
g
n
ö
÷
÷
ø
e



 
æ
ç
ç
è
1-
g
n
ö
÷
÷
ø
N-e



 
,
in the model Gn,p, where p=g/n and, like before, N=n(n-1)/2. Then, equation (2) yields the approximate formula,
Z(g,n)» e
g n/2
 
 
å
G
P
 
g
(G) qc(G).     (3)
(These approximations are stated here without error terms but it is not hard to assign them sufficient validity conditions.)

4   The Potts Model and the Process of ``Analytic Continuation''

In order to approach the number of connected components, we return to the definition of the partition function and aim at transforming its expression. The energy, E(S) depends on n variables, but only through their q possible values, with q the parameter of the Potts model. Indeed, for sÎ{0,...,q-1}, define the occupancy variables,
X
 
s
(S) =
1
n
n
å
i=1
1
 
si,s
,
that describe how many times each value of {0,...,q-1} is used by a configuration. One finds easily that
E(S)=
g
2
-
g n
2
q-1
å
s=0
X
2
 
s
.
(The term g/2 is ignored in subsequent computations.) Now, grouping states according to the values of their occupancy vectors {Xs} yields
Z(g,n) »
 
å
{X
 
s
}
æ
è
n
nX0,...,nXq-1
ö
ø
exp æ
ç
ç
è
-
g n
2
q-1
å
s=0
X
2
 
s
ö
÷
÷
ø
,
where the Xs are such that åsXs=1 and the Xs go by steps of 1/n. (The original derivation of Monasson makes use of manipulations with indicator variables that are related to the theory of ``replicas''.) Then, Stirling's formula employed to approximate the factorials present in the multinomial coefficient produces
Z(g,n)»
 
å
{X
 
s
}
exp æ
ç
ç
è
-n æ
ç
ç
è
 
å
s
X
 
s
log X
 
s
+
g
2
 
å
s
X
2
 
s
ö
÷
÷
ø
ö
÷
÷
ø
.     (4)
The form (4) is indeed a q-fold sum and the original n-fold summations and products have been eliminated.

The next step consists in evaluating what happens with the approximation (4) taken as
Z(g,n)»
 
å
{X
 
s
}
exp æ
è
-n G({X
 
s
}) ö
ø
.     (5)
The idea is to estimate the sum by means of the q-dimensional Laplace method, which requires locating the global extrema of the exponential. It is observed that local extrema at least are obtained by trying
X0=
1
q
(1+(q-1)s),      X1=...= Xq-1=
1
q
(1-s).     (6)
Then, the argument of the exponential in (4) is locally maximized if one fixes s as a root of
log æ
ç
ç
è
1+(q-1)s
1-s
ö
÷
÷
ø
=g s.     (7)
It is believed that all global extrema are obtained in this way, up to permutation of indices. Under this assumption, the Laplace method can then be applied to approximate Z(g,n) as
Z(g,n)» e
-n F(q,g)
 
,     (8)
where F is essentially G({Xs}) of (5) evaluated at the Xs given by (6) in terms of q,s, where s=s(g) satisfies the condition (7).

We now dive into a more conjectural world based on a special kind of formal reasoning The principle of the heuristic analysis consists in extrapolating the asymptotic approximation of the partition function that is defined a priori for integer values of q only and make it an analytic function of q. Then, let q tend to 1 and hope for consistency. Once this is done, various results that can be checked successfully against known ones (via analytic or probabilistic methods) are obtained.

Let us postulate the validity of (8) for all real q, and in particular for q near 1.Within this framework, the mean number of connected components of a random graph of Gn,g/n is for instance accessible as (á Xñ denotes expectation):
á c(G) ñ =
d
dq
log Z(g,n) ½
½
½
½
 



q=1
,     (9)
where real values q®1 are used. For the region of interest which is thus q near 1, equation (7) becomes 1-s=eg s. This equation defines s as a function of g, s=s(g). The parameter s(g) is in fact an indicator of the fraction of sites in the largest connected component of the random graph. There is a bifurcation at s=1. The function s(g) is identically 0 when g<1, a fact consistent with known properties of the random graph before the emergence of the giant component. At s=1+, the function s(g) has a square-root singularity, while it becomes analytic for s>1. Thus informations about the giant component and its ``phase transition'' become amenable to this approach (details omitted in this abstract).

5   Discussion

A summary of the methodology is as follows. For a given problem, there are a priori two ``partition functions'',
Z
 
comb
:=
 
å
G
P(G) qc(G),      Z
 
phys
=
 
å
S
e
-E(S)
 
.
The process is then as follows.
  1. Choose the configuration space and energy function so that Zcomb» Zphys.
  2. Evaluate Zphys by: (i) identifying the order parameters (the Xs); (ii) determining asymptotic approximations (here by the Laplace method); (iii) performing an analytic ``continuation'' according to the chain ``(q integer) |® (q real) |® (q®1)''.
The major question to ask is why and to what extent does this approach provide useful quantitative result. Certainly, the approximations for fixed integer q can be justified. Also there is a possibility of matching the analysis near q=1 against what we know from analytic approaches. (For instance, it is known that the emergence of the giant component is related in an essential way to the occurrence of two coalescing saddle points.) So, in a way, the most surprising fact to be explained is that estimates initially conducted for integer q only (we used a q-dimensional Laplace method!) can be ``analytically continued'' to the region of q near 1.

We observe that complex analysis does sometimes provide a framework for such analytic continuation. For instance, a theorem of Carlson asserts that when a function f(s) is analytic (holomorphic) in a right-hand half-plane and is of moderate growth, f(s)=O(e(p-e)|s|), then: f(s) vanishes identically if and only if it vanishes at the nonnegative integers. Therefore, an identity A(s)=B(s) can be inferred just from its specialization at the integers2 provided it is known a priori that A,B don't grow to fast. For instance, it suffices to establish
sin2
p s
4
+cos2
p s
4
=1,
for s=0,1,..., in order to be sure that it holds for all complex s. Observations of this kind however fall short of providing a basis for the analytic continuation process employed here, given the intricate nature of the approximations involved.

References

[1]
Bender (Edward A.), Canfield (E. Rodney), and McKay (Brendan D.). -- The asymptotic number of labeled graphs with given number of vertices and edges. Random Structures & Algorithms, vol. 1, n°2, 1990, pp. 127--169.

[2]
Bender (Edward A.), Canfield (E. Rodney), and McKay (Brendan D.). -- Asymptotic properties of labeled connected graphs. Random Structures & Algorithms, vol. 3, n°2, 1992, pp. 183--202.

[3]
Bender (Edward A.), Canfield (E. Rodney), and McKay (Brendan D.). -- The asymptotic number of labeled graphs with n vertices, q edges, and no isolated vertices. Journal of Combinatorial Theory. Series A, vol. 80, n°1, 1997, pp. 124--150.

[4]
Bollobás (Béla). -- Random Graphs. -- Academic Press, 1985.

[5]
Erdös (P.) and Rényi (A.). -- On random graphs. I. Publ. Math. Debrecen, vol. 6, 1959, pp. 290--297.

[6]
Erdös (P.) and Rényi (A.). -- On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., vol. 5, 1960, pp. 17--61.

[7]
Flajolet (P.), Knuth (D. E.), and Pittel (B.). -- The first cycles in an evolving graph. Discrete Mathematics, vol. 75, 1989, pp. 167--215.

[8]
Hardy (G. H.). -- Ramanujan: Twelve Lectures on Subjects Suggested by his Life and Work. -- Chelsea Publishing Company, New-York, 1978, third edition. Reprinted and Corrected from the First Edition, Cambridge, 1940.

[9]
Janson (Svante), Knuth (Donald E.), Luczak (Tomasz), and Pittel (Boris). -- The birth of the giant component. Random Structures & Algorithms, vol. 4, n°3, 1993, pp. 233--358.

[10]
Monasson (Rémi) and Zecchina (Riccardo). -- Entropy of the K-satisfiability problem. Physical Review Letters, vol. 76, n°21, 1996, pp. 3881--3885.

[11]
Monasson (Rémi) and Zecchina (Riccardo). -- Statistical mechanics of the random K-satisfiability model. Physical Review E. Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics. Third Series, vol. 56, n°2, 1997, pp. 1357--1370.

1
As said by Frieze in his discussion of the paper by Janson, Knuth, Luczak, and Pittel [9] in the Mathematical Reviews [MR94h:05070]: ``The paper [9] and its predecessor [7] mark the entry of generating functions into the general theory of random graphs in a significant way. Previously, their use had mainly been restricted to the study of random trees and mappings. However, at the early stages of the evolution of a random graph we find that it is usually not too far from being a forest, and this allows generating functions an entry.''
2
It is well worth pointing at Hardy's discussion in [8, Ch. 11] of ``Ramanujan's heuristic'' on which Ramanujan based so much of his definite integral evaluations.

This document was translated from LATEX by HEVEA.