A properly typeset version of this document is available in postscript and in pdf.
If some fonts do not look right on your screen, this might be
fixed by configuring your browser (see the documentation here).
The satisfaction of constrained formulae is a key issue in complexity
theory. Many computational problems are shown to be NP-complete
through a polynomial mapping onto the K-Satisfiability (SAT)
problem. Recently, there has been much interest in a random version of
the K-SAT problem defined as follows. Consider N Boolean variables
xi, i=1,...,N. Call clause C the logical OR of K randomly
chosen variables, each of them being negated or left unchanged with
equal probabilities. Then repeat this process by drawing
independently M random clauses Cl, l =1 ,... ,M. The
logical AND of all clauses, F, is said to be satisfiable if
there exists a logical assignment to the x's evaluating F to
true, unsatisfiable otherwise.
Numerical experiments have concentrated upon the study of the
probability PN (a , K) that a given F including
M=a N clauses be satisfiable. For large sizes of N, there appears a
remarkable behaviour: P seems to reach unity for a < a
c (K) and vanishes for a > a c (K) [6]. Such
an abrupt threshold behaviour, separating a SAT phase from an UNSAT
one, has indeed been rigourously confirmed for 2-SAT, which is in P,
with a c (2) =1 [2, 5]. For larger K ³ 3, K-SAT is
in NP and much less is known. The existence of a sharp transition has
not been proven yet but precise estimates of the thresholds have been
found: a c (3) ~ 4.25. Moreover, some lower and upper
bounds to a c (3) (if it exists), a l.b.=3.003 and
a u.b.=4.64 respectively have been established
[4, 3].
The classical approaches to study the SAT phenomenon threshold are both
combinatorial and probabilistic. A statistical physics approach was used in
[8, 9]. Such an approach allows properties to be
predicted. It has
been applied already to random graphs and it has led to
large deviation results for the threshold
phenomenon of random graphs in addition to previously known
results. This approach
seems therefore to be
powerful. However
it proves much harder to apply to the SAT threshold
phenomenon. It yields in particular a surprising
change concerning the proportion of variables fixed in the neighbourhood of the
threshold between 2-SAT and 3-SAT. This could partly account for the complexity
gap between these two problems. In order to apply the statistical physics
approach, the following
steps were carried out.
First, the energy function corresponding
to the K-SAT problem is identified. The logical values of the x's can be
represented by N binary variables Si's, called spins, through the
one-to-one mapping Si=-1 (respectively +1) if xi is false
(resp. true). We then encode the random clauses into a M× N
matrix Cli in the following way: Cli=-1
(respectively +1) if the clause Cl includes xi
(resp. xi), Cli=0 otherwise. Consider now the
cost-function E[C , S] defined as the number of
clauses that are not satisfied by the logical assignment corresponding
to configuration S. The minimum E[C] of E[C ,
S], that is, the lowest number of violated clauses that can be
achieved by the best possible logical assignment [8, 9], is a
random variable which becomes totally concentrated around its mean
value « E[C] » in the large size limit [1]. The
latter is accessible through the knowledge of the averaged logarithm
of the generating function
Z[C
]=
å
S
exp
(
- E[C , S] / T
)
(1)
since
« E[C ] » = - T « log Z[C ] » + O(T2)
when the auxiliary parameter T is eventually sent to zero. Being the minimal
number of violated clauses, « E[C ] » equals zero in the SAT
region and is positive in the UNSAT phase, allowing the location of
a c(K).
The calculation of the average value of the logarithm of Z
in (1) is an awkward one. To circumvent this difficulty, we
compute the nth moment of Z for integer-valued n and perform
an analytical continuation to real n in order to exploit the identity
« Z[C ] n » = 1 + n « log Z[C ] » +
O(n2).
The nth moment of Z is obtained by replicating n
times the sum over the spin configuration S and averaging over
the clause distribution [8]
« Z[C
] n » =
å
S1 , S2 , ... ,
Sn
« exp
æ ç ç è
-
n
å
a=1
E[C ,
Sa] / T
ö ÷ ÷ ø
».
(2)
It is crucial to notice that the averaged term in (2)
depends on the n× N spin replicas only through the 2n occupation
fractions c(s) labelled by the vectors s
with n binary components; c(s) equals the number (divided
by N) of labels i such that Sia=s a, " a=1,...,n. Taking into account the combinatorial entropy of the labels
i at fixed occupation fractions,
« Z[C ] n » ~ exp
(NFmax)
where Fmax is the maximum over all possible xs
of the functional [8]
F[ { c } ] = -
å
s
c(s) log c(s) + a log
é ê ê ë
å
s1 , ... ,
sK
c(s1) ···
c(sK)exp
æ ç ç è
-
1
T
n
å
a=1
K
å
l =1
d [s
a
l
+ 1 ]
ö ÷ ÷ ø
ù ú ú û
.
(3)
The optimisation conditions over F[ { c} ] provide 2n coupled
equations for the cs. Notice that F is a symmetric functional,
that is, invariant under any permutation of the replicas a. A maximum
may thus be sought in the so-called replica symmetric (RS) subspace of
dimension n+1 where c(s) is left unchanged under the
action of the symmetric group. Within the RS subspace, the occupation
fractions may be conveniently expressed as the moments of a
probability distribution P(m) over the range -1£ m£ 1
[8]. Once the number of replicas n is sent to zero, we obtain a
self-consistent functional equation for the order parameter
P(m) that can be solved numerically.
What is the meaning of the distribution P(m)? Consider a formula
F and all the spin configurations Sj, j=1,..., N attaining the minimum E[C ] of the cost-function
E[C,S]. Define then the average Boolean magnetisations
of the spins
mi =
1
N
N
å
j=1
Sij,
(4)
over the set of optimal configurations. Call H(C,m) the
histogram of the mis and H(m) the average of H(C,m) over
the choices of the formulae F. H(m) is a probability
distribution over the interval -1£ m£ 1 giving information about
the resulting constraints on the variables induced by the clauses. It
has been shown that, if the RS solution is the global maximum of
(3) (and not only a local one), H(m) equals the above
mentioned P(m) in the limit of large sizes N® ¥. Therefore,
the order parameter arising in the replica calculation reflects the
``microscopic'' structure of the solutions of the K-SAT problem.
Of particular interest are the fully fixed variables, that is the
xi's such that mi = ± 1. In the following, the fraction of fully
constrained variables will be denoted by g (a , K).
Clearly, g (a , K) vanishes in the SAT region otherwise
the addition of a new clause to F would lead to a
contradiction with a finite probability. Two kinds
of scenarii arise when entering the UNSAT phase. For 2-SAT, g
(a , 2) smoothly increases above the threshold
ac(2)=1. For 3-SAT (and more generally K³ 3), g
(a , 3) exhibits a discontinuous jump to a finite value g
c~ 0.9 slightly above the threshold. While ac(2)=1
is correctly found, the RS prediction for ac(3)=4.6
exceeds the experimental estimates by 10%. Work is currently under
progress to refine the above calculation and enlarge the subspace
where the global maximum is sought in.
Qualitatively speaking, however, we expect the main
conclusion of this work to be correct: the SAT/UNSAT transition is
accompanied by a smooth (respectively abrupt) change in the structure
of the solutions of the 2-SAT (resp. 3-SAT) problem. Furthermore, we
conjecture that this discrepancy is responsible for the difference of
typical complexities of both models recently observed in numerical
studies [10]. The typical solving time of search algorithms
displays an easy-hard-easy pattern as a function of a with a peak
of complexity close to the threshold. The peak time seems to scale
polynomially with N for the 2-SAT problem and exponentially with N
in the 3-SAT case. From an intuitive point of view, the search for
solutions ought to be more time-consuming in the presence of a finite
fraction of fixed variables since the exact determination of
the latter necessarily requires an exhaustive enumeration of the
variables. To test this conjecture, a mixed
2+p-model has been introduced; it includes a fraction p
(resp. 1-p) of clauses of
length two (resp. three) and thus interpolates between the 2-SAT
(p=0) and 3-SAT (p=1) problems. The RS theory predicts that
the SAT/UNSAT transition becomes abrupt when p > p0 =0.41. Precise
numerical simulations support the conjecture that the
polynomial/exponential crossover occurs at the same critical p0. An
additional argument in favour of this conclusion is provided by the
analysis of the finite-size effects on PN (a , K) and the
emergence of some universality for p<p0. A detailed account of
these findings may be found in [7].
Broder (Andrei Z.), Frieze (Alan M.), and Upfal (Eli). --
On the satisfiability and maximum satisfiability of random
3-CNF formulas. In Proceedings of the Fourth Annual ACM-SIAM
Symposium on Discrete Algorithms (Austin, TX, 1993). pp. 322--330. --
New York, 1993.
Chvátal (V.) and Reed (B.). --
Miks gets some (the odds are on his side). In Proceedings of the
33rd IEEE Symposium on Foundations of Computer Science, pp. 620--627. --
1992.
Dubois (O.) and Boufkhad (Y.). --
A general upper bound for the satisfiability threshold of random
r-SAT formulae. Journal of Algorithms, vol. 24,
1997, pp. 395--420.
Frieze (Alan) and Suen (Stephen). --
Analysis of two simple heuristics on a random instance of
k-SAT. Journal of Algorithms, vol. 20, n°2,
1996, pp. 312--355.
Goerdt (Andreas). --
A threshold for unsatisfiability. Journal of Computer and System
Sciences, vol. 53, n°3, 1996, pp. 469--486. --
1994 ACM Symposium on Parallel Algorithms and Architectures (Cape
May, NJ, 1994).
Mitchell (D.), Selman (B.), and Levesque (H. J.). --
Hard and easy distributions of SAT problems. In Proceedings of the American Association for Artificial Intelligence AAAI-92,
pp. 459--465. --
1992.
Monasson (R.), Zecchina (R.), Kirkpatrick (S.), Selman (B.), and Troyansky
(L.). --
Typical-case complexity results from a new type of phase
transition. --
In preparation.
Monasson (Rémi) and Zecchina (Riccardo). --
Statistical mechanics of the random K-satisfiability model. Physical Review E. Statistical Physics, Plasmas, Fluids, and Related
Interdisciplinary Topics. Third Series, vol. 56, n°2,
1997, pp. 1357--1370.
Selman (Bart) and Kirkpatrick (Scott). --
Critical behavior in the computational cost of satisfiability
testing. Artificial Intelligence, vol. 81, n°1-2,
1996, pp. 273--295. --
Frontiers in problem solving: phase transitions and complexity.