Research News, September 1998


Previous News Next News

Return to Research News Index


These pages are edited by Philippe.Flajolet@inria.fr.

The Princeton Meeting on Average-Case Analysis of Algorithms (Part II)

I continue here a discussion of various talks presented at the Princeton meeting. Here, we'll examine caching. This is a fairly classical topic in the context of disk accesses that's also fairly hot nowadays as processors have built-in caches, as do browsers or web servers, and many other pieces of software.

Caching

The Merriam-Webster dictionary defines a cache, a word that comes from French, as a hiding place, and better for us as ``a computer memory with very short access time used for storage of frequently used instructions or data''. Even if you've never thought of caches, you're in fact using one (almost!) every day: think of your desktop to which you have immediate access, as opposed to your file cabinets that store huge amounts of data, but with slow access.

Initially, caches were meant to accelerate access to data from disks or slow memory. Thus, they have been investigated early by specialists of performance evaluation. In the usual framework, you have a slow device of a large size m that you may refer to quite often. The idea is to maintain a fast but much smaller (and costlier) device of some size $c\ll m$in order to store the data that you anticipate will be most useful. The potential usefulness of a cache is based the fact that probabilities of accesses are quite uneven. For instance, you're much more likely to access data on Analysis of Algorithms than on antidisestablishmentarianism , where the scores of references under Altavista are, as of today at least, of 11,212 to 443 (!). What also matters is the expected locality of references, something that's a bit harder to capture by mathematical modelling. (Has anybody even tried?) Another issue is the cache management policy. The most common choice is LRU, which stands for Least Recently Used, meaning that the c items that were most recently accessed are in the cache. Other strategies are based on counters or on FIFO (first-come-first-serve) --you kill the oldest guys even if they're still useful.

Fig 1. The cache of the Alpha Chip by Digital.

On the analytic side, early studies have concentrated on the Independence Reference Model, where a long sequence is built of individual requests that are independent in the usual probabilistic sense. Thus, the parameters of the model are just a finite probability distribution $P=\{p_j\}_{j=1}^m$,and c, the cache size. At the IFIP Congress in 1971, W. C. King proposed expressions for the probability D of ``page faults'' (the probability of a non-hit of the cache) of various policies under this model. The typical formula (for LRU),

\begin{displaymath}
D=1-\sum_{i=1}^mp_i^2 \sum_{q=0}^{c-1}
(1-)^{c-1-q}
{m-q-2\choose m-k-1} \sum _{\vert J\vert=q, i\in J}
\frac{1}{1-P_J},\end{displaymath}

with $P_J=\sum_{j\in J} p_j$, illustrates some of the hardships. The inner sum ranges over all the subsets of $\{1,\ldots,m\}$,with cardinality at most c (!). Accordingly, we have here a huge Markov chain with ${m\choose c}$ states. As the formula indicates, the corresponding analysis is essentially based on summing over all possible cache configurations. Hence, the relevant probabilities are computable, but that's in exponential time.

When probabilities are equal, it is a good combinatorial exercise for students to analyse what goes on. An essential rôle is then played by Stirling subset numbers (a.k.a numbers of the second kind). defined by the fact that $\left\{ {m \atop k} \right\}$is the number of ways of partitioning $\{1,\ldots,m\}$ into k nonempty subsets. Generating functions involve powers of exponentials since

\begin{displaymath}
\sum_{n} \left\{ {m \atop k} \right\} \frac{z^n}{n!}
=\frac{1}{k!} \left(e^z-1\right)^k.\end{displaymath}

Knuth develops some of these aspects in a paper, ``An Analysis of Optimal Caching'', Journal Of Algorithms 6 (1985), 181-199.

Such product forms can be extended to nonuniform distributions pi. This is done in a paper with Gardy and Thimonier that is based on a description of cache fault events by shuffles of regular languages. Integral representations for the long-run cache fault probability D result, for instance,

\begin{displaymath}
D=1-
\frac{1}{2\pi}\int_0^{2\pi} \int_0^\infty
\Phi(t,e^{i\t...
 ...-e^{-ci\theta}}{1-e^{-i\theta}}
\,d\theta\,dt
\qquad\qquad (1),\end{displaymath}

where

\begin{displaymath}
\Phi(t,u)=\prod_{i=1}^m (1+u(e^{p_it}-1)),
\qquad
\Psi(t,u)=\sum_{i=1}^m p_i^2 /(1+u(e^{p_it}-1))
.\end{displaymath}

One may recognize in passing a form that generates all the subsets of a set,

\begin{displaymath}
\prod_{i=1}^m
(1+ux_i).\end{displaymath}

However, the numerical aspects of computing with such huge formulæ are somewhat unclear (and still untouched, as far as I know).

A formula like (1) is also reminiscent of Poisson laws, which suggests alternative probabilistic approaches. This thread has ben pursued by Dobrow and Fill. See, for instance, ``The move-to-front rule for self-organizing lists with Markov dependent requests'' available under Bob Dobrow's publications.

The talk of Predrag Jelenkovic at Princeton revisits these questions from an asymptotic perspective. Given that the request probabilities have some soft tail, how does the cache fault probability decrease under larger cache sizes? Both LRU caching and the closely related Move-To-Front rule of self-organizing search are treated. What is needed is an estimate of tail probabilities. First, product formulæ of a shape similar to (1) give the Laplace transform of the distribution of search costs (MTF) and page faults. The explicit character of the integral lends itself to asymptotic analysis. The case of a generalized Zipf law, with $p_k= c_\alpha k^{-\alpha}$,with $\alpha\gt 1$ is solved. See Jelenkovic's paper for a nice discussion of the literature and for the technical developments. For instance, the probability distribution of a reference R (i.e., $\Pr\{R=j\}=p_j$ in earlier notations) and of the search cost C for MTF satisfy the elegant relation

\begin{displaymath}
\lim_{n\to\infty} \frac{\Pr\{C\gt n\}}{\Pr\{R\gt n\}}
= \lef...
 ...ht) \left[\Gamma\left(1-\frac{1}{\alpha}
\right)\right]^\alpha.\end{displaymath}

Fig. 2. Predictions versus observed performance in the Baskett-Rafii model

Theory and practice.

A word now on caching models. Naturally, practitioners have criticized the Independent Reference Model since it is not designed to capture any locality of references. However, back in 1976, Forest Baskett and Abbas Rafii demonstrated that actual references with locality and dependent page access probabilities qj can often be modelled in a satisfactory fashion by the independent reference model but with different probabilities $q_j^\star$being used. This is good news for theoreticians, as their studies can regain some credit from the practical standpoint. Year ago, I had difficulties locating this 1976 report from Stanford. Times change, and the Stanford Library has done an excellent job of putting its old tech reports as gifs on the Web For instance, The A0 inversion model of program paging behavior is now accessible on-line. Many thanks to them for this service to the community.

Another interesting aspect is treated by Knuth in the paper referred to above. What happens when a cache is maintained ``clairvoyantly'', i.e., with perfect knowledge of the future? In that case, a rule stated by Belady at IBM in 1966 states that the optimal choice at each stage consists in throwing away from the cache the element whose reference is farthest in the future. Under the uniform distribution, Knuth proves that this rule avoids faults with probability about $\sqrt{c/m}$,which is surprisingly effective. I don't think anybody has revisited Belady's rule in the case of nonuniform reference probabilities. This is likely to be a good problem to work on.

I remember, once at a lecture, somebody being criticized sharply for the totally unrealistic assumption of knowledge of the future. However, there are many contexts where this is precisely achievable: a program that produces postscript with fonts descriptions knows the size of a laser printer's memory (the ``cache'') and is free to do some look-ahead to decide when to include a font description; a compiler certainly knows the size of its target processor's cache beforehand and it can similarly do arbitrary look-aheads in order to decide what it should do.

Fig. 3. The modified probabilities that emulate actual page references

Adaptive Search

Caching and adaptive list searching are intimately related, as the latter can be viewed as the special case where c=m, that is, `cache size = memory size'. In that situation, the cache becomes a list data structure the aim is to at optimize search times. The optimum arrangement of a sequential list is by order of decreasing probabilities, if probabilities are known in advance and requests are independent. (We usually give this as an exercise to our undergrads, don't we?).

In general, access probabilities are unknown a priori. Move-To-Front is then a way to approximate the optimum by moving an element to the front as soon as it is accessed. Another approach consists in keeping reference counts and keeping the elements in decreasing order of their counters. Sure enough, by some law of large numbers, the counters will eventually stabilize to give the right ordering, thereby ensuring optimality. How far should one go? That is the question.

Haddas Shachnai has a page full of nice preprints that deal with various aspects of resource allocation in distributed environments. A paper of Haddas Shachnai and Micha Hofri entitled ``The List Update Problem: Improved Bounds for the Counter Scheme'' deals specifically with this problem. There, it is shown for example that the Counter Scheme is within $1+\epsilon$ of the optimum after time about $n\log n/\epsilon$.The estimate holds for arbitrary probabilities. Amongst other things on Haddas's site, you may find a related discussion of tail probability bounds for sums of independent random variables, where names like Chernoff, Hoeffding, and Bennett are familiar.

To be continued!