New and Old Problems in Pattern Matching

Let $H$ and $T$ be sequences generated over a finite alphabet. We call $H$ the pattern and $T$ the text. The basic problems we shall deal with are: (i) how many times H occurs in $T$; (ii) how long one has to wait until H occurs in $T$. We consider the pattern matching problem in various settings: (1) In the string matching problem the pattern $H$ is considered to be a string, while in the subsequence matching problem the pattern $H$ is a substring; (2) The pattern $H$ can occur exactly or approximately; (3) In most cases we assume that the text is generated by a random source and the pattern is given. But in the repetitive pattern matching problem the pattern is part of the text; (4) Finally, one may assume that the text $T$ is generated among all possible strings. In the restricted pattern matching problem the text satisfies some additional constraints (e.g., in a $(d,k)$ sequence there is no runs of 0's shorter than $d$ and longer than $k$). In this talk we review analyses of the above-mentioned pattern matching problems. We show how to compute moments, limiting distributions and large deviations for the number of pattern occurrences and the waiting time (i.e., the first occurrence of the pattern). Throughout we use combinatorial (e.g., symbolic computation) and analytic methods (e.g., generating function, singularity analysis) of analysis of algorithms.

Virginie Collette Last modified: Tue Jun 19 11:22:38 CEST 2001