Mathieu Raffinot, Génopole, Université d'Évry

Approximate Matching of Secondary Structures

Several methods have been developed for identifying more or less complex RNA structures in a genome. Whatever the method is, it is always based on the search of conserved primary and secondary structures. While various efficient methods have been developed for searching motifs of the primary structure, usually represented as regular expressions, little effort has been expended in the efficient search of secondary structure signals. By a helix, we mean a structure defined by a combination of sequence and folding constraints. We present a flexible algorithm that searches for all approximate matches of a helix in a genome. Helices are represented by special regular expressions, that we call secondary expressions. The method is based on an alignment graph constructed from several copies of a pushdown automaton, arranged one on top of another. The worst time complexity is $O(RPN)$, where $N$ is the size of the genome, $P$ the size of the secondary expression, and $R$ its number of union symbols. We present our results of searching for specific signals of the tRNA and RNase P RNA in two genomes. This work has been done with Nadia El-Mabrouk (University of Montreal) and will be presented in RECOMB'2002 (April). A full version can be found on {\verb||}).

Virginie Collette
Last modified: Tue Feb 19 19:24:18 MET 2002