An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure
Yann Ponty
13 July 2011, 14:00 - 15:30, Room/Building: 455/PCRI-N
Abstract:
In collaboration with: Jerome Waldispühl, McGill, Montreal, Canada
The analysis of the relationship between sequences and structures (i.e. how mutations affect structures and, conversely, how structures influence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases, and to develop bioengineering applications such as the design of artificial molecules. Because their structures can be predicted from sequence data alone, RNA molecules provide a good framework to study this sequence-structure relationship.
We recently introduced a suite of algorithms called RNAMutants which allows, for the first time, a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAMutants takes an input sequence (or seed), computes the Boltzmann-weighted ensembles of mutants with exactly k mutations, and samples mutants from these ensembles.
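To illustrate what a Boltzmann-weighted ensemble of k-mutants is, here is a minimal Python sketch. It is not the RNAMutants algorithm: it enumerates all mutants by brute force (exponential in k) instead of using polynomial-time dynamic programming, and `toy_energy` is a hypothetical stand-in for a real free energy model, chosen only so that G and C lower the energy more than A and U. All function names are assumptions made for this example.

```python
import itertools
import math
import random

BASES = "ACGU"
RT = 0.616  # kcal/mol, approximate value of RT at 37 degrees Celsius


def toy_energy(seq):
    """Hypothetical pseudo-energy (NOT a real model): G/C are rewarded more."""
    return -sum(1.0 if b in "GC" else 0.3 for b in seq)


def k_mutants(seed, k):
    """Yield every sequence at Hamming distance exactly k from the seed."""
    for positions in itertools.combinations(range(len(seed)), k):
        # At each mutated position, any base except the original is allowed.
        choices = [[b for b in BASES if b != seed[p]] for p in positions]
        for subs in itertools.product(*choices):
            seq = list(seed)
            for p, b in zip(positions, subs):
                seq[p] = b
            yield "".join(seq)


def boltzmann_sample(seed, k, n, rng):
    """Draw n mutants with probability proportional to exp(-E / RT)."""
    mutants = list(k_mutants(seed, k))
    weights = [math.exp(-toy_energy(m) / RT) for m in mutants]
    return rng.choices(mutants, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    for mutant in boltzmann_sample("GGCCAAUU", 2, 5, rng):
        print(mutant)
```

Even in this toy setting, mutants that replace A or U with G or C receive larger weights, so they are drawn more often than GC-poor mutants.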
However, this approach suffers from a major limitation. Because the Boltzmann probabilities of the mutants depend on the free energy of their structures, RNAMutants has difficulty sampling mutant sequences with low GC-content.
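The source of this bias can be made explicit by writing out the Boltzmann probability. The notation below is an assumption introduced for this note and does not come from the talk: E(s, S) is the free energy of sequence s folded into secondary structure S, M_k is the set of mutants with exactly k mutations from the seed, R is the gas constant and T the temperature.

```latex
Z(s) = \sum_{S} e^{-E(s,S)/RT},
\qquad
P(s \mid k) = \frac{Z(s)}{\sum_{s' \in \mathcal{M}_k} Z(s')}
```

G-C base pairs are generally more stable than A-U or G-U pairs, so GC-rich mutants tend to reach lower free energies. Their partition functions Z(s) then dominate the denominator, and GC-poor mutants are drawn only rarely.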
In this talk, presented at RECOMB'11, we introduce a novel unbiased adaptive sampling algorithm that enables RNAMutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutants with low GC-content. The same adaptive sampling techniques can easily be adapted to explore other hard-to-sample regions of the sequence and structural landscapes. Importantly, these algorithms come at a minimal computational cost.
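One way to picture an unbiased adaptive sampler is the following Python sketch. It is an illustration under stated assumptions, not the RNAMutants implementation: each mutant's Boltzmann weight is multiplied by w raised to its number of G and C bases, only samples with the target GC count are kept, and w is adjusted between rounds. Since the extra factor is identical for all mutants with the same GC count, the kept samples still follow the original Boltzmann distribution restricted to that GC class. The brute-force enumeration, the `toy_energy` function, the doubling/halving update rule and every name here are hypothetical choices made for this example.

```python
import itertools
import math
import random

BASES = "ACGU"
RT = 0.616  # kcal/mol, approximate value of RT at 37 degrees Celsius


def toy_energy(seq):
    """Hypothetical pseudo-energy (NOT a real model): G/C are rewarded more."""
    return -sum(1.0 if b in "GC" else 0.3 for b in seq)


def gc_count(seq):
    """Number of G or C bases in the sequence."""
    return sum(b in "GC" for b in seq)


def k_mutants(seed, k):
    """Yield every sequence at Hamming distance exactly k from the seed."""
    for positions in itertools.combinations(range(len(seed)), k):
        choices = [[b for b in BASES if b != seed[p]] for p in positions]
        for subs in itertools.product(*choices):
            seq = list(seed)
            for p, b in zip(positions, subs):
                seq[p] = b
            yield "".join(seq)


def adaptive_gc_sample(seed, k, target_gc, n, rng, rounds=50, batch=200):
    """Collect up to n k-mutants whose GC count equals target_gc."""
    mutants = list(k_mutants(seed, k))
    boltz = [math.exp(-toy_energy(m) / RT) for m in mutants]
    gcs = [gc_count(m) for m in mutants]
    w = 1.0  # weight per G/C base; w = 1 is plain Boltzmann sampling
    kept = []
    for _ in range(rounds):
        weights = [b * w ** g for b, g in zip(boltz, gcs)]
        draw = rng.choices(range(len(mutants)), weights=weights, k=batch)
        # Rejection step: keep only samples in the target GC class.
        kept.extend(mutants[i] for i in draw if gcs[i] == target_gc)
        if len(kept) >= n:
            break
        # Adaptive step (assumed rule): push the mean GC count toward the target.
        mean_gc = sum(gcs[i] for i in draw) / batch
        if mean_gc > target_gc:
            w *= 0.5
        elif mean_gc < target_gc:
            w *= 2.0
    return kept[:n], w


if __name__ == "__main__":
    samples, w = adaptive_gc_sample("GGCCGGCC", 2, 6, 10, random.Random(0))
    print(samples, w)
```

The rejection step is what keeps the procedure unbiased; the adaptive weight only reduces how many draws are wasted before enough samples in the target class are found.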
We demonstrate the insights offered by these techniques on studies of complete RNA sequence-structure maps of sizes up to 40 nucleotides. Our results indicate that the GC-content has a strong influence on the size and shape of the evolutionarily accessible sequence and structural spaces. In particular, we show that low GC-content favors the emergence of internal loops and thus possibly the formation of tertiary structure motifs. On the other hand, high GC-content significantly reduces the size of the evolutionarily accessible mutational landscape.