Ph.D
Group : Bioinformatics
Algorithmique de l'alignement structure-séquence d'ARN: une approche générale et paramétr
Starts on 01/10/2009
Advisor : DENISE, Alain
Funding : CDD sur contrat UPS
Affiliation : Université Paris-Saclay
Laboratory : LRI BIO-INFO
Defended on 05/12/2012, committee :
Rapporteurs:
Jean-Claude König, Professeur,Université de Montpellier, France
Stéphane Vialette, Directeur de recherche, Université de Marne-la-vallée, France
Examinateurs:
Yannis Manoussakis, Professeur, Université Paris-Sud, France
Alessandra Carbone, Professeur, Université Pierre et Marie Curie, France
Directeurs de thèse:
Alain Denise, Professeur, Université Paris-Sud, France
Dominique Barth, Professeur, Université Versailles-Saint-Quentin
Research activities :
Abstract :
Non-coding RNA macromolecules are involved in the metabolism of all living beings. From the computational point of view, their two major biological problems are: the prediction of their structure to better understand their functions and their detection in databases or genomes. The RNA structure-sequence alignment addresses these two issues. The RNA structure-sequence aligment is to align a known structure of a first RNA with the sequence of a second RNA. The structure is represented as an arc-annotated sequence and the sequence represents the RNA nucleotide sequence. To solve this problem, we want to optimize the alignment according to a cost function. So this is an optimization problem, which is NP-hard. Accordingly, different works define several reduced structure classes for which they propose specific algorithms but with polynomial complexity.
The presented work unifies and generalizes all previous approaches by building a unique non-specific class algorithm with parametrized complexity. This algorithm uses a technique from graph theory: the decomposition tree, that is to say, it transforms the given structure into a tree-decomposition and then I will explain how to align this decomposition with the sequence. I will then highlight why the implementation of this approach requires a reformulation of the problem as well as a substantial modification to the conventional use of dynamic programming for tree decompositions. This leads to a parameterized algorithm whose parameter is entirely related to the tree-decomposition.
The construction of tree decomposition for which alignment is the most effective is unfortunately also a NP-hard problem. However, I will outline a heuristic decompositions construction adapted to RNA structures and show that the complexity of the approach (solving the problem in its generality) equals or outperforms all previous approaches in their respective structure classes. I will finish by presenting new structure classes which extend existing ones without degrating the complexity of the alignment but which can represent the majority of known structures containing many important elements not previously taken into account (such as RNA tertiary motifs).