
Homepage of Christian Michel

THEORETICAL BIOINFORMATICS

Head: Prof. Christian MICHEL

Bioinformatique Théorique

CSTB, ICube

Université de Strasbourg, CNRS

300 Boulevard Sébastien Brant

67400 Illkirch, France

Site: http://dpt-info.u-strasbg.fr/~c.michel/

Team site: http://icube-cstb.unistra.fr/fr/index.php/Accueil


 

News: PHD THESIS IN COMPUTER SCIENCE

APPLICATION DEADLINE: 2 June 2017

Title: Circular codes in genes: combinatorial and statistical study, and bioinformatics applications

Abstract:

The aim of this thesis is to extend the theory of circular codes in genes along three possible directions: combinatorics, statistics and bioinformatics applications.

The combinatorial approach will develop the study of circular codes by graph theory (Fimmel, Michel, Strüngmann, 2016, Philosophical Transactions of the Royal Society A 374, 20150058): enumeration of codes by mathematical proof or by massively parallel algorithms, longest paths in the acyclic graphs of circular codes, identification of new properties in circular code graphs, etc.

The statistical approach will focus on the motifs of the circular code identified in genes (El Soufi, Michel, 2014, Computational Biology and Chemistry 52, 9-17), in particular the algorithmic search for circular code motifs in the ribosome decoding centers of different species.

The bioinformatics applications will focus in particular on introducing properties of circular codes into classical bioinformatics methods for alignment or phylogenetic inference.

Supervisor: Christian Michel

Eligibility requirements:

- Ranked in the top 5 of an M2 (second-year Master's) in Computer Science or of an engineering school

- Background in theoretical computer science and/or applied mathematics

If you are interested in this research topic, please contact me with an application file containing:

- a CV

- the available grades of the M2 in Computer Science or engineering school and your ranking (rank/class size)

- the grades of the M1 in Computer Science or engineering school and your ranking (rank/class size)

- one or more letters of recommendation.

An incomplete application will not be considered by the Doctoral School.

The site http://dpt-info.u-strasbg.fr/~michel/ presents the scientific context.

 

 

Circular codes in genes [PDF] [PDF] [PDF]

Circular code motifs and translation code in genes [PDF] [PDF] [PDF] [PDF] [PDF] [PDF]

Origin of circular codes in genes

A hypothesis [PDF], a first answer [PDF], a second answer [PDF]

but some combinatorial problems remain open...

Another hypothesis: primitive genes based on circular code, e.g. [PDF]

MEMBERS

BIOINFORMATICS RESEARCH FIELDS

CIRCULAR CODE AND GENETIC CODE

GENE EVOLUTION

RESEARCH SOFTWARE

PUBLICATIONS

LECTURES

PERSONAL DATA


MEMBERS

Permanent member

Christian MICHEL, Professor

PhD student

Karim EL SOUFI

Collaborations

GDR "Bioinformatique Moléculaire"

GDR "Informatique Mathématique" in the research group "Combinatoire des mots, algorithmique du texte et du génome"

Prof. Elena Fimmel, Hochschule Mannheim, Institut für Angewandte Mathematik, Mannheim, Germany

Dr Sophie Lèbre, Institut Montpelliérain Alexander Grothendieck (IMAG), Equipe Probabilités et Statistique, Université de Montpellier 3

Prof. Giuseppe Pirillo, Consiglio Nazionale delle Ricerche, Dipartimento di Matematica "U. Dini", Florence, Italy

Dr Hervé Seligmann, National Natural History Museum Collections, The Hebrew University of Jerusalem, Jerusalem, Israel

Prof. Lutz Strüngmann, Hochschule Mannheim, Institut für Angewandte Mathematik, Mannheim, Germany


THEORETICAL BIOINFORMATICS RESEARCH FIELDS

The Theoretical Bioinformatics group pursues fundamental and theoretical research aimed at identifying rules and properties in genes.

Review: Article A38

Identification of statistical signals in genes: Articles A1, A3-6, A7, A8, A11, A14, A16

Identification of circular codes in genes: Articles A19, A21, A22, A30, A33, A61, A67

Properties of circular codes in genes: Articles A36, A41, A46, A49, A53, A59, A63-66

Combinatorics of circular codes: Articles A27, A39, A40, A47, A50, A52, A54, A55, A57, A58, A60, A70-71

Computer models of gene evolution: Articles A8-10, A12, A20

Probabilistic models of gene evolution by substitution: Articles A13, A15, A17, A23, A24, A31, A32, A34, A35, A37, A42, A43, A45, A51

Probabilistic models of gene evolution by substitution, insertion and deletion: Articles A48, A56, A62, A65, A68

Phylogenetic distances and inference methods: Articles A35, A37, A44

Research software in bioinformatics: Articles A9, A28, A45, A65, A68


CIRCULAR CODE AND GENETIC CODE

The circular code theory proposes that genes are constituted of two trinucleotide codes: the amino acid code and the circular code. It relies on two main results: the identification of a maximal C3 self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses (Michel 2015 J. Theor. Biol. 380, 156-177. (doi:10.1016/j.jtbi.2015.04.009); Arquès and Michel 1996 J. Theor. Biol. 182, 45-58. (doi:10.1006/jtbi.1996.0142)) and the finding of X circular code motifs in tRNAs and rRNAs, in particular in the ribosome decoding center (Michel 2012 Comput. Biol. Chem. 37, 24-37. (doi:10.1016/j.compbiolchem.2011.10.002); El Soufi and Michel 2014 Comput. Biol. Chem. 52, 9-17. (doi:10.1016/j.compbiolchem.2014.08.001)). The universally conserved nucleotides A1492 and A1493 and the conserved nucleotide G530 are included in X circular code motifs.

The classical amino acid code contains 64 trinucleotides {AAA,...,TTT}: 61 trinucleotides code the 20 amino acids and three are stop codons, which do not code for any amino acid. The amino acid code in today’s genes therefore does not use all 64 available trinucleotides but a subset of 61 trinucleotides for coding the 20 amino acids. It is a surjective code.
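The counts quoted above can be checked with a short script. This is a minimal sketch, assuming the standard genetic code written as the usual 64-letter string in T, C, A, G base order; the names `GENETIC_CODE`, `sense_codons` and `stop_codons` are illustrative and do not come from the group's software.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Codons that code an amino acid, and codons that do not.
sense_codons = {c: a for c, a in GENETIC_CODE.items() if a != "*"}
stop_codons = sorted(c for c, a in GENETIC_CODE.items() if a == "*")
amino_acids = set(sense_codons.values())

print(len(GENETIC_CODE), "trinucleotides in total")
print(len(sense_codons), "trinucleotides coding", len(amino_acids), "amino acids")
print("stop codons:", stop_codons)
```

The map from the 61 coding trinucleotides onto the 20 amino acids is surjective but not injective, since several trinucleotides code the same amino acid.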

The circular code X identified in genes of bacteria, eukaryotes, plasmids and viruses is based on 20 trinucleotides {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} with two mathematical properties involved in translation. It codes 12 amino acids according to the standard amino acid code. Thus, it is also a surjective code. Furthermore, by definition of the circular code, it allows the reading frame to be retrieved, maintained and synchronized at any position in a gene.
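The properties stated for X can be checked by computer. The sketch below is illustrative only: it assumes the graph criterion of Fimmel, Michel and Strüngmann (2016, article A71), under which a trinucleotide code is circular if and only if the graph with an edge b1 → b2b3 and an edge b1b2 → b3 for each trinucleotide b1b2b3 is acyclic. The function names are hypothetical and not taken from the group's software.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# The 20 trinucleotides of the circular code X listed above.
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(t):
    """Anticodon of a trinucleotide: complement each base, then reverse."""
    return t.translate(COMPLEMENT)[::-1]


def is_self_complementary(code):
    return all(reverse_complement(t) in code for t in code)


def is_circular(code):
    """Assumed criterion: circular iff the associated graph has no cycle."""
    graph = {}
    for t in code:
        graph.setdefault(t[0], set()).add(t[1:])   # b1 -> b2b3
        graph.setdefault(t[:2], set()).add(t[2])   # b1b2 -> b3
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def has_cycle(v):
        state[v] = 1
        for w in graph.get(v, ()):
            if state.get(w) == 1 or (w not in state and has_cycle(w)):
                return True
        state[v] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in list(graph))


def shift(code, k):
    """Circular permutation of every trinucleotide by k positions."""
    return frozenset(t[k:] + t[:k] for t in code)


def is_c3(code):
    """C3 property: the code and its two shifted sets are all circular."""
    return all(is_circular(shift(code, k)) for k in range(3))


coded = {GENETIC_CODE[t] for t in X}
print(len(X), "trinucleotides coding", len(coded), "amino acids:", sorted(coded))
print("self-complementary:", is_self_complementary(X))
print("circular:", is_circular(X), "| C3:", is_c3(X))
```

For comparison, `is_circular({"ACG", "CGA"})` is false under this criterion, because the edges A → CG and CG → A form a cycle.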

Reviews in English [PDF] and [PDF]

Review in French [PDF]

Gonzalez, Giannerini and Rosa ("Circular codes revisited: A statistical approach", J. Theor. Biol., 2011, 275, 21-28):

« In 1996 Arquès and Michel [...] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has provoked great interest and underwent a rapid development. »

« The results [obtained by the authors in their article] indicate that, on average, the code proposed by Arquès and Michel has the best covering capability ... »

Gladstone ("Autocorrelation genetic syntax of eukaryotic protein-coding sequences", 2013):

« Michel has theorized that two codes, the genetic code and the circular code, are used together as key components of the functioning of the ribosomal complex. He has proposed that while the genetic code conveys what amino acids to recruit to the ribosomal complex during translation, the circular code is used for frame identification and synchronization of the ribosomal complex with the ORF. Evidence has been provided that shows circular codes most likely play a role in ribosome synchronization with the ORF (Frey and Michel 2006). A recent analysis of frameshift genes found in eukaryotes and prokaryotes has found a significant correlation between frameshift signals and Michel’s proposed circular code (Ahmed, Frey et al. 2007). »

« … and our understanding of the role these circular codes play in vivo is largely a mystery. »

Fimmel and Strüngmann ("Codon distribution in error-detecting circular codes", Life, 2016, 6, 14):

« In 1957, Francis Crick et al. suggested an ingenious explanation for the process of frame maintenance. The idea was based on the notion of comma-free codes. Although Crick’s hypothesis proved to be wrong, in 1996, Arquès and Michel discovered the existence of a weaker version of such codes in eukaryote and prokaryote genomes, namely the so-called circular codes. Since then, circular code theory has invariably evoked great interest and made significant progress. »

« In 2015, by quantifying the approach used in 1996 and by applying massive statistical analysis of gene taxonomic groups, the circular code detected in 1996 was rediscovered extensively in genes of prokaryotes and eukaryotes and now also identified in the genes of plasmids and viruses (Michel, 2015). The codes discovered by Arquès and Michel in nature have even more interesting properties [compared with comma-free codes]. With each codon, its anticodon is also in the code (self-complementarity), and they also have the error detection property in frame 1 and 2 (C3-property). »


MOTIFS OF THE CIRCULAR CODE X (X MOTIFS) IN THE RIBOSOME DECODING CENTER

Motifs of the circular code X in the ribosome decoding center: X motifs of mRNA in green, X motif containing the universally conserved A1492 and A1493 of rRNA in purple, X motif containing the universally conserved G530 of rRNA in fuchsia and X motifs of tRNAs in dark blue (anticodon in black) (Michel, 2012; El Soufi and Michel, 2014). Graphical representation here with the 16S rRNA of Thermus thermophilus (PDB 3I8G).
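To illustrate what a search for X motifs might look like, here is a minimal sketch. It assumes that an X motif is a run of consecutive trinucleotides of X read in one of the three frames of a sequence, and it uses a hypothetical minimum length of three trinucleotides; the function `find_x_motifs` and its parameter are illustrative, and the cited articles define their own selection criteria.

```python
# The 20 trinucleotides of the circular code X.
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def find_x_motifs(seq, min_trinucleotides=3):
    """Return (start, motif) pairs for runs of X trinucleotides.

    Each of the three frames is scanned independently. The threshold
    min_trinucleotides is an assumption made for this sketch.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    motifs = []
    for frame in range(3):
        start = None
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in X:
                if start is None:
                    start = pos          # a new run begins here
            else:
                if start is not None and (pos - start) // 3 >= min_trinucleotides:
                    motifs.append((start, seq[start:pos]))
                start = None             # the run, if any, is broken
            pos += 3
        # A run may extend to the end of the frame.
        if start is not None and (pos - start) // 3 >= min_trinucleotides:
            motifs.append((start, seq[start:pos]))
    return sorted(motifs)


print(find_x_motifs("AAAGGTAATGACAAA"))
```

In this toy sequence the only run of at least three X trinucleotides is GGT AAT GAC, starting at position 3 (0-based) in frame 0.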


GENE EVOLUTION

Models of gene evolution by substitution of genetic motifs (Benard, Michel) [PDF]

Models of gene evolution by substitution, insertion and deletion of nucleotides (Lèbre, Michel) [PDF]

Models of gene evolution by substitution, insertion and deletion of genetic motifs (Benard, Lèbre, Michel) [PDF]


RESEARCH SOFTWARE

GETEC (Genome Evolution by Transformation, Expansion and Contraction) (Benard E., Lèbre S., Michel C.J., 2015) to determine evolutionary analytical solutions of genetic motifs based on substitution, insertion and deletion as a function of time or sequence length, as well as in direct time direction (past-present) or in inverse time direction (present-past)

http://icube-bioinfo.u-strasbg.fr/webMathematica/GETEC/

DNAdistree (Criscuolo A., Michel C.J., 2009) to infer phylogenetic trees according to distance methods based on weighted phylogenetic distances

http://icube-bioinfo.u-strasbg.fr/DNADISTREE/


THEORETICAL BIOINFORMATICS ARTICLES IN INTERNATIONAL JOURNALS

2017

[A73] El Soufi K., Michel C.J. (2017). Unitary circular code motifs in genomes of eukaryotes. Biosystems in press. [PDF]

2016

[A72] El Soufi K., Michel C.J. (2016). Circular code motifs in genomes of eukaryotes. Journal of Theoretical Biology 408, 198-212. [PDF]

[A71] Fimmel E., Michel C.J., Strüngmann L. (2016). n-Nucleotide circular codes in graph theory. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374, 20150058. [PDF]

[A70] Michel C.J., Pellegrini M., Pirillo G. (2016). Maximal dinucleotide and trinucleotide circular codes. Journal of Theoretical Biology 389, 40-46. [PDF]

2015

[A69] El Soufi K., Michel C.J. (2015). Circular code motifs near the ribosome decoding center. Computational Biology and Chemistry 59, 158-176. [PDF]

[A68] Benard E., Lèbre S., Michel C.J. (2015). Genome evolution by transformation, expansion and contraction (GETEC). Biosystems 135, 15-34. [PDF]

[A67] Michel C.J. (2015). The maximal C3 self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses. Journal of Theoretical Biology 380, 156-177. [PDF]

[A66] Michel C.J. (2015). An extended genetic scale of reading frame coding. Journal of Theoretical Biology 365, 164-174. [PDF]

2014

[A65] El Soufi K., Michel C.J. (2014). Circular code motifs in the ribosome decoding center. Computational Biology and Chemistry 52, 9-17. [PDF]

[A64] Michel C.J. (2014). A genetic scale of reading frame coding. Journal of Theoretical Biology 355, 83-94. [PDF]

[A63] Michel C.J., Seligmann H. (2014). Bijective transformation circular codes and nucleotide exchanging RNA transcription. Biosystems 118, 39-50. [PDF]

2013

[A62] Lèbre S., Michel C.J. (2013). A new molecular evolution model for limited insertion independent of substitution. Mathematical Biosciences 245, 137-147. [PDF]

[A61] Herrmann M., Michel C.J., Zugmeyer B. (2013). A necklace algorithm to determine the growth function of trinucleotide circular codes. Journal of Applied Mathematics and Bioinformatics 3, 1-40. [PDF]

[A60] Benard E., Michel C.J. (2013). Transition and transversion on the common trinucleotide circular code. Computational Biology Journal 2013, Article ID 795418, 1-10. [PDF]

[A59] Michel C.J. (2013). Circular code motifs in transfer RNAs. Computational Biology and Chemistry 45, 17-29. [PDF]

[A58] Michel C.J., Pirillo G. (2013). Dinucleotide circular codes. ISRN Biomathematics 2013, Article ID 538631, 1-8. [PDF]

[A57] Michel C.J., Pirillo G. (2013). A permuted set of a trinucleotide circular code coding the 20 amino acids in variant nuclear codes. Journal of Theoretical Biology 319, 116-121. [PDF]

2012

[A56] Lèbre S., Michel C.J. (2012). An evolution model for sequence length based on residue insertion-deletion independent of substitution: an application to the GC content in bacterial genomes. Bulletin of Mathematical Biology 74, 1764-1788. [PDF]

[A55] Michel C.J., Pirillo G., Pirillo M.A. (2012). A classification of 20-trinucleotide circular codes. Information and Computation 212, 55-63. [PDF]

[A54] Bussoli L., Michel C.J., Pirillo G. (2012). On conjugation partitions of sets of trinucleotides. Applied Mathematics 3, 107-112. [PDF]

[A53] Michel C.J. (2012). Circular code motifs in transfer and 16S ribosomal RNAs: a possible translation code in genes. Computational Biology and Chemistry 37, 24-37. [PDF]

2011

[A52] Bussoli L., Michel C.J., Pirillo G. (2011). On some forbidden configurations for self-complementary trinucleotide circular codes. Journal for Algebra and Number Theory Academia 2, 223-232. [PDF]

[A51] Benard E., Michel C.J. (2011). A generalization of substitution evolution models of nucleotides to genetic motifs. Journal of Theoretical Biology 288, 73-83. [PDF]

[A50] Michel C.J., Pirillo G. (2011). Strong trinucleotide circular codes. International Journal of Combinatorics 2011, Article ID 659567, 1-14. [PDF]

[A49] Ahmed A., Michel C.J. (2011). Circular code signal in frameshift genes. Journal of Computer Science and Systems Biology 4, 7-15. [PDF]

2010

[A48] Lèbre S., Michel C.J. (2010). A stochastic evolution model for residue insertion-deletion independent from substitution. Computational Biology and Chemistry 34, 259-267. [PDF]

[A47] Michel C.J., Pirillo G. (2010). Identification of all trinucleotide circular codes. Computational Biology and Chemistry 34, 122-125. [PDF]

[A46] Ahmed A., Frey G., Michel C.J. (2010). Essential molecular functions associated with the circular code evolution. Journal of Theoretical Biology 264, 613-622. [PDF]

2009

[A45] Benard E., Michel C.J. (2009). Computation of direct and inverse mutations with the SEGM web server (Stochastic Evolution of Genetic Motifs): an application to splice sites of human genome introns. Computational Biology and Chemistry 33, 245-252. [PDF]

[A44] Criscuolo A., Michel C.J. (2009). Phylogenetic inference with weighted codon evolutionary distances. Journal of Molecular Evolution 68, 377-392. [PDF]

[A43] Bahi J.M., Michel C.J. (2009). A stochastic model of gene evolution with time dependent pseudochaotic mutations. Bulletin of Mathematical Biology 71, 681-700. [PDF]

2008

[A42] Bahi J.M., Michel C.J. (2008). A stochastic model of gene evolution with chaotic mutations. Journal of Theoretical Biology 255, 53-63. [PDF]

[A41] Ahmed A., Michel C.J. (2008). Plant microRNA detection using the circular code information. Computational Biology and Chemistry 32, 400-405. [PDF]

[A40] Michel C.J., Pirillo G., Pirillo M.A. (2008). A relation between trinucleotide comma-free codes and trinucleotide circular codes. Theoretical Computer Science 401, 17-26. [PDF]

[A39] Michel C.J., Pirillo G., Pirillo M.A. (2008). Varieties of comma free codes. Computers and Mathematics with Applications 55, 989-996. [PDF]

[A38] Michel C.J. (2008). A 2006 review of circular codes in genes. Computers and Mathematics with Applications 55, 984-988. [PDF]

2007

[A37] Michel C.J. (2007). Evolution probabilities and phylogenetic distance of dinucleotides. Journal of Theoretical Biology 249, 271-277. [PDF]

[A36] Ahmed A., Frey G., Michel C.J. (2007). Frameshift signals in genes associated with the circular code. In Silico Biology 7, 155-168. [PDF]

[A35] Michel C.J. (2007). Codon phylogenetic distance. Computational Biology and Chemistry 31, 36-43. [PDF]

[A34] Michel C.J. (2007). An analytical model of gene evolution with 9 mutation parameters: an application to the amino acids coded by the common circular code. Bulletin of Mathematical Biology 69, 677-698. [PDF]

2006

[A33] Frey G., Michel C.J. (2006). Identification of circular codes in bacterial genomes and their use in a factorization method for retrieving the reading frames of genes. Computational Biology and Chemistry 30, 87-101. [PDF]

[A32] Frey G., Michel C.J. (2006). An analytical model of gene evolution with 6 mutation parameters: an application to archaeal circular codes. Computational Biology and Chemistry 30, 1-11. [PDF]

2004

[A31] Bahi J.M., Michel C.J. (2004). A stochastic gene evolution model with time dependent mutations. Bulletin of Mathematical Biology 66, 763-778. [PDF]

2003

[A30] Frey G., Michel C.J. (2003). Circular codes in archaeal genomes. Journal of Theoretical Biology 223, 413-431. [PDF]

[A29] Michel C.J. (2003). A computer method for identifying patterns in the electroencephalogram signals. Journal of Medical Engineering and Technology 27, 267-275. [PDF]

2002

[A28] Arquès D.G., Lacan J., Michel C.J. (2002). Identification of protein coding genes in genomes with statistical functions based on the circular code. Biosystems 66, 73-92. [PDF]

2001

[A27] Lacan J., Michel C.J. (2001). Analysis of a circular code model. Journal of Theoretical Biology 213, 159-170. [PDF]

2000

[A26] Bahi J.M., Michel C.J. (2000). Convergence of discrete asynchronous iterations. International Journal of Computer Mathematics 74, 113-125. [PDF]

1999

[A25] Bahi J.M., Michel C.J. (1999). Simulations of asynchronous evolution of discrete systems. Simulation Practice and Theory 7, 309-324. [PDF]

[A24] Arquès D.G., Fallot J.-P., Marsan L., Michel C.J. (1999). An evolutionary analytical model of a complementary circular code. Biosystems 49, 83-103. [PDF]

1998

[A23] Arquès D.G., Fallot J.-P., Michel C.J. (1998). An evolutionary analytical model of a complementary circular code simulating the protein coding genes, the 5' and 3' regions. Bulletin of Mathematical Biology 60, 163-194. [PDF]

1997

[A22] Arquès D.G., Michel C.J. (1997). A circular code in the protein coding genes of mitochondria. Journal of Theoretical Biology 189, 273-290. [PDF]

[A21] Arquès D.G., Michel C.J. (1997). A code in the protein coding genes. Biosystems 44, 107-134. [PDF]

[A20] Arquès D.G., Fallot J.-P., Michel C.J. (1997). An evolutionary model of a complementary circular code. Journal of Theoretical Biology 185, 241-253. [PDF]

1996

[A19] Arquès D.G., Michel C.J. (1996). A complementary circular code in the protein coding genes. Journal of Theoretical Biology 182, 45-58. [PDF]

[A18] Arquès D.G., Fallot J.-P., Michel C.J. (1996). Identification of several types of periodicities in the collagens and their simulation. International Journal of Biological Macromolecules 19, 131-138. [PDF]

1995

[A17] Arquès D.G., Michel C.J. (1995). Analytical solutions of the dinucleotide probability after and before random mutations. Journal of Theoretical Biology 175, 533-544. [PDF]

[A16] Arquès D.G., Lapayre J.-C., Michel C.J. (1995). Identification and simulation of shifted periodicities common to protein coding genes of eukaryotes, prokaryotes and viruses. Journal of Theoretical Biology 172, 279-291. [PDF]

1994

[A15] Arquès D.G., Michel C.J. (1994). Analytical expression of the purine/pyrimidine autocorrelation function after and before random mutations. Mathematical Biosciences 123, 103-125. [PDF]

1993

[A14] Arquès D.G., Michel C.J. (1993). Identification and simulation of new non-random statistical properties common to different eukaryotic gene subpopulations. Biochimie 75, 399-407. [PDF]

[A13] Arquès D.G., Michel C.J. (1993). Analytical expression of the purine/pyrimidine codon probability after and before random mutations. Bulletin of Mathematical Biology 55, 1025-1038. [PDF]

[A12] Arquès D.G., Michel C.J. (1993). A model of gene evolution based on recognizable languages and on insertion and deletion operations. International Journal of Modelling and Simulation 13, 110-113. [PDF]

[A11] Arquès D.G., Michel C.J., Orieux K. (1993). Identification and simulation of new non-random statistical properties common to different populations of eukaryotic non-coding genes. Journal of Theoretical Biology 161, 329-342. [PDF]

1992

[A10] Arquès D.G., Michel C.J. (1992). A simulation of the genetic periodicities modulo 2 and 3 with processes of nucleotide insertions and deletions. Journal of Theoretical Biology 156, 113-127. [PDF]

[A9] Arquès D.G., Michel C.J., Orieux K. (1992). Analysis of Gene Evolution: the software AGE. Bioinformatics 8, 5-14. [PDF]

1990

[A8] Arquès D.G., Michel C.J. (1990). A model of DNA sequence evolution. Part 1: Statistical features and classification of gene populations, 743-753. Part 2: Simulation model, 753-766. Part 3: Return of the model to the reality, 766-770. Bulletin of Mathematical Biology 52, 741-772. [PDF]

[A7] Arquès D.G., Michel C.J. (1990). Periodicities in coding and noncoding regions of the genes. Journal of Theoretical Biology 143, 307-318. [PDF]

1989

[A6] Michel C.J. (1989). A study of the purine/pyrimidine codon occurrence with a reduced centered variable and an evaluation compared to the frequency statistic. Mathematical Biosciences 97, 161-177. [PDF]

1987

[A5] Arquès D.G., Michel C.J. (1987). Periodicities in introns. Nucleic Acids Research 15, 7581-7592. [PDF]

[A4] Arquès D.G., Michel C.J. (1987). A purine-pyrimidine motif verifying an identical presence in almost all gene taxonomic groups. Journal of Theoretical Biology 128, 457-461. [PDF]

[A3] Arquès D.G., Michel C.J. (1987). Study of a perturbation in the coding periodicity. Mathematical Biosciences 86, 1-14. [PDF]

1986

[A2] Michel C.J., Jacq B., Arquès D.G., Bickle T.A. (1986). A remarkable amino acid sequence homology between a phage T4 tail fibre protein and ORF314 of phage lambda located in the tail operon. Gene 44, 147-150. [PDF]

[A1] Michel C.J. (1986). New statistical approach to discriminate between protein coding and non-coding regions in DNA sequences and its evaluation. Journal of Theoretical Biology 120, 223-236. [PDF]
