Towards parsimonious generative modeling of RNA families / Calvanese, Francesco; N Lambert, Camille; Nghe, Philippe; Zamponi, Francesco; Weigt, Martin. - In: NUCLEIC ACIDS RESEARCH. - ISSN 0305-1048. - (2024), pp. 1-13. [10.1093/nar/gkae289]

Towards parsimonious generative modeling of RNA families

Francesco Zamponi; Martin Weigt
2024

Abstract

Generative probabilistic models emerge as a new paradigm in data-driven, evolution-informed design of biomolecular sequences. This paper introduces a novel approach, called Edge Activation Direct Coupling Analysis (eaDCA), tailored to the characteristics of RNA sequences, with a strong emphasis on simplicity, efficiency, and interpretability. eaDCA explicitly constructs sparse coevolutionary models for RNA families, achieving performance levels comparable to more complex methods while utilizing a significantly lower number of parameters. Our approach demonstrates efficiency in generating artificial RNA sequences that closely resemble their natural counterparts in both statistical analyses and SHAPE-MaP experiments, and in predicting the effect of mutations. Notably, eaDCA provides a unique feature: estimating the number of potential functional sequences within a given RNA family. For example, in the case of cyclic di-AMP riboswitches (RF00379), our analysis suggests the existence of approximately 10^39 functional nucleotide sequences. While huge compared to the known <4000 natural sequences, this number represents only a tiny fraction of the vast pool of nearly 10^82 possible nucleotide sequences of the same length (136 nucleotides). These results underscore the promise of sparse and interpretable generative models, such as eaDCA, in enhancing our understanding of the expansive RNA sequence space.
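
The size of the full sequence space quoted in the abstract follows from counting every string of 136 nucleotides over a four-letter alphabet. The short check below is only an illustrative sketch and not part of the paper's method: the variable names are hypothetical, and it assumes that alignment gaps are not counted as a fifth symbol. The functional-sequence exponent is copied from the abstract, not recomputed.

```python
import math

LENGTH = 136    # sequence length quoted in the abstract
ALPHABET = 4    # A, C, G, U (assumption: gap symbols are not counted)

# log10 of the number of possible sequences, 4**136
log10_total = LENGTH * math.log10(ALPHABET)

# Order of magnitude of functional sequences, taken from the abstract as-is
log10_functional = 39

# log10 of the fraction of sequence space estimated to be functional
log10_fraction = log10_functional - log10_total

print(f"log10(possible sequences)  = {log10_total:.2f}")
print(f"log10(functional fraction) = {log10_fraction:.2f}")
```

Working in log10 keeps the comparison readable; the exact integer `4**136` is also available in Python if the full count is needed.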
generative models; direct coupling analysis; bioinformatics
01 Journal publication::01a Journal article
Files attached to this item
File: Calvanese_Towards_2024.pdf
Access: open access
Note: Journal article
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 5.97 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1710664