The discovery of long non-coding RNAs (lncRNAs) has revealed additional layers of gene-expression control. Specific interactions of lncRNAs with DNA, RNAs, and RNA-binding proteins enable regulation in both cytoplasmic and nuclear compartments; for example, a conserved triple-helix motif is essential for MALAT1 stability and oncogenic activity. Here, we present a secondary-structure-based framework to annotate and detect RNA triple helices. First, we extend the dot-bracket formalism with a third annotation line that encodes Hoogsteen contacts. Second, we introduce TripleMatcher, which searches for a triple-helix pattern, filters candidates by C1'-C1' distance thresholds, and merges overlaps into region-level zones. Using telomerase RNAs and RNA-stability elements with experimentally established triple helices (8 RNAs), TripleMatcher localized all annotated regions (structure-wise detection 8/8); geometric filtering removed most spurious candidates and improved precision (positive predictive value from 0.42 to 0.81) and overall accuracy ($F_1$ from 0.42 to 0.62) while maintaining sensitivity. Benchmarking eight predictors showed that pseudoknot-aware methods most reliably reproduce the local architecture required for detection, aligning secondary-structure quality with downstream triple-helix recovery. Applied prospectively, the framework identified candidate regions directly from predicted secondary structures and scaled to a screen of 4160 RNAs, where distance filtering reduced 150 990 raw candidates (median per molecule: 108 [20-270]) to 97 geometrically feasible regions across seven molecules, including human telomerase complexes. Together, the notation and TripleMatcher provide a concise route from secondary structure to a small, interpretable set of triple-helix candidates suitable for targeted experimental validation.
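The filter-then-merge step described in the abstract can be illustrated with a minimal Python sketch. It is not the TripleMatcher implementation: the `Candidate` type, the function names, the single per-candidate distance value, and the 12.0 Å cutoff are all hypothetical, chosen only to show how a distance threshold followed by interval merging yields region-level zones.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate match: sequence span plus one summary distance."""
    start: int          # first nucleotide index of the match (inclusive)
    end: int            # last nucleotide index of the match (inclusive)
    c1_distance: float  # assumed representative C1'-C1' distance, in angstroms


def filter_by_distance(cands: List[Candidate], max_distance: float) -> List[Candidate]:
    """Keep only candidates whose distance does not exceed the threshold."""
    return [c for c in cands if c.c1_distance <= max_distance]


def merge_into_zones(cands: List[Candidate]) -> List[Tuple[int, int]]:
    """Merge overlapping spans into (start, end) zones, sorted by start."""
    zones: List[Tuple[int, int]] = []
    for c in sorted(cands, key=lambda c: (c.start, c.end)):
        if zones and c.start <= zones[-1][1]:
            # Overlaps the last zone: extend its right edge if needed.
            zones[-1] = (zones[-1][0], max(zones[-1][1], c.end))
        else:
            zones.append((c.start, c.end))
    return zones


# Illustrative data only; the values are invented for the example.
candidates = [
    Candidate(10, 18, 9.5),
    Candidate(15, 22, 10.2),
    Candidate(40, 47, 25.0),  # too far apart: dropped by the filter
    Candidate(60, 66, 11.0),
]
kept = filter_by_distance(candidates, max_distance=12.0)  # 12.0 is an assumed cutoff
zones = merge_into_zones(kept)
print(zones)  # [(10, 22), (60, 66)]
```

In this sketch, spans that merely touch without sharing a position are kept as separate zones; whether the published tool treats adjacency as overlap is not stated in the abstract.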

Decoding RNA triple helices: identification from sequence and secondary structure / Matarrese, Margherita A. G.; Quadrini, Michela; Luchetti, Nicole; Di Petta, Federico; Durante, Daniele; Ballarino, Monica; Chiodo, Letizia; Tesei, Luca. - In: BRIEFINGS IN BIOINFORMATICS. - ISSN 1467-5463. - 27:1(2026). [10.1093/bib/bbag009]

Decoding RNA triple helices: identification from sequence and secondary structure

Daniele Durante (Methodology); Monica Ballarino (Funding Acquisition)
2026

non-coding RNA; long non-coding RNA; RNA pattern search; RNA secondary structure; RNA structure prediction
01 Journal publication::01a Journal article
Files attached to this item
File: Matarrese_Decoding_2026.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 2.02 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1758791