

A Comprehensive Review of Transformer-based language models for Protein Sequence Analysis and Design

Daniele Santoni; Giovanni Felici
2025

Abstract

The impact of Transformer-based language models has been unprecedented in Natural Language Processing (NLP). The success of such models has also led to their adoption in other fields, including bioinformatics. Taking this into account, this paper discusses recent advances in Transformer-based models for protein sequence analysis and design. In this review, we discuss and analyse a significant number of works pertaining to such applications, which encompass gene ontology, functional and structural protein identification, generation of de novo proteins, and protein binding. We attempt to shed light on the strengths and weaknesses of the discussed works to provide comprehensive insight to readers. Finally, we highlight shortcomings in existing research and explore potential avenues for future developments. We believe that this review will help researchers working in this field to gain an overall idea of the state of the art and to orient their future studies.
Computer Science - Learning; Computer Science - Artificial Intelligence; Quantitative Biology - Quantitative Methods
01 Journal publication::01a Journal article
A Comprehensive Review of Transformer-based language models for Protein Sequence Analysis and Design / Ghosh, Nimisha; Santoni, Daniele; Nawn, Debaleena; Ottaviani, Eleonora; Felici, Giovanni. - In: IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE. - ISSN 2691-4581. - (2025).
Files attached to this item
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1750998
