Graph Denoising Diffusion for Inverse Protein Folding / Yi, K.; Zhou, B.; Shen, Y.; Lio, P.; Wang, Y. G. - 36:(2023). (Paper presented at the 37th Conference on Neural Information Processing Systems, NeurIPS 2023, held at the Ernest N. Morial Convention Center, USA).

Graph Denoising Diffusion for Inverse Protein Folding

Lio P.;
2023

Abstract

Inverse protein folding is challenging because of its inherent one-to-many mapping: many different amino acid sequences can fold into the same protein backbone. The task therefore involves not only identifying viable sequences but also representing the diversity of possible solutions. However, existing discriminative models, such as transformer-based autoregressive models, struggle to capture the diverse range of plausible solutions. In contrast, diffusion probabilistic models, an emerging class of generative approaches, can generate a diverse set of candidate sequences for a given protein backbone. We propose a novel graph denoising diffusion model for inverse protein folding, in which a given protein backbone guides the diffusion process over the corresponding amino acid residue types. The model infers the joint distribution of amino acids conditioned on the nodes' physicochemical properties and local environment. Moreover, we use amino acid replacement matrices in the diffusion forward process, encoding biologically meaningful prior knowledge of amino acids from their spatial and sequential neighbors as well as from themselves; this reduces the sampling space of the generative process. Our model achieves state-of-the-art sequence recovery against a set of popular baseline methods and shows promise for generating diverse protein sequences for a given protein backbone structure. The code is available at https://github.com/ykiiiiii/GraDe_IF.
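The forward process described in the abstract, where a replacement matrix decides how residue types are corrupted, can be illustrated with a minimal sketch of categorical (D3PM-style, Austin et al., 2021) discrete diffusion. This is not the GraDe-IF code. The 4-letter alphabet and its scores are hypothetical toy values, not real BLOSUM62 entries. Turning scores into replacement probabilities R with a row-wise softmax, and mixing them as Q_t = (1 - beta_t) I + beta_t R, are both assumptions made here for illustration. The names TOY_ALPHABET, TOY_SCORES, row_softmax, step_matrix, cumulative_matrix and noise_sequence are likewise hypothetical. The learned reverse process, a graph network conditioned on the backbone, is not shown.

```python
import math
import random

# Hypothetical 4-residue toy alphabet with illustrative substitution scores
# (NOT real BLOSUM62 values): A/G are scored as mutually similar, as are L/V.
TOY_ALPHABET = "AGLV"
TOY_SCORES = [
    [4, 1, -1, 0],
    [1, 5, -3, -2],
    [-1, -3, 4, 2],
    [0, -2, 2, 4],
]


def row_softmax(scores, temperature=1.0):
    """Assumed conversion: softmax each score row into replacement probabilities R."""
    probs = []
    for row in scores:
        exps = [math.exp(s / temperature) for s in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def step_matrix(replace_probs, beta):
    """Assumed one-step transition Q_t = (1 - beta_t) * I + beta_t * R."""
    k = len(replace_probs)
    return [
        [(1.0 - beta) * (1.0 if i == j else 0.0) + beta * replace_probs[i][j] for j in range(k)]
        for i in range(k)
    ]


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cumulative_matrix(replace_probs, betas):
    """Qbar_t = Q_1 Q_2 ... Q_t, so q(x_t | x_0) is row x_0 of Qbar_t."""
    k = len(replace_probs)
    qbar = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for beta in betas:
        qbar = matmul(qbar, step_matrix(replace_probs, beta))
    return qbar


def noise_sequence(seq, alphabet, qbar, rng):
    """Sample x_t ~ Cat(row x_0 of Qbar_t) independently at every residue."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    return "".join(rng.choices(alphabet, weights=qbar[index[aa]])[0] for aa in seq)


if __name__ == "__main__":
    rng = random.Random(0)
    replace_probs = row_softmax(TOY_SCORES, temperature=1.0)
    qbar = cumulative_matrix(replace_probs, betas=[0.1] * 10)
    print("native:", "AGLVLLAG")
    print("noised:", noise_sequence("AGLVLLAG", TOY_ALPHABET, qbar, rng))
```

R places most of its off-diagonal mass on similar residues. Under this toy setup, corrupted sequences therefore tend to swap L with V, or A with G, rather than move to dissimilar residue types. This is one way a replacement-matrix prior can narrow the space the reverse (denoising) model has to search.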
Year: 2023
Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Keywords: Diffusion; Inverse problems; Protein folding; Time series analysis
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Attached files
Yi_Graph_2023.pdf
Type: Publisher's version (published with the publisher's layout)
License: All rights reserved
Size: 5.08 MB
Format: Adobe PDF
Access: archive managers only

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1725270
Citations
  • PubMed Central: N/A
  • Scopus: 12
  • Web of Science (ISI): 0