SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins / Ahmad, S.; Charoenkwan, P.; Quinn, J. M. W.; Moni, M. A.; Hasan, M. M.; Lio, P.; Shoombuatong, W.. - In: SCIENTIFIC REPORTS. - ISSN 2045-2322. - 12:1(2022). [10.1038/s41598-022-08173-5]

SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins

Lio P.;
2022

Abstract

Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several machine learning (ML) methods have been developed for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored a comprehensive set of 13 feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we utilized a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download from the GitHub repository (https://github.com/saeed344/SCORPION).
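The stacking idea summarized in the abstract (baseline models emit probabilistic features, and a meta-model is trained on them) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available in the GitHub repository. Everything below is an assumption made for illustration: two simple descriptors stand in for the 13 used in the paper, a toy nearest-centroid classifier stands in for the 10 ML algorithms, the residue grouping is hypothetical, the sequences are synthetic, and the two-step feature selection is omitted.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical residue grouping, chosen only for this illustration.
GROUPS = ["AVLIMFWC", "STNQYGPH", "DEKR"]

def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def group_composition(seq):
    """Fraction of residues falling in each illustrative group."""
    return [sum(seq.count(a) for a in g) / len(seq) for g in GROUPS]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

class CentroidModel:
    """Toy baseline learner (assumption): nearest centroid with a soft score."""
    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict_proba(self, x):
        d0 = distance(x, self.centroids[0])
        d1 = distance(x, self.centroids[1])
        return d0 / (d0 + d1 + 1e-12)  # nearer to the class-1 centroid -> higher

def out_of_fold_pf(encode, seqs, y, k=5):
    """One probabilistic feature per sequence, predicted by a model
    that never saw that sequence during training."""
    X = [encode(s) for s in seqs]
    pf = [0.0] * len(seqs)
    for fold in range(k):
        train = [i for i in range(len(seqs)) if i % k != fold]
        model = CentroidModel().fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(seqs), k):
            pf[i] = model.predict_proba(X[i])
    return pf

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(F, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression by batch gradient descent.
    The last weight is the bias term."""
    w = [0.0] * (len(F[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, t in zip(F, y):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + w[-1]) - t
            for j, fj in enumerate(f):
                grad[j] += err * fj
            grad[-1] += err
        w = [wi - lr * g / len(F) for wi, g in zip(w, grad)]
    return w

# Synthetic data: class-1 sequences are enriched in charged residues.
random.seed(0)
def toy_seq(enriched, length=60):
    weights = [3.0 if enriched and a in "DEKR" else 1.0 for a in AMINO_ACIDS]
    return "".join(random.choices(AMINO_ACIDS, weights=weights, k=length))

labels = [i % 2 for i in range(40)]
seqs = [toy_seq(bool(t)) for t in labels]

encoders = [aac, group_composition]
columns = [out_of_fold_pf(enc, seqs, labels) for enc in encoders]
pf_matrix = [list(row) for row in zip(*columns)]  # one PF column per baseline
meta_w = fit_logistic(pf_matrix, labels)

# Baselines refit on all training data, used to score unseen sequences.
baselines = [(enc, CentroidModel().fit([enc(s) for s in seqs], labels))
             for enc in encoders]

def predict(seq):
    """Stacked probability that seq belongs to the positive class."""
    pf = [m.predict_proba(enc(seq)) for enc, m in baselines]
    return sigmoid(sum(w * f for w, f in zip(meta_w, pf)) + meta_w[-1])
```

Computing the probabilistic features out of fold keeps the meta-learner from being trained on scores that the baselines produced for their own training examples, which is the usual motivation for cross-validated stacking.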
Support Vector Machine; Amino Acid; Biological Database
01 Journal publication::01a Journal article
Files attached to this record
File: Ahmad_SCORPION_2022.pdf
Access: open access
Note: https://www.nature.com/articles/s41598-022-08173-5
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 6.7 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1723933