Correction of Batch Effect in Gut Microbiota Profiling of ASD Cohorts from Different Geographical Origins / Scanu, Matteo; Del Chierico, Federica; Marsiglia, Riccardo; Toto, Francesca; Guerrera, Silvia; Valeri, Giovanni; Vicari, Stefano; Putignani, Lorenza. - In: BIOMEDICINES. - ISSN 2227-9059. - (2024). [10.3390/biomedicines12102350]
Correction of Batch Effect in Gut Microbiota Profiling of ASD Cohorts from Different Geographical Origins
Matteo Scanu (first author): Writing – Original Draft Preparation
Riccardo Marsiglia: Investigation
Francesca Toto: Investigation
Giovanni Valeri: Supervision
Lorenza Putignani (last author): Writing – Review & Editing
2024
Abstract
Background: To date, numerous metataxonomic studies have profiled the gut microbiota (GM) by analyzing data from public repositories. However, differences in study populations and in wet- and dry-lab pipelines have produced discordant results. Herein, we propose a biostatistical approach to remove these batch effects when characterizing the GM in autism spectrum disorders (ASDs).
Methods: An original dataset of GM profiles from patients with ASD was ecologically characterized and compared with public digital GM profiles of age-matched neurotypical controls (NCs). In addition, GM data from seven case–control studies on ASD were retrieved from the NCBI platform and included in the analysis. Conditional quantile regression (CQR) was then performed on each dataset to reduce the batch effects arising from technical and geographical confounders in the GM data. The method was also applied to the combined data matrix obtained by merging all datasets. ASD-associated GM markers were identified with a random forest (RF) model.
Results: The GM profile of patients with ASD differed from that of NC subjects. Moreover, technical- and geographical-dependent batch effects were significantly reduced in all datasets. Bacteroides_H, Faecalibacterium, Gemmiger_A_73129, Blautia_A_141781, Bifidobacterium_388775, and Phocaeicola_A_858004 were identified as robust GM bacterial biomarkers of ASD. Finally, our validation approach supported the validity of the CQR method, yielding high accuracy, specificity, sensitivity, and AUC-ROC values.
Conclusions: We propose an updated biostatistical approach to reduce the technical and geographical batch effects that may negatively affect the description of bacterial composition in microbiota studies.
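The abstract names conditional quantile regression (CQR) but does not describe how it is applied. To illustrate the general idea behind quantile-based batch correction, consider mapping each sample's value from its own batch's distribution onto the distribution of a reference batch, separately within each diagnostic group so that ASD/NC differences are kept. The sketch below is a deliberately simplified stand-in: it replaces the regression step with empirical quantiles computed within discrete (batch, group) strata. It is not the authors' implementation. The function name `quantile_map_batches`, the input layout, the choice of reference batch, and the stratification by diagnosis are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def quantile_map_batches(samples, reference_batch):
    """Hypothetical helper: map one taxon's abundances onto a reference batch,
    stratified by group (a simplified stand-in for CQR, not CQR itself).

    samples: list of dicts with keys "batch", "group" (e.g. "ASD"/"NC") and "value".
    Returns the corrected values in the same order as `samples`.
    """
    # Empirical distribution of each (batch, group) cell, sorted ascending.
    cells = defaultdict(list)
    for s in samples:
        cells[(s["batch"], s["group"])].append(s["value"])
    for values in cells.values():
        values.sort()

    corrected = []
    for s in samples:
        if s["batch"] == reference_batch:
            # Reference samples define the target distribution and are kept as-is.
            corrected.append(s["value"])
            continue
        src = cells[(s["batch"], s["group"])]
        ref = cells.get((reference_batch, s["group"]))
        if not ref:
            raise ValueError(f"reference batch has no samples in group {s['group']!r}")
        # Empirical CDF of the value within its own cell: r / m.
        r, m, n = bisect_right(src, s["value"]), len(src), len(ref)
        # Smallest reference index k with (k + 1) / n >= r / m (integer ceiling).
        k = -(-(r * n) // m) - 1
        corrected.append(ref[k])
    return corrected


samples = [
    {"batch": "A", "group": "ASD", "value": 10},
    {"batch": "A", "group": "ASD", "value": 20},
    {"batch": "B", "group": "ASD", "value": 1},
    {"batch": "B", "group": "ASD", "value": 2},
    {"batch": "A", "group": "NC", "value": 5},
    {"batch": "B", "group": "NC", "value": 0.5},
]
print(quantile_map_batches(samples, reference_batch="A"))  # [10, 20, 10, 20, 5, 5]
```

Because the mapping is done separately for ASD and NC samples, group differences present in the reference batch are preserved while each other batch's within-group distribution is aligned to the reference. Real GM data are zero-inflated and compositional. A full conditional quantile regression approach would instead fit models conditional on batch and covariates and would need to account for these features. This sketch does not.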
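The Results report accuracy, specificity, sensitivity, and AUC-ROC as validation metrics. For reference, the minimal sketch below computes them from a binary classifier's predicted probabilities, such as the per-sample ASD probability a random forest could output. The function name `binary_metrics` and the 0.5 decision threshold are illustrative assumptions. The AUC-ROC is computed through its rank (Mann-Whitney) interpretation rather than by tracing the ROC curve.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Hypothetical helper: accuracy, sensitivity, specificity and AUC-ROC
    for labels 1 (ASD) / 0 (NC) and predicted probabilities of class 1."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # AUC-ROC equals the probability that a randomly chosen positive sample
    # scores higher than a randomly chosen negative one (ties count as 1/2).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate (ASD correctly detected)
        "specificity": tn / (tn + fp),  # true-negative rate (NC correctly rejected)
        "auc_roc": wins / (len(pos) * len(neg)),
    }


y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(binary_metrics(y_true, y_score))
```

For this toy input, two of three cases and two of three controls are classified correctly at the 0.5 threshold, and 8 of the 9 case/control pairs are ranked correctly, giving an AUC-ROC of 8/9. Unlike the three threshold-based metrics, the AUC does not depend on the chosen threshold.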