A multivariate statistical approach for the estimation of the ethnic origin of unknown genetic profiles in forensic genetics / Alladio, Eugenio; Della Rocca, Chiara; Barni, Filippo; Dugoujon, Jean-Michel; Garofano, Paolo; Semino, Ornella; Berti, Andrea; Novelletto, Andrea; Vincenti, Marco; Cruciani, Fulvio. - In: FORENSIC SCIENCE INTERNATIONAL: GENETICS. - ISSN 1872-4973. - 45:(2019). [10.1016/j.fsigen.2019.102209]

A multivariate statistical approach for the estimation of the ethnic origin of unknown genetic profiles in forensic genetics

Della Rocca, Chiara (second author; role: Investigation); Cruciani, Fulvio (last author; role: Supervision)
2019

Abstract

DNA typing and the interpretation of genetic profile data are among the most relevant topics in forensic science. Among other applications, the capability of genetic profiles to provide biogeographic information about population groups, subgroups and affiliations has been extensively explored over the last decade. For investigative and intelligence purposes, it is extremely useful to identify subjects and estimate their biogeographic origins by examining the DNA profiles recovered from evidence at a crime scene. Current approaches to BioGeographic Ancestry (BGA) estimation from STR profiles are usually based on Bayesian methods, which quantify the evidence as a likelihood ratio that does or does not support the hypothesis that a given profile belongs to a specific ethnic group. The present study provides an alternative to the likelihood ratio method, based on multivariate data analysis strategies that can assign a profile to one of multiple populations. Starting from the well-known NIST US autosomal STR dataset, which includes African-American, Asian, and Caucasian individuals, and moving towards more geographically restricted populations (such as Northern Africans vs sub-Saharan Africans, Afghans vs Iraqis and Italians vs Romanians), multivariate techniques such as Sparse and Logistic Principal Component Analysis (SL-PCA), Sparse Partial Least Squares-Discriminant Analysis (sPLS-DA) and Support Vector Machines (SVM) were employed and their discriminating power was compared. Both sPLS-DA and SVM provided robust classifications, yielding models with high sensitivity and specificity that are capable of discriminating populations on an ethnic basis. This application may represent a powerful and dynamic tool for law enforcement agencies whenever a standard autosomal STR profile is obtained from biological evidence collected at a crime scene or recovered during mass-disaster and missing-person investigations.
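As an illustration of the likelihood ratio approach that the abstract names as the current standard, the following is a minimal sketch and not the authors' implementation. It assumes Hardy-Weinberg proportions within each population and independence between loci. The population labels, the helper function names and all allele frequencies are hypothetical values chosen for the example; they are not taken from the NIST dataset or from the paper.

```python
from math import prod

# Hypothetical allele frequencies per population and locus.
# Illustrative numbers only, NOT taken from the NIST dataset or the paper.
FREQS = {
    "pop_A": {
        "D3S1358": {15: 0.30, 16: 0.25, 17: 0.20},
        "TH01": {6: 0.15, 7: 0.40, 9.3: 0.10},
    },
    "pop_B": {
        "D3S1358": {15: 0.25, 16: 0.30, 17: 0.25},
        "TH01": {6: 0.25, 7: 0.15, 9.3: 0.35},
    },
}


def genotype_probability(locus_freqs, genotype):
    """Genotype probability at one locus, assuming Hardy-Weinberg proportions."""
    a, b = genotype
    p, q = locus_freqs[a], locus_freqs[b]
    # Homozygote: p^2; heterozygote: 2pq.
    return p * p if a == b else 2 * p * q


def profile_probability(population, profile):
    """Probability of a full profile, assuming independent loci (product rule)."""
    return prod(
        genotype_probability(FREQS[population][locus], genotype)
        for locus, genotype in profile.items()
    )


def likelihood_ratio(profile, pop_h1, pop_h2):
    """LR = P(profile | pop_h1) / P(profile | pop_h2)."""
    return profile_probability(pop_h1, profile) / profile_probability(pop_h2, profile)


# A two-locus profile: heterozygous 15/16 at D3S1358, homozygous 7/7 at TH01.
profile = {"D3S1358": (15, 16), "TH01": (7, 7)}
lr = likelihood_ratio(profile, "pop_A", "pop_B")
print(f"LR (pop_A vs pop_B) = {lr:.2f}")
```

A ratio above 1 supports the first population over the second, and a ratio below 1 supports the second. Each ratio compares only two hypotheses at a time, whereas the multivariate classifiers discussed in the abstract handle several candidate populations in one model.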
Biogeographical ancestry (BGA); Ethnic origin Prediction; Multivariate data analysis; Short Tandem Repeats (STRs); Population genetics; PCA; PLS-DA; SVM
01 Journal publication::01a Journal article
Files attached to this item
File: Alladio_Multivariate_2019.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.92 MB
Format: Unknown

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1335629
Citations
  • PMC 2
  • Scopus 14
  • ISI 12