Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients

Kuo, Michael D.; Chiu, Keith W. H.; Wang, David S.; Larici, Anna Rita; Poplavskiy, Dmytro; Valentini, Adele; Napoli, Alessandro; Borghesi, Andrea; Ligabue, Guido; Fang, Xin Hao B.; Wong, Hing Ki C.; Zhang, Sailong; Hunter, John R.; Mousa, Abeer; Infante, Amato; Elia, Lorenzo; Golemi, Salvatore; Yu, Leung Ho P.; Hui, Christopher K. M.; Erickson, Bradley J.

doi:10.1007/S00330-022-08969-Z

Objectives: While chest radiograph (CXR) is the first-line imaging investigation in patients with respiratory symptoms, differentiating COVID-19 from other respiratory infections on CXR remains challenging. We developed and validated an AI system for COVID-19 detection on presenting CXR.

Methods: A deep learning model (RadGenX), trained on 168,850 CXRs, was validated on a large international test set of presenting CXRs of symptomatic patients from 9 study sites (US, Italy, and Hong Kong SAR) and 2 public datasets from the US and Europe. Performance was measured by area under the receiver operating characteristic curve (AUC). Bootstrapped simulations were performed to assess performance across a range of potential COVID-19 disease prevalence values (3.33% to 33.3%). Comparison against international radiologists was performed on an independent test set of 852 cases.

Results: RadGenX achieved an AUC of 0.89 on 4-fold cross-validation and an AUC of 0.79 (95% CI 0.78-0.80) on an independent test cohort of 5,894 patients. DeLong's test showed statistically significant differences in model performance across patients from different regions (p < 0.01), disease severity (p < 0.001), gender (p < 0.001), and age (p = 0.03). Prevalence simulations showed that the negative predictive value (NPV) increases from 86.1% at 33.3% prevalence to greater than 98.5% at any prevalence below 4.5%. Compared with radiologists, McNemar's test showed the model has higher sensitivity (p < 0.001) but lower specificity (p < 0.001).

Conclusion: An AI model that predicts COVID-19 infection on CXR in symptomatic patients was validated on a large international cohort, providing valuable context on testing and performance expectations for AI systems that perform COVID-19 prediction on CXR.

Key points:
• An AI model developed using CXRs to detect COVID-19 was validated in a large multi-center cohort of 5,894 patients from 9 prospectively recruited sites and 2 public datasets.
• Differences in AI model performance were seen across region, disease severity, gender, and age.
• Prevalence simulations on the international test set demonstrate the model's NPV is greater than 98.5% at any prevalence below 4.5%.
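
The dependence of NPV on prevalence reported in the abstract can be illustrated with a closed-form calculation. The sketch below is not the authors' bootstrapped simulation; it applies Bayes' rule at a single fixed operating point. The function name `npv` and the sensitivity and specificity values (0.80 and 0.60) are hypothetical placeholders, since the abstract does not report the model's operating point.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule.

    NPV = P(no disease | negative test)
        = TN rate / (TN rate + FN rate), both weighted by prevalence.
    """
    true_neg = specificity * (1.0 - prevalence)      # healthy and test-negative
    false_neg = (1.0 - sensitivity) * prevalence     # diseased but test-negative
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical operating point, NOT taken from the paper.
    sens, spec = 0.80, 0.60
    # Prevalence values spanning the range studied in the abstract (3.33% to 33.3%).
    for prev in (0.0333, 0.045, 0.10, 0.20, 0.333):
        print(f"prevalence={prev:.4f}  NPV={npv(sens, spec, prev):.4f}")
```

With sensitivity and specificity held fixed, NPV rises as prevalence falls, which is the qualitative pattern the prevalence simulations describe. The exact figures in the abstract depend on the model's actual operating point and on bootstrap resampling, neither of which is reproduced here.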

Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients / Kuo, M.D., Chiu, K.W.H., Wang, D.S., Larici, A.R., Poplavskiy, D., Valentini, A., Napoli, A., Borghesi, A., Ligabue, G., Fang, X.H.B., Wong, H.K.C., Zhang, S., Hunter, J.R., Mousa, A., Infante, A., Elia, L., Golemi, S., Yu, L.H.P., Hui, C.K.M., Erickson, B.J. - In: EUROPEAN RADIOLOGY. - ISSN 1563-4086. - 33:1(2023), pp. 23-33. [10.1007/S00330-022-08969-Z]

Year of publication: 2023
Keywords: artificial intelligence; COVID-19; public health; radiology; thoracic
Type: 01 Journal publication::01a Journal article
Belongs to type: 01a Journal article

Files attached to this record

File	Size	Format
Kuo_Multi-center validation_2023.pdf (archive managers only; type: publisher's version, published with the publisher's layout; license: all rights reserved)	2.56 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1683630
