Background: Advances in artificial intelligence (AI) have prompted interest in using intelligent systems to improve prenatal detection of fetal congenital heart defects (CHDs). Our aim was to systematically examine the current literature on the diagnostic performance of AI-enabled prenatal cardiac ultrasound.

Methods: This systematic review and meta-analysis was registered with PROSPERO (CRD42024549601). Embase, Medline, the Cochrane Central Register of Controlled Trials, and CINAHL were searched from inception until February 2025. Studies evaluating AI performance in the prenatal detection of fetal CHDs were eligible for inclusion; studies applying AI before 16 weeks of gestation, or using three- or four-dimensional ultrasound, were excluded. Pooled sensitivity and specificity were obtained using a random-effects model, and pooled proportions using the Freeman-Tukey arcsine square root transformation. Heterogeneity was assessed with the I² statistic. Risk of bias and adherence to reporting standards were assessed using QUADAS-2 and TRIPOD+AI, respectively. Publication bias was assessed with Deeks' test, and the certainty of evidence for each outcome with the GRADE approach.

Findings: Fifteen studies were included; fourteen developed and evaluated a model, and one externally evaluated a previously trained model. Images and videos obtained during cardiac screening or fetal echocardiography of 30,121 fetuses were used for training, validation, and testing. For the binary task of classifying the heart as normal or abnormal, AI models achieved a pooled sensitivity of 0.89 (95% CI 0.83–0.93, I² = 77.92%) and specificity of 0.91 (95% CI 0.84–0.95, I² = 77.92%). In subgroup analysis, models tested on a range of CHDs had lower sensitivity than those tested for a specific cardiac abnormality (0.85; 95% CI 0.75–0.91 vs 0.92; 95% CI 0.87–0.96), while specificity was comparable (0.90; 95% CI 0.79–0.96 vs 0.91; 95% CI 0.81–0.97). Overall, AI models performed better than operators with lower expertise and were nearly comparable to experts; however, the human comparator group (median six clinicians, IQR 3–10) was usually small and non-blinded. Relevant sources of heterogeneity were the types of cardiac views collected, the prevalence of CHDs across datasets, and the types of CHDs examined. The risk of bias was moderate to high, and adherence to reporting standards was low (adherence above 70% for only 18 of 51 TRIPOD+AI items). There was no statistically significant evidence of publication bias (Deeks' test p = 0.474).

Interpretation: These findings suggest that AI models perform better than clinicians with lower expertise, but they must be interpreted with caution given the high risk of bias and the identified sources of heterogeneity.

Funding: This study was partly supported by the InnoHK-funded Hong Kong Centre for Cerebro-cardiovascular Health Engineering (COCHE) Project 2.1 (Cardiovascular risks in early life and fetal echocardiography). ATP and JAN are supported by the National Institute for Health and Care Research (NIHR) Oxford Biomedical Research Centre (BRC).
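The abstract names the pooling methods but gives no formulas. As a rough illustration only, the sketch below pools study-level proportions on the Freeman-Tukey double arcsine scale with a random-effects model and reports I². It assumes the DerSimonian-Laird estimator of between-study variance, Wald-type 95% limits on the transformed scale, and Miller's back-transformation using the harmonic mean of study sizes; the review does not state these choices, and the names `ft_transform`, `ft_inverse` and `pool_proportions` are hypothetical, not the authors' code. The pooled sensitivity and specificity reported above come from a separate random-effects analysis that this sketch does not reproduce.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

# 97.5th percentile of the standard normal, for Wald-type 95% limits (assumption).
Z_975 = 1.959963984540054


@dataclass
class PooledProportion:
    estimate: float
    ci_low: float
    ci_high: float
    tau2: float  # between-study variance on the transformed scale
    i2: float    # heterogeneity, in percent


def ft_transform(x: int, n: int) -> Tuple[float, float]:
    """Freeman-Tukey double arcsine of x events in n; returns (t, variance)."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)


def ft_inverse(t: float, n: float) -> float:
    """Miller's back-transformation to a proportion, given a (harmonic mean) n."""
    t_min = math.asin(math.sqrt(1.0 / (n + 1)))  # value of t when x = 0
    if t <= t_min:
        return 0.0
    if t >= math.pi - t_min:  # value of t when x = n
        return 1.0
    s = math.sin(t)
    radicand = 1.0 - (s + (s - 1.0 / s) / n) ** 2
    p = 0.5 * (1.0 - math.copysign(1.0, math.cos(t)) * math.sqrt(max(0.0, radicand)))
    return min(1.0, max(0.0, p))


def pool_proportions(studies: Sequence[Tuple[int, int]]) -> PooledProportion:
    """Random-effects (DerSimonian-Laird, assumed) pooling of (events, total) pairs."""
    k = len(studies)
    if k < 2:
        raise ValueError("need at least two studies")
    ts, vs = zip(*(ft_transform(x, n) for x, n in studies))
    w = [1.0 / v for v in vs]
    sum_w = sum(w)
    theta_fe = sum(wi * ti for wi, ti in zip(w, ts)) / sum_w
    # Cochran's Q, then the DerSimonian-Laird moment estimate of tau^2
    q = sum(wi * (ti - theta_fe) ** 2 for wi, ti in zip(w, ts))
    df = k - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    # Re-weight each study with tau^2 added to its within-study variance
    w_re = [1.0 / (v + tau2) for v in vs]
    theta = sum(wi * ti for wi, ti in zip(w_re, ts)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    n_hm = k / sum(1.0 / n for _, n in studies)  # harmonic mean study size
    return PooledProportion(
        estimate=ft_inverse(theta, n_hm),
        ci_low=ft_inverse(theta - Z_975 * se, n_hm),
        ci_high=ft_inverse(theta + Z_975 * se, n_hm),
        tau2=tau2,
        i2=i2,
    )
```

Passing a list of (events, total) pairs, such as `pool_proportions([(45, 50), (88, 100), (170, 200)])`, returns the back-transformed pooled proportion with its 95% limits, tau² and I². The double arcsine scale is used because it stabilises variances near 0 and 1 and, unlike a logit, stays defined for studies with zero or all events without a continuity correction.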

Artificial intelligence-enabled prenatal ultrasound for the detection of fetal cardiac abnormalities: a systematic review and meta-analysis / D'Alberti, Elena; Patey, Olga; Smith, Carolyn; Šalović, Bojana; Hernandez-Cruz, Netzahualcoyotl; Noble, J. Alison; Papageorghiou, Aris T. - In: eClinicalMedicine. - ISSN 2589-5370. - 84 (2025). [10.1016/j.eclinm.2025.103250]

Artificial intelligence-enabled prenatal ultrasound for the detection of fetal cardiac abnormalities: a systematic review and meta-analysis

D'Alberti, Elena
2025

Keywords: Artificial intelligence; Congenital heart defect; Diagnostic accuracy; Echocardiography; Fetal ultrasound
01 Journal publication::01a Journal article

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1749159
Warning: the data displayed have not been validated by the university.

Citations
  • PubMed Central: not available
  • Scopus: 3
  • Web of Science (ISI): 3