MV-MS-FETE: Multi-view multi-scale feature extractor and transformer encoder for stenosis recognition in echocardiograms / Avola, D.; Cannistraci, I.; Cascio, M.; Cinque, L.; Fagioli, A.; Foresti, G. L.; Rodola, E.; Solito, L. - In: COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE. - ISSN 0169-2607. - 245:(2024), pp. 1-8. [10.1016/j.cmpb.2024.108037]
MV-MS-FETE: Multi-view multi-scale feature extractor and transformer encoder for stenosis recognition in echocardiograms
Avola D. (first author); Cannistraci I.; Cinque L.; Rodola E.
2024
Abstract
Background: aortic stenosis is a common heart valve disease that mainly affects older people in developed countries. Early detection is crucial to prevent irreversible disease progression and, eventually, death. A typical screening technique for stenosis relies on echocardiograms; however, variations introduced by other tissues, camera movements, and uneven lighting can hamper visual inspection and lead to misdiagnosis. To address these issues, effective solutions employ deep learning algorithms that assist clinicians in detecting and classifying stenosis by predicting this pathology from single heart views. Although these approaches are promising, the visual information conveyed by a single image may not be sufficient for an accurate diagnosis, especially in an automatic system, which suggests that different solutions should be explored.
Methodology: following this rationale, this paper proposes a novel deep learning architecture, composed of a multi-view, multi-scale feature extractor and a transformer encoder (MV-MS-FETE), to predict stenosis from parasternal long-axis and short-axis views. In particular, starting from these views, the designed model extracts relevant features at multiple scales along its feature extractor component and uses a transformer encoder to perform the final classification.
Results: experiments were performed on the recently released Tufts Medical Echocardiogram public dataset, which comprises 27,788 images split into training, validation, and test sets. Because this collection was released only recently, several state-of-the-art models were also tested to create multi-view and single-view benchmarks. For all models, standard classification metrics (e.g., precision, F1-score) were computed. The results show that the proposed approach outperforms other multi-view methods in terms of accuracy and F1-score and has more stable performance throughout training. Furthermore, the experiments highlight that multi-view methods generally perform better than their single-view counterparts.
Conclusion: this paper introduces a novel multi-view and multi-scale model for aortic stenosis recognition, as well as three benchmarks to evaluate it. These benchmarks provide multi-view and single-view comparisons that highlight the model's effectiveness in aiding clinicians with diagnosis, and they also establish several baselines for the aortic stenosis recognition task.
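As a rough illustration of the pipeline outlined in the Methodology part of the abstract (features extracted at several scales from each view, then fused by a transformer encoder for classification), the following NumPy sketch may help. It is not the authors' implementation: the abstract does not specify the backbone, the scales, the token layout, or the number of classes, so everything here is an assumption. Non-overlapping average pooling at scales (2, 4, 8) stands in for the learned multi-scale feature extractor, projections are random and untrained, the encoder is a single-head layer with a class token, `num_classes=3` is arbitrary, and names such as `MultiViewMultiScaleSketch` are hypothetical.

```python
import numpy as np


def avg_pool(img: np.ndarray, s: int) -> np.ndarray:
    """Non-overlapping s x s average pooling (edges not divisible by s are cropped)."""
    h, w = (img.shape[0] // s) * s, (img.shape[1] // s) * s
    x = img[:h, :w]
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MultiViewMultiScaleSketch:
    """Hypothetical multi-view, multi-scale tokens + one transformer encoder layer.

    Assumes every view is a square grayscale image of side img_size.
    All weights are random (untrained); this only shows the data flow.
    """

    def __init__(self, img_size=64, scales=(2, 4, 8), d=32, num_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        self.scales, self.d = scales, d
        # One linear projection per scale, shared by all views (an assumption).
        self.proj = {s: rng.normal(0, 0.02, ((img_size // s) ** 2, d)) for s in scales}
        self.cls = rng.normal(0, 0.02, (1, d))  # class token used for the final prediction
        self.wq, self.wk, self.wv = (rng.normal(0, d ** -0.5, (d, d)) for _ in range(3))
        self.w1 = rng.normal(0, d ** -0.5, (d, 2 * d))
        self.w2 = rng.normal(0, (2 * d) ** -0.5, (2 * d, d))
        self.head = rng.normal(0, d ** -0.5, (d, num_classes))

    def tokens(self, views):
        # One token per (view, scale) pair, preceded by the class token.
        toks = [avg_pool(v, s).reshape(-1) @ self.proj[s] for v in views for s in self.scales]
        return np.vstack([self.cls] + toks)

    def encode(self, x):
        # Single-head self-attention, then a ReLU feed-forward block,
        # each with a residual connection and layer normalization.
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(self.d))
        x = layer_norm(x + attn @ v)
        return layer_norm(x + np.maximum(x @ self.w1, 0) @ self.w2)

    def predict_proba(self, views):
        x = self.encode(self.tokens(views))
        return softmax(x[0] @ self.head)  # classify from the class token


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    plax, psax = rng.random((64, 64)), rng.random((64, 64))  # stand-ins for the two views
    probs = MultiViewMultiScaleSketch().predict_proba([plax, psax])
    print(probs.shape)  # (3,)
```

Sharing one projection per scale across both views keeps the token layout symmetric; a learned per-view embedding would be a natural alternative, but the abstract does not say which, if either, the authors use.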