Deep-Learning Based Image Classification of Oral Mucosal Lesions. A Multicenter Pilot Study Evaluating Real-World Applicability / Fantozzi, Paolo Junior. - (2026 Jan 21).
Deep-Learning Based Image Classification of Oral Mucosal Lesions. A Multicenter Pilot Study Evaluating Real-World Applicability.
FANTOZZI, Paolo Junior
21/01/2026
Abstract
Objectives: Deep learning (DL)-driven artificial intelligence (AI) image classification models hold potential as important referral adjuncts for healthcare providers not trained in oral mucosal diseases. The aim of this multicenter pilot study was to compare the performance of different AI models in the classification of oral mucosal lesions. Methods: Retrospective photographic images from 4 oral medicine centers (Sapienza University of Rome, University of Maryland Baltimore, UT Health San Antonio, Johns Hopkins University) were used to train/validate a DL model (BEiT, BERT Pre-Training of Image Transformers) to classify clinical images into 5 categories: normal mucosa (NM), reactive/benign (RB), immune-mediated (IM), potentially malignant (PM), and oral cancer (OC). The model was based on a pre-trained 87M-parameter architecture trained on the ImageNet-21k dataset and fine-tuned on the ImageNet-2012 dataset at 224x224 resolution. The BEiT model was then compared to 6 machine learning benchmark models (kNN 1.0, kNN 2.0, Logistic Regression 1.0, Logistic Regression 2.0, Random Forest 1.0, Random Forest 2.0) using Google Inception V3 or SqueezeNet visual embedding DL methods. Sensitivity/recall, specificity, precision, and F1-score over the classes were evaluated as performance metrics. Results: The dataset included 1608 clinical images from 824 patients, divided into two sub-datasets for training (80%; n=1157 training, n=129 validation) and testing (20%; n=322). Compared to the 6 benchmark models, the BEiT model achieved the best scores for overall accuracy, demonstrating an overall accuracy of 79.2% over the five classes and a mean per-class accuracy of 76.5%. The highest per-class accuracy was achieved by the IM group (84%), followed by RB (82%) and PM (78%), whereas the highest specificity was reached by the OC group (97%), followed by NM (96%) and PM (94%). The class with the highest sensitivity was IM (84%), whereas the class with the highest precision was PM (89%). Overall, the BEiT model outperformed the other six off-the-shelf models in most of the performance metrics. Conclusions: In this pilot study, the custom-developed DL model (BEiT) performed well on the IM, PM, and OC classes, outperformed six different off-the-shelf tools, and may hold promise for future real-world AI applicability to assist referring providers not specifically trained in oral mucosal diseases.
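The per-class metrics reported in the abstract (sensitivity/recall, specificity, precision, F1-score) can all be derived one-vs-rest from a multiclass confusion matrix. A minimal sketch of that computation follows; the confusion matrix values are illustrative placeholders, not the study's data, and the function name is ours:

```python
# One-vs-rest sensitivity, specificity, precision, and F1 for each class
# of a square confusion matrix (rows = true class, columns = predicted class).

def per_class_metrics(cm, classes):
    """Return a dict of one-vs-rest metrics for each class label."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp  # other class, predicted k
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0  # sensitivity == recall
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        metrics[classes[k]] = {"sensitivity": sens, "specificity": spec,
                               "precision": prec, "f1": f1}
    return metrics

if __name__ == "__main__":
    # Illustrative 5-class matrix only (NOT the study's results).
    cm = [[8, 1, 1, 0, 0],
          [1, 7, 1, 1, 0],
          [0, 1, 9, 0, 0],
          [1, 0, 0, 8, 1],
          [0, 0, 0, 1, 9]]
    print(per_class_metrics(cm, ["NM", "RB", "IM", "PM", "OC"]))
```

Overall accuracy is simply the matrix trace divided by the total count; mean per-class accuracy averages the one-vs-rest accuracies instead, which is why the abstract reports the two figures separately.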


