Prediction of High School Study Outcome through Clustering and Embedding / Addiucci, Luca; Temperini, Marco. - 1799:(2026), pp. 117-128. (ICORE2026, Lille, France) [10.1007/978-3-032-15743-0_10].
Prediction of High School Study Outcome through Clustering and Embedding
Addiucci, Luca (first author); Temperini, Marco (second author)
2026
Abstract
We investigate how semantic analysis techniques and machine learning models can be combined to identify and predict patterns of academic success among high school students. Starting from an existing dataset, we generated summary notes for teachers based on key academic indicators and transformed them into embeddings using a lightweight transformer model (DistilBERT). Principal component analysis was applied to reduce the dimensionality of the embeddings, which were then used for K-Means clustering. A decision tree classifier was trained to predict student success, leveraging both classical features (such as grades, absences, and failures) and the semantic embeddings. The results indicate that combining structured and unstructured data is a promising approach to the early detection of at-risk students.
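The pipeline outlined in the abstract (embeddings, PCA, K-Means, decision tree) can be illustrated with a minimal sketch. This is not the authors' implementation. The data are synthetic, random vectors stand in for the DistilBERT embeddings of the teacher notes, and the feature ranges, the number of components and clusters, the tree depth, and the pass/fail rule are all assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_students = 200  # assumed sample size

# Hypothetical structured features: a period grade (0-20), absences, past failures.
grades = rng.integers(0, 21, size=n_students)
absences = rng.integers(0, 30, size=n_students)
failures = rng.integers(0, 4, size=n_students)
structured = np.column_stack([grades, absences, failures]).astype(float)

# Placeholder for 768-dimensional sentence embeddings of the teacher notes.
# Random vectors shifted by the grade stand in for real DistilBERT output.
embeddings = rng.normal(size=(n_students, 768)) + grades[:, None] / 20.0

# Step 1: reduce the embedding dimensionality with PCA (10 components assumed).
reduced = PCA(n_components=10, random_state=0).fit_transform(embeddings)

# Step 2: group students by their reduced embeddings with K-Means (3 clusters assumed).
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

# Synthetic success label: a noisy score thresholded at its median (assumption).
score = grades - 0.2 * absences - 2.0 * failures + rng.normal(scale=2.0, size=n_students)
passed = (score >= np.median(score)).astype(int)

# Step 3: train a decision tree on structured features plus reduced embeddings.
features = np.hstack([structured, reduced])
X_train, X_test, y_train, y_test = train_test_split(
    features, passed, test_size=0.25, random_state=0, stratify=passed
)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)

print("cluster sizes:", np.bincount(clusters))
print("held-out accuracy:", round(accuracy, 3))
```

With real data, the placeholder `embeddings` array would be replaced by vectors computed from the generated notes, and the number of PCA components and clusters would be chosen from the data rather than fixed in advance.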


