Prediction of High School Study Outcome through Clustering and Embedding / Addiucci, Luca; Temperini, Marco. - 1799:(2026), pp. 117-128. ( ICORE2026 Lille - France ) [10.1007/978-3-032-15743-0_10].

Prediction of High School Study Outcome through Clustering and Embedding

Addiucci Luca (first author); Temperini Marco (second author)
2026

Abstract

We investigate how semantic analysis techniques can be integrated with machine learning models to identify and predict patterns of academic success among high school students. Starting from an existing dataset, we generated summary notes for teachers from key academic indicators and transformed them into embeddings using a lightweight transformer model (DistilBERT). Principal component analysis was applied to reduce the dimensionality of the embeddings, which were then used for K-Means clustering. A decision tree classifier was trained to predict student success, using both classical features (such as grades, absences, and failures) and semantic embeddings. The results indicate that combining structured and unstructured data is a promising approach to the early detection of at-risk students.
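The pipeline outlined in the abstract (embeddings, PCA, K-Means, decision tree) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the embeddings are random placeholders standing in for DistilBERT output so that the code runs offline, and the feature names, label rule, number of components, number of clusters, and tree depth are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_students = 200

# Placeholder for DistilBERT embeddings of the teacher notes (768 dimensions).
# ASSUMPTION: random vectors are used here; they carry no semantic information.
embeddings = rng.normal(size=(n_students, 768))

# Hypothetical classical features: grade (0-20), absences, past failures.
grades = rng.integers(0, 21, size=n_students)
absences = rng.integers(0, 30, size=n_students)
failures = rng.integers(0, 4, size=n_students)
classical = np.column_stack([grades, absences, failures])

# Hypothetical success label (grade of at least 10). Because the label is
# derived from a feature, the task is trivial and the score is not meaningful.
success = (grades >= 10).astype(int)

# Step 1: reduce the embedding dimensionality with PCA (10 components assumed).
pca = PCA(n_components=10, random_state=0)
reduced = pca.fit_transform(embeddings)

# Step 2: cluster students in the reduced embedding space (3 clusters assumed).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(reduced)

# Step 3: train a decision tree on classical features plus reduced embeddings.
features = np.hstack([classical, reduced])
X_train, X_test, y_train, y_test = train_test_split(
    features, success, test_size=0.25, random_state=0, stratify=success
)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)

print("cluster sizes:", np.bincount(cluster_ids, minlength=3))
print("held-out accuracy:", accuracy)
```

In a real setting the placeholder array would be replaced by embeddings computed from the generated notes, and the number of principal components and clusters would be chosen from the data (for example, via explained variance and a cluster-quality score) rather than fixed in advance.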
2026
ICORE2026
educational data mining; semantic embeddings; clustering
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1759676