In this work we investigate the effectiveness of different text mining methods for the automated identification of interdisciplinary doctoral dissertations, considering solely the content of their abstracts. In contrast to previous attempts, we frame interdisciplinarity detection as a two-step classification process: we first predict the main discipline of the dissertation using a supervised multi-class classifier, and then use the distribution of that classifier's prediction confidences as input for the binary classification of interdisciplinarity. For both supervised classification models we experiment with several feature sets, ranging from standard lexical features such as TF-IDF-weighted vectors, through topic-model distributions, to latent semantic textual representations known as word embeddings. In contrast to previous findings, our experimental results suggest that interdisciplinarity is better detected directly from textual features than by inference from the results of main-discipline classification.
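
The two-step framing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifiers used, so a nearest-centroid scorer over TF-IDF vectors stands in for the multi-class step, and a one-feature decision stump on the top confidence stands in for the binary step. All abstracts, labels, and function names below are hypothetical toy values chosen for illustration.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency per term (toy whitespace tokenizer)."""
    n_docs = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d.lower().split()))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def vectorize(doc, idf):
    """Sparse TF-IDF vector as a dict; terms unseen during fitting are dropped."""
    counts = Counter(t for t in doc.lower().split() if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fit_centroids(vectors, labels):
    """Mean TF-IDF vector per discipline (stand-in for a trained classifier)."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        for t, w in vec.items():
            acc[t] += w
    return {lab: {t: w / counts[lab] for t, w in acc.items()} for lab, acc in sums.items()}


def discipline_confidences(vec, centroids):
    """Step 1: similarity to each discipline, normalized to sum to 1."""
    sims = {lab: cosine(vec, c) for lab, c in centroids.items()}
    total = sum(sims.values())
    if total == 0:
        return {lab: 1.0 / len(sims) for lab in sims}
    return {lab: s / total for lab, s in sims.items()}


def fit_threshold(top_confs, labels):
    """Step 2 (assumed stand-in): pick the cut on top confidence with best training accuracy."""
    candidates = sorted(set(top_confs))
    best_t, best_acc = 0.0, -1.0
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        acc = sum((c < t) == y for c, y in zip(top_confs, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def is_interdisciplinary(abstract, idf, centroids, threshold):
    """Flag an abstract whose top discipline confidence falls below the cut."""
    confs = discipline_confidences(vectorize(abstract, idf), centroids)
    return max(confs.values()) < threshold


# Hypothetical toy data: abstracts with a main-discipline label (step 1).
discipline_docs = [
    ("protein gene cell dna sequence", "biology"),
    ("cell membrane protein enzyme", "biology"),
    ("algorithm graph complexity compiler", "computer science"),
    ("compiler software algorithm network", "computer science"),
]
# Hypothetical toy data: abstracts with an interdisciplinarity label (step 2).
interdisciplinarity_docs = [
    ("gene dna protein cell", False),
    ("algorithm compiler software", False),
    ("algorithm for protein sequence graph", True),
    ("software network for gene cell", True),
]

idf = fit_idf([d for d, _ in discipline_docs])
centroids = fit_centroids(
    [vectorize(d, idf) for d, _ in discipline_docs],
    [lab for _, lab in discipline_docs],
)
top_confs = [
    max(discipline_confidences(vectorize(d, idf), centroids).values())
    for d, _ in interdisciplinarity_docs
]
threshold = fit_threshold(top_confs, [y for _, y in interdisciplinarity_docs])

print(is_interdisciplinary("dna sequence algorithm complexity", idf, centroids, threshold))
print(is_interdisciplinary("enzyme membrane cell", idf, centroids, threshold))
```

The sketch only shows how the confidence distribution from the first step can feed the second. The abstract itself reports that classifying interdisciplinarity directly from textual features worked better than inferring it from main-discipline predictions, which in this sketch would correspond to training the binary step on the TF-IDF vectors rather than on the confidences.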

Capturing interdisciplinarity in academic abstracts / Nanni, F.; Dietz, L.; Faralli, S.; Glavas, G.; Ponzetto, S. P. - In: D-LIB MAGAZINE. - ISSN 1082-9873. - 22:9-10(2016). [10.1045/september2016-nanni]

Capturing interdisciplinarity in academic abstracts

Faralli S. (co-first author); Ponzetto S. P. (co-first author)
2016

Interdisciplinarity; Scientometrics; Text Classification; Tool Criticism
01 Journal publication::01a Journal article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1620667
