In this paper we introduce a computational procedure for measuring the semantic readability of a segmented text. The procedure consists of three main steps. First, natural language processing tools and unsupervised machine learning techniques are used to obtain a vectorized numerical representation of each section or segment of the input text, so that similar or semantically related segments are modeled by nearby points in a vector space. Second, the shortest and longest Hamiltonian paths passing through these points are computed. Lastly, the lengths of these paths and that of the path induced by the original ordering of the segments are combined into an arithmetic expression to derive an index, which may be used to gauge the semantic difficulty that a reader is expected to experience when reading the text. A preliminary experimental study is conducted on seven classic narrative texts written in English, obtained from Project Gutenberg. The experimental results appear to be in line with our expectations.
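
The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the embedding model, the path solver, or the arithmetic expression, so everything specific here is an assumption. In particular, `embed` uses plain bag-of-words counts as a stand-in for the NLP and unsupervised learning step, `extreme_paths` enumerates all orderings by brute force (feasible only for a handful of segments), and `readability_index` uses a hypothetical normalization that places the original ordering between the shortest and longest paths.

```python
from collections import Counter
from itertools import permutations
from math import dist


def embed(segments):
    """Map each segment to a bag-of-words count vector (assumed stand-in embedding)."""
    vocab = sorted({w for s in segments for w in s.lower().split()})
    vectors = []
    for s in segments:
        counts = Counter(s.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def path_length(points, order):
    """Sum of Euclidean distances between consecutive points along `order`."""
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def extreme_paths(points):
    """Brute-force shortest and longest Hamiltonian path lengths (small inputs only)."""
    lengths = [path_length(points, p) for p in permutations(range(len(points)))]
    return min(lengths), max(lengths)


def readability_index(segments):
    """Hypothetical index: 0 if the original order is a shortest path, 1 if a longest."""
    points = embed(segments)
    original = path_length(points, list(range(len(points))))
    shortest, longest = extreme_paths(points)
    if longest == shortest:
        # All orderings have the same length, so the order carries no signal.
        return 0.0
    return (original - shortest) / (longest - shortest)


if __name__ == "__main__":
    grouped = ["the cat sat", "the cat slept", "stock markets fell", "stock markets rose"]
    interleaved = [grouped[0], grouped[2], grouped[1], grouped[3]]
    print(readability_index(grouped), readability_index(interleaved))
```

On this toy input, the ordering that keeps same-topic segments adjacent scores at the low end of the scale and the interleaved ordering at the high end, which matches the intuition that frequent semantic jumps make a text harder to follow. For realistic numbers of segments the brute-force enumeration would have to be replaced by a heuristic or exact solver for Hamiltonian path problems.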

A Computational Measure for the Semantic Readability of Segmented Texts / Santucci, Valentino; Bartoccini, Umberto; Mengoni, Paolo; Zanda, Fabio. - 13377:(2022), pp. 107-119. (International Conference on Computational Science and its Applications (ICCSA 2022), Malaga) [10.1007/978-3-031-10536-4_8].

A Computational Measure for the Semantic Readability of Segmented Texts

Zanda, Fabio
2022

International Conference on Computational Science and its Applications (ICCSA 2022)
Semantic readability of texts; Natural Language Processing; Unsupervised machine learning; Hamiltonian path
04 Publication in conference proceedings::04b Conference paper in volume

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1758265