Language trees and zipping / Benedetto, Dario; Caglioti, Emanuele; Loreto, Vittorio. - In: PHYSICAL REVIEW LETTERS. - ISSN 0031-9007. - 88:4(2002), pp. 048702:1-048702:4. [10.1103/physrevlett.88.048702]

Language trees and zipping

BENEDETTO, Dario; CAGLIOTI, Emanuele; LORETO, Vittorio
2002

Abstract

In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present an application of the method to linguistically motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.
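
The abstract does not spell out the remoteness measure, so the following is only a minimal sketch of the general idea of a compression-based comparison, not the authors' procedure. It assumes that a short probe text costs fewer extra compressed bytes when appended to a reference text it resembles. Python's `zlib` stands in for the compressor, and the function names `compressed_size`, `extra_cost`, and `closest_reference` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib (DEFLATE) compression."""
    return len(zlib.compress(data, 9))


def extra_cost(reference: bytes, probe: bytes) -> int:
    """Extra compressed bytes needed when `probe` is appended to `reference`.

    Assumption of this sketch: a smaller value means the probe is
    "closer" to the reference, because the compressor can reuse
    substrings it has already seen in the reference.
    """
    return compressed_size(reference + probe) - compressed_size(reference)


def closest_reference(probe: bytes, references: dict[str, bytes]) -> str:
    """Label of the reference text after which `probe` compresses best."""
    return min(references, key=lambda label: extra_cost(references[label], probe))
```

Under these assumptions, language recognition would amount to calling `closest_reference` with a short text of unknown language and one reference corpus per candidate language. Note that DEFLATE only looks back over a 32 KiB window, so with long references only the tail of each reference influences the result.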
entropy; language recognition; authorship attribution; language classification
01 Journal publication::01a Journal article
Files attached to this record
File: Benedetto_Language Trees and Zipping_2002.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 102.68 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/251284
Citations
  • PMC 13
  • Scopus 259
  • ISI 192