Authorship attribution is a fascinating field at the crossroads between linguistics and information science. Its relevance goes well beyond the specific predictions that different tools can make about authors whose identity is uncertain or hidden behind known “noms de plume”. Correctly spotting the unknown author of a text is far from reflecting a “keyhole” attitude; it is instead the tip of an iceberg whose main body is made of solid tools and algorithms able to extract syntactic, and possibly semantic, information from generic strings of characters. Here we follow a data-compression approach to authorship attribution, through which we define a notion of similarity between generic strings of characters (in particular, literary texts). We start by assessing the overall performance of our set of tools in authorship attribution, both on the wide corpus adopted in this volume and on an extended corpus. We then concentrate on the well-known “affaire Ferrante” (originally treated by some of us back in 2006), confirming and strengthening our original claim that, within the corpus considered, Domenico Starnone is the most likely author behind Elena Ferrante. We stress again that, despite the strong hints pointing to Starnone, we cannot rule out the possibility that Ferrante’s signature hides another author (or several authors) not included in the corpus. Specific analyses are still needed to shed light on this last point.
Data-compression approach to authorship attribution / Lalli, Margherita; Tria, Francesca; Loreto, Vittorio. - (2018). (Paper presented at the conference Drawing Elena Ferrante's Profile, held in Padua).
Data-compression approach to authorship attribution
Francesca Tria; Vittorio Loreto
2018