The Internet can be misused by cyber criminals as a platform to conduct illegitimate activities (such as harassment, cyber bullying, and incitement of hate or violence) anonymously. As a result, authorship analysis of anonymous texts on the Internet (such as emails and forum comments) has attracted significant attention in the digital forensic and text mining communities. The main problem is the large number of possible authors, which hinders the effective identification of the true author. We interpret open class author attribution as a process of expert recommendation where the decision support system returns a list of suspected authors for further analysis by forensic experts rather than a single prediction result, thus reducing the scale of the problem. We describe the task formally and present algorithms for constructing the suspected author list. For evaluation we propose using a simple Winner-Takes-All (WTA) metric as well as a set of gain-discount model based metrics from the information retrieval domain (mean reciprocal rank, discounted cumulative gain and rank-biased precision). We also propose the List Precision (LP) metric as an extension of WTA for evaluating the usability of the suspected author list. For experiments, we use our own dataset of Internet comments in the Lithuanian language and consider the use of language-specific (Lithuanian) lexical features together with general lexical features derived from the English language. For classification we use a one-class Support Vector Machine (SVM) classifier. The results of the experiments show that the usability of open class author attribution can be improved considerably by using a set of language-specific lexical features together with general lexical features, while the proposed method can be used to reduce the number of suspected authors, thus alleviating the work of forensic linguists.
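
The ranking metrics named in the abstract can be illustrated with a minimal sketch. It assumes the textbook information-retrieval definitions with a single relevant item (the true author) per query and binary gain; the paper's exact formulations are not given on this page and may differ. The function names, the toy suspect lists, and the persistence value 0.8 are all hypothetical. List Precision (LP) is left out because its definition is not stated here.

```python
import math
from typing import Optional, Sequence


def rank_of(true_author: str, suspects: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of the true author, or None if not listed."""
    for position, name in enumerate(suspects, start=1):
        if name == true_author:
            return position
    return None


def winner_takes_all(rank: Optional[int]) -> float:
    """1.0 only when the true author is ranked first (assumed WTA reading)."""
    return 1.0 if rank == 1 else 0.0


def reciprocal_rank(rank: Optional[int]) -> float:
    """1 / rank; averaging this over queries gives mean reciprocal rank."""
    return 0.0 if rank is None else 1.0 / rank


def discounted_gain(rank: Optional[int]) -> float:
    """DCG with binary gain and one relevant item: 1 / log2(rank + 1)."""
    return 0.0 if rank is None else 1.0 / math.log2(rank + 1)


def rank_biased_precision(rank: Optional[int], persistence: float = 0.8) -> float:
    """RBP with one relevant item: (1 - p) * p ** (rank - 1)."""
    if rank is None:
        return 0.0
    return (1.0 - persistence) * persistence ** (rank - 1)


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean; 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical queries: (true author, ranked suspect list from the system).
queries = [
    ("A", ["A", "B", "C"]),  # true author ranked first
    ("B", ["C", "B", "A"]),  # true author ranked second
    ("D", ["A", "B", "C"]),  # true author missing from the list
]
ranks = [rank_of(author, suspects) for author, suspects in queries]

print("WTA :", mean([winner_takes_all(r) for r in ranks]))
print("MRR :", mean([reciprocal_rank(r) for r in ranks]))
print("DCG :", mean([discounted_gain(r) for r in ranks]))
print("RBP :", mean([rank_biased_precision(r) for r in ranks]))
```

Unlike WTA, the three discounted metrics still give partial credit when the true author appears lower in the suspect list, which is why they suit a system that hands a ranked list to a forensic expert rather than a single prediction.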

Open class authorship attribution of Lithuanian internet comments using one-class classifier / Venčkauskas, Algimantas; Karpavičius, Arnas; Damaševičius, Robertas; Marcinkevičius, Romas; Kapočiūtė-Dzikienė, Jurgita; Napoli, Christian. - (2017), pp. 373-382. (Paper presented at the 2017 Federated Conference on Computer Science and Information Systems (FedCSIS 2017), held in Prague, Czech Republic) [10.15439/2017F461].

Open class authorship attribution of Lithuanian internet comments using one-class classifier

Christian Napoli
2017

2017 Federated Conference on Computer Science and Information Systems (FedCSIS 2017)
Classification (of information); Learning systems; Authorship identification
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: Venckauskas_Open-class-authorship_2017.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 310.02 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1328707