Research products catalog

Open class authorship attribution of Lithuanian internet comments using one-class classifier / Venčkauskas, A., Karpavičius, A., Damaševičius, R., Marcinkevičius, R., Kapočiūtė-Dzikienė, J., Napoli, C. - (2017), pp. 373-382. (2017 Federated Conference on Computer Science and Information Systems (FedCSIS 2017), Prague, Czech Republic) [10.15439/2017F461].

Open class authorship attribution of Lithuanian internet comments using one-class classifier

Algimantas Venčkauskas; Arnas Karpavičius; Robertas Damaševičius; Romas Marcinkevičius; Jurgita Kapočiūtė-Dzikienė; Christian Napoli

2017

Abstract

The Internet can be misused by cyber criminals as a platform to conduct illegitimate activities (such as harassment, cyber bullying, and incitement of hate or violence) anonymously. As a result, authorship analysis of anonymous texts on the Internet (such as emails and forum comments) has attracted significant attention in the digital forensic and text mining communities. The main problem is the large number of possible authors, which hinders the effective identification of the true author. We interpret open class author attribution as a process of expert recommendation in which the decision support system returns a list of suspected authors for further analysis by forensic experts rather than a single prediction result, thus reducing the scale of the problem. We describe the task formally and present algorithms for constructing the suspected author list. For evaluation, we propose using a simple Winner-Takes-All (WTA) metric as well as a set of gain-discount model based metrics from the information retrieval domain (mean reciprocal rank, discounted cumulative gain, and rank-biased precision). We also propose the List Precision (LP) metric as an extension of WTA for evaluating the usability of the suspected author list. For experiments, we use our own dataset of Internet comments in the Lithuanian language and consider the use of language-specific (Lithuanian) lexical features together with general lexical features derived from the English language. For classification, we use a one-class Support Vector Machine (SVM) classifier. The experimental results show that the usability of open class author attribution can be improved considerably by using a set of language-specific lexical features together with general lexical features, while the proposed method can be used to reduce the number of suspected authors, thus alleviating the work of forensic linguists.
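The abstract names its ranking metrics without defining them. As a rough illustration only, the sketch below scores ranked suspect lists with the standard information retrieval definitions of winner-takes-all, reciprocal rank, discounted cumulative gain, and rank-biased precision, assuming binary relevance (exactly one true author per anonymous text). The author labels, the example lists, and the persistence value p = 0.8 are hypothetical, and the paper's own formulations may differ. The List Precision metric is left out because the abstract does not define it.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def rank_of(true_author: str, suspects: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of the true author, or None if not listed."""
    for position, suspect in enumerate(suspects, start=1):
        if suspect == true_author:
            return position
    return None


def winner_takes_all(rank: Optional[int]) -> float:
    """1.0 only when the true author is ranked first."""
    return 1.0 if rank == 1 else 0.0


def reciprocal_rank(rank: Optional[int]) -> float:
    """1 / rank; averaging this over texts gives mean reciprocal rank."""
    return 0.0 if rank is None else 1.0 / rank


def dcg(rank: Optional[int]) -> float:
    """Discounted cumulative gain with a single relevant item (gain 1)."""
    return 0.0 if rank is None else 1.0 / math.log2(rank + 1)


def rbp(rank: Optional[int], p: float = 0.8) -> float:
    """Rank-biased precision with a single relevant item; p is assumed."""
    return 0.0 if rank is None else (1.0 - p) * p ** (rank - 1)


def mean_metric(metric: Callable[[Optional[int]], float],
                ranks: Sequence[Optional[int]]) -> float:
    """Average a per-text metric over all evaluated texts."""
    return sum(metric(r) for r in ranks) / len(ranks)


# Hypothetical evaluation data: (true author, ranked suspect list).
queries: List[Tuple[str, List[str]]] = [
    ("A", ["A", "B", "C"]),  # true author ranked first
    ("B", ["C", "B", "A"]),  # true author ranked second
    ("C", ["A", "B"]),       # true author missing from the list
]
ranks = [rank_of(author, suspects) for author, suspects in queries]

for name, metric in [("WTA", winner_takes_all), ("MRR", reciprocal_rank),
                     ("DCG", dcg), ("RBP", rbp)]:
    print(f"{name}: {mean_metric(metric, ranks):.3f}")
```

With one relevant author per text, every metric reduces to a function of that author's rank, which is why the sketch computes the rank once and reuses it. Texts whose true author is missing from the list score zero under all four metrics.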

Year of publication: 2017
Conference name: 2017 Federated Conference on Computer Science and Information Systems (FedCSIS 2017)
Keywords: Classification (of information); Learning systems; Authorship identification
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

File	Size	Format
Venckauskas_Open-class-authorship_2017.pdf (archive managers only; Type: Publisher's version, published with the publisher's layout; License: All rights reserved)	310.02 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1328707
