Research products catalogue

A large multilingual and multi-domain dataset for recommender systems / Di Tommaso, Giorgia; Faralli, Stefano; Velardi, Paola. - Electronic. - (2018). (Paper presented at the International Conference on Language Resources and Evaluation (LREC), held in Miyazaki, Japan).

A large multilingual and multi-domain dataset for recommender systems

Giorgia Di Tommaso; Stefano Faralli; Paola Velardi

2018

Abstract

This paper presents a multi-domain interests dataset to train and test Recommender Systems, and the methodology to create the dataset from Twitter messages in English and Italian. The English dataset includes an average of 90 preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million users. Preferences are either extracted from messages of users who use Spotify, Goodreads and other similar content sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to derive a semantic categorization of the preferred items, exploiting available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others.

Year of publication: 2018
Conference name: International Conference on Language Resources and Evaluation (LREC)
Keywords: recommender systems; interests dataset; Twitter
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Belongs to type: 04b Conference paper in volume

Files attached to this product

File	Size	Format
Velardi_Large_Multilingual_2018.pdf (open access; type: publisher's version, i.e. the published version with the publisher's layout; license: Creative Commons)	420.71 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1112827
