SEED: A Framework for Extracting Social Events from Press News / Orlando, Salvatore; Pizzolon, Francesco; Tolomei, Gabriele. - (2013), pp. 1285-1294. (Paper presented at the 2nd Int.l Workshop on Web of Linked Entities (WOLE 2013), held in Rio de Janeiro, Brazil).

SEED: A Framework for Extracting Social Events from Press News

Gabriele Tolomei
2013

Abstract

Every day, people exchange a huge amount of data through the Internet. Most of this data consists of unstructured text, which often contains references to structured information (e.g., person names, contact records, etc.). In this work, we propose a novel solution to discover social events from actual press news edited by humans. Concretely, our method is divided into two steps, each addressing a specific Information Extraction (IE) task. First, we use a technique to automatically recognize four classes of named entities in press news: Date, Location, Place, and Artist. Second, we detect social events by extracting ternary relations between such entities, also exploiting evidence from external sources (i.e., the Web). We evaluate both stages of our proposed solution on a real-world dataset. Experimental results highlight the quality of our first-step Named-Entity Recognition (NER) approach, which performs consistently with state-of-the-art solutions. We then show how to precisely select true events from the list of all candidate events (i.e., all the ternary relations) produced by our second-step Relation Extraction (RE) method. In particular, we find that true social events can be detected if enough evidence for them is found in the result lists of Web search engines.
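The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which entity classes form a ternary relation, how Web evidence is measured, or what threshold is used. The sketch assumes, purely for illustration, that a candidate event is an (artist, place, date) triple, that evidence is an integer count returned by a caller-supplied function, and that a candidate is kept when the count reaches a hypothetical `min_hits` threshold. The names `Candidate`, `candidate_events`, `select_events`, and `min_hits` are all hypothetical.

```python
from itertools import product
from typing import Callable, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical ternary relation; the choice of classes is an assumption."""
    artist: str
    place: str
    date: str


def candidate_events(entities: dict[str, list[str]]) -> list[Candidate]:
    """Enumerate every (artist, place, date) combination of recognized entities."""
    return [
        Candidate(artist, place, date)
        for artist, place, date in product(
            entities.get("Artist", []),
            entities.get("Place", []),
            entities.get("Date", []),
        )
    ]


def select_events(
    candidates: list[Candidate],
    evidence: Callable[[Candidate], int],
    min_hits: int = 3,  # hypothetical threshold, not taken from the paper
) -> list[Candidate]:
    """Keep only candidates whose evidence count reaches the threshold."""
    return [c for c in candidates if evidence(c) >= min_hits]


# Usage with a stubbed evidence function standing in for a Web search lookup.
entities = {"Artist": ["A1", "A2"], "Place": ["P1"], "Date": ["D1"]}
stub_hits = {Candidate("A1", "P1", "D1"): 5}
candidates = candidate_events(entities)
selected = select_events(candidates, lambda c: stub_hits.get(c, 0))
```

Passing the evidence source in as a function keeps the sketch independent of any particular search engine or API, none of which the record names.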
2nd Int.l Workshop on Web of Linked Entities (WOLE 2013)
information extraction; named-entity recognition; relation extraction; social event discovery
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1382700