We investigate the use of topic models, such as latent Dirichlet allocation (LDA), on two real-world problems: classifying documents and retrieving documents similar to a given document, with reference to a corpus of Italian Supreme Court decisions. A topic model is a generative model that specifies a simple probabilistic procedure by which the documents in a corpus can be generated. In the LDA approach, a topic is a probability distribution over a fixed vocabulary of terms, each document is modeled as a mixture of K topics, and the mixing coefficients can be used to represent documents as points on the (K-1)-dimensional simplex spanned by the topics. Approximate posterior inference is performed to learn the hidden topical structure from the observed data, i.e., the words in the documents.
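
The generative procedure and the simplex representation described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the abstract: the toy vocabulary, the symmetric Dirichlet parameters `alpha` and `eta`, the helper names, and the choice of the Hellinger distance for comparing documents. The sketch also ranks documents by their true sampled mixtures; in a real application the mixtures are unknown and would be replaced by estimates obtained through approximate posterior inference.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw one point from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha, eta, rng):
    """LDA generative process: topics first, then each document word by word."""
    # Each topic is a probability distribution over the fixed vocabulary.
    topics = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(n_topics)]
    docs, mixtures = [], []
    for _ in range(n_docs):
        # Mixing coefficients: a point on the (K-1)-dimensional simplex.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)      # topic assignment for this word
            w = sample_categorical(topics[z], rng)  # word drawn from that topic
            words.append(vocab[w])
        docs.append(words)
        mixtures.append(theta)
    return topics, docs, mixtures

def hellinger(p, q):
    """Hellinger distance between two points on the simplex (0 means identical)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def most_similar(query_idx, mixtures):
    """Rank the other documents by the distance of their mixtures to the query."""
    others = [i for i in range(len(mixtures)) if i != query_idx]
    return sorted(others, key=lambda i: hellinger(mixtures[query_idx], mixtures[i]))

rng = random.Random(0)  # fixed seed so the toy corpus is reproducible
vocab = ["appeal", "contract", "damages", "sentence", "tax", "tenant"]  # hypothetical terms
topics, docs, mixtures = generate_corpus(
    n_docs=5, doc_len=20, vocab=vocab, n_topics=3, alpha=0.5, eta=0.5, rng=rng
)
ranking = most_similar(0, mixtures)  # indices of the other documents, nearest first
```

The same mixing coefficients could serve as a K-dimensional feature vector for a document classifier, which is one plausible reading of the classification task mentioned above.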

Applying LDA topic model to a corpus of Italian Supreme Court decisions / Paolo, Fantini; Brutti, Pierpaolo. - ELECTRONIC. - (2014). (Paper presented at the Conference of European Statistics Stakeholders, held in Rome, 24-25 November 2014).

Applying LDA topic model to a corpus of Italian Supreme Court decisions

BRUTTI, Pierpaolo
2014

Conference of European Statistics Stakeholders
04 Publication in conference proceedings::04d Abstract in conference proceedings
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/657343