Catalogo dei prodotti della ricerca

We present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction. The main technical idea is to show that the distribution over the documents generated by LDA can be transformed into a distribution for a much simpler generative model in which documents are generated from the same set of topics but have a much simpler structure: documents are single topic and topics are chosen uniformly at random. Furthermore, this reduction is approximation preserving, in the sense that approximate distributions - the only ones we can hope to compute in practice - are mapped into approximate distribution in the simplified world. This opens up the possibility of efficiently reconstructing LDA topics in a roundabout way. Compute an approximate document distribution from the given corpus, transform it into an approximate distribution for the single-topic world, and run a reconstruction algorithm in the uniform, single-topic world - a much simpler task than direct LDA reconstruction. We show the viability of the approach by giving very simple algorithms for a generalization of two notable cases that have been studied in the literature, p-separability and matrix-like topics.

A reduction for efficient LDA topic reconstruction / Almanza, M; Chierichetti, F; Panconesi, A; Vattani, A. - 31:(2018). (Intervento presentato al convegno Thirty-second Conference on Neural Information Processing Systems tenutosi a Montréal; Canada).

A reduction for efficient LDA topic reconstruction

Almanza, M;Chierichetti, F;Panconesi, A;Vattani, A

2018

Abstract

We present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction. The main technical idea is to show that the distribution over the documents generated by LDA can be transformed into a distribution for a much simpler generative model in which documents are generated from the same set of topics but have a much simpler structure: documents are single topic and topics are chosen uniformly at random. Furthermore, this reduction is approximation preserving, in the sense that approximate distributions - the only ones we can hope to compute in practice - are mapped into approximate distribution in the simplified world. This opens up the possibility of efficiently reconstructing LDA topics in a roundabout way. Compute an approximate document distribution from the given corpus, transform it into an approximate distribution for the single-topic world, and run a reconstruction algorithm in the uniform, single-topic world - a much simpler task than direct LDA reconstruction. We show the viability of the approach by giving very simple algorithms for a generalization of two notable cases that have been studied in the literature, p-separability and matrix-like topics.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2018
			
	Nome convegno
	
				Thirty-second Conference on Neural Information Processing Systems
			
	Parole chiave
	
				machine learning; statistical inference; generative models
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				A reduction for efficient LDA topic reconstruction / Almanza, M; Chierichetti, F; Panconesi, A; Vattani, A. - 31:(2018). (Intervento presentato al  convegno Thirty-second Conference on Neural Information Processing Systems tenutosi a Montréal; Canada).
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

File	Dimensione	Formato
Almanza_reduction_2018.pdf accesso aperto Tipologia: Documento in Post-print (versione successiva alla peer review e accettata per la pubblicazione) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 434.08 kB Formato Adobe PDF	434.08 kB	Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1351514

Citazioni

ND

0

0

social impact