Misspelling Oblivious Word Embeddings

Aleksandra Piktus; Necati Bora Edizel; Piotr Bojanowski; Edouard Grave; Rui Ferreira; Fabrizio Silvestri

2019

Abstract

In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset we are releasing publicly. Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
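
The role of subwords mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it only extracts FastText-style character n-grams (with `<` and `>` boundary markers) and measures how many of them a misspelling shares with the correct word. The function names `char_ngrams` and `ngram_overlap`, the 3-to-6 n-gram range, and the example words are assumptions chosen for illustration; the supervised misspelling task described in the abstract is not modelled here.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of `word`.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from word-internal n-grams (FastText-style).
    The 3..6 range is an assumed default for this sketch.
    """
    marked = f"<{word}>"
    return {
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    }


def ngram_overlap(a, b):
    """Jaccard overlap between the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # Hypothetical example words: a correct form, a misspelling, an unrelated word.
    print(ngram_overlap("embedding", "embeding"))
    print(ngram_overlap("embedding", "vector"))
```

Because a subword model builds a word's vector from its n-grams, a misspelling that shares many n-grams with the correct word starts out with a related representation even when it is out of vocabulary. Per the abstract, the paper adds a supervised objective on top of this so that misspellings are embedded close to their correct variants.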

Year of publication: 2019
Conference name: NAACL 2019
Keywords: misspelling, embeddings, misspelling oblivious embedding, word embedding
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Citation: Misspelling Oblivious Word Embeddings / Piktus, Aleksandra; Bora Edizel, Necati; Bojanowski, Piotr; Grave, Edouard; Ferreira, Rui; Silvestri, Fabrizio. - (2019), pp. 3226-3234. (NAACL 2019, Minneapolis) [10.18653/v1/N19-1326].
Belongs to type: 04b Conference paper in volume

Files attached to this product

There are no files associated with this product.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1481942

Warning: the data shown have not been validated by the university.
