In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset we are releasing publicly. Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.

Misspelling Oblivious Word Embeddings / Piktus, Aleksandra; Bora Edizel, Necati; Bojanowski, Piotr; Grave, Edouard; Ferreira, Rui; Silvestri, Fabrizio. - (2019), pp. 3226-3234. (Paper presented at the NAACL 2019 conference, held in Minneapolis) [10.18653/v1/N19-1326].

Misspelling Oblivious Word Embeddings

Fabrizio Silvestri
2019

NAACL 2019
misspelling, embeddings, misspelling oblivious embedding, word embedding
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1481942