Lahajat: a rule-based converter of standard Arabic lexical databases into spoken Arabic forms

Giuliano Lancioni; Elisa Gugliotta; Valeria Pettinari
2018

Abstract

Lexical resources for Arabic tend to focus on the standard variety of the language (Modern Standard Arabic, MSA), which is mostly used in written and formal sources. However, the spread of informal genres has made it increasingly necessary to produce broader resources that encompass the features of spoken varieties commonly found in written texts. The Lahajat project addresses this need by providing a series of rule-based transformations that enlarge existing MSA lexical resources to cover typical morphonological features of spoken varieties. In particular, two case studies are presented, covering two widely divergent varieties: Egyptian Arabic and Tunisian Arabish.
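To make the idea of enlarging a lexicon through rule-based transformations more concrete, here is a minimal Python sketch. It is not the Lahajat rule set or implementation: the `Rule` type, the `EGYPTIAN_RULES` list, and the `apply_rules` and `enlarge` functions are hypothetical names introduced for illustration. The two rules only encode a well-known tendency, namely that MSA interdentals are often written as plain dentals in Egyptian Arabic spellings.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    pattern: str      # regular expression matched against the MSA form
    replacement: str  # substitution that yields the spoken-variety form


# Hypothetical rules for illustration only (not taken from the paper):
# MSA interdentals are often written as plain dentals in Egyptian Arabic.
EGYPTIAN_RULES = [
    Rule("tha_to_ta", "ث", "ت"),
    Rule("dhal_to_dal", "ذ", "د"),
]


def apply_rules(form: str, rules: list[Rule]) -> str:
    """Apply each rewrite rule in order to a single lexical form."""
    for rule in rules:
        form = re.sub(rule.pattern, rule.replacement, form)
    return form


def enlarge(lexicon: list[str], rules: list[Rule]) -> dict[str, str]:
    """Map every surface form (standard or derived) to its MSA lemma."""
    enlarged = {lemma: lemma for lemma in lexicon}
    for lemma in lexicon:
        variant = apply_rules(lemma, rules)
        # Keep existing entries; only add forms the lexicon lacks.
        enlarged.setdefault(variant, lemma)
    return enlarged


msa_lexicon = ["ثلاثة", "ذهب", "كتاب"]  # "three", "gold", "book"
enlarged = enlarge(msa_lexicon, EGYPTIAN_RULES)
```

In this sketch the enlarged lexicon keeps each derived form linked to its MSA lemma, so a lookup of an informal spelling can still reach the standard entry. A realistic rule set would need context-sensitive rules and a way to handle forms that several lemmas map to.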
4th IEEE International Colloquium on Information Science and Technology (CiSt)
linguistics; natural language processing; Tunisian Arabish; Egyptian Arabic; morphonological features; MSA; rule-based transformations; Lahajat project; informal genres diffusion; written sources; formal sources; Lexical resources; spoken Arabic forms; standard Arabic lexical databases; rule-based converter; Standards; Context; Electronic mail; Keyboards; Production; Morphology; Context modeling
04 Publication in conference proceedings::04b Conference paper in volume
Lahajat: a rule-based converter of standard Arabic lexical databases into spoken Arabic forms / Lancioni, Giuliano; Gugliotta, Elisa; Pettinari, Valeria. - (2018), pp. 395-399. (Paper presented at the 4th IEEE International Colloquium on Information Science and Technology (CiSt), held in Tangier, Morocco) [10.1109/CIST.2016.7805078].
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1275667