Catalogo dei prodotti della ricerca

This article describes the constitution process of the first morpho-syntactically annotated Tunisian Arabish Corpus (TArC). Arabish, also known as Arabizi, is a spontaneous coding of Arabic dialects in Latin characters and arithmographs (numbers used as letters). This code-system was developed by Arabic-speaking users of social media in order to facilitate the writing in the Computer-Mediated Communication (CMC) and text messaging informal frameworks. There is variety in the realization of Arabish amongst dialects, and each Arabish code-system is under-resourced, in the same way as most of the Arabic dialects. In the last few years, the focus on Arabic dialects in the NLP field has considerably increased. Taking this into consideration, TArC will be a useful support for different types of analyses, computational and linguistic, as well as for NLP tools training. In this article we will describe preliminary work on the TArC semi-automatic construction process and some of the first analyses we developed on TArC. In addition, in order to provide a complete overview of the challenges faced during the building process, we will present the main Tunisian dialect characteristics and their encoding in Tunisian Arabish.

TArC: Incrementally and semi-automatically collecting a Tunisian arabish corpus / Gugliotta, E., Dinarelli, M.. - (2020), pp. 6279-6286. (12th International Conference on Language Resources and Evaluation, LREC 2020 Palais du Pharo, Marseille ).

TArC: Incrementally and semi-automatically collecting a Tunisian arabish corpus

Gugliotta E.;Dinarelli M.

2020

Abstract

This article describes the constitution process of the first morpho-syntactically annotated Tunisian Arabish Corpus (TArC). Arabish, also known as Arabizi, is a spontaneous coding of Arabic dialects in Latin characters and arithmographs (numbers used as letters). This code-system was developed by Arabic-speaking users of social media in order to facilitate the writing in the Computer-Mediated Communication (CMC) and text messaging informal frameworks. There is variety in the realization of Arabish amongst dialects, and each Arabish code-system is under-resourced, in the same way as most of the Arabic dialects. In the last few years, the focus on Arabic dialects in the NLP field has considerably increased. Taking this into consideration, TArC will be a useful support for different types of analyses, computational and linguistic, as well as for NLP tools training. In this article we will describe preliminary work on the TArC semi-automatic construction process and some of the first analyses we developed on TArC. In addition, in order to provide a complete overview of the challenges faced during the building process, we will present the main Tunisian dialect characteristics and their encoding in Tunisian Arabish.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2020
			
	Nome convegno
	
				12th International Conference on Language Resources and Evaluation, LREC 2020
			
	Parole chiave
	
				Arabic dialect; Arabizi; Tunisian Arabish Corpus
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				TArC: Incrementally and semi-automatically collecting a Tunisian arabish corpus / Gugliotta, E., Dinarelli, M.. - (2020), pp. 6279-6286. (12th International Conference on Language Resources and Evaluation, LREC 2020 Palais du Pharo, Marseille ).
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

File	Dimensione	Formato
Gugliotta_TaRC_2020.pdf accesso aperto Note: https://www.aclweb.org/anthology/2020.lrec-1.770/ Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore) Licenza: Creative commons Dimensione 232.99 kB Formato Adobe PDF	232.99 kB	Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1501335

Citazioni

ND

4

1

social impact