Arabic meaning extraction through lexical resources: A general-purpose data mining model for Arabic texts / Lancioni, Giuliano; Pepe, Ivana; Silighini, Alessandra; Pettinari, Valeria; Cicola, Ilaria; Benassi, Leila; Campanelli, Marta. - Electronic. - (2013), pp. 107-112. (Paper presented at the conference IMMM 2013, The Third International Conference on Advances in Information Mining and Management, held in Lisbon, Portugal, on 17/11/2013).
Arabic meaning extraction through lexical resources: A general-purpose data mining model for Arabic texts
Lancioni, Giuliano; Pepe, Ivana; Cicola, Ilaria; Campanelli, Marta
2013
Abstract
A general-purpose data mining model for Arabic texts (Arabic Meaning Extraction through Lexical Resources, ArMExLeR) is proposed. It employs a chained pipeline of existing public-domain and published lexical resources (Stanford Parser, WordNet, Arabic WordNet, SUMO, AraMorph, A Frequency Dictionary of Arabic) to extract a weakly hierarchised representation of meaning at the single-predicate level. Such a model would have a significant impact on the computational analysis of Arabic, since no comparable tool exists for the language, and building it is challenging because of the language's specific features. In particular, the model must cope with the Arabic writing system, which is mostly consonant-based and does not always mark vowels explicitly. This is crucial when analyzing an Arabic corpus, because the same consonantal ductus may be read in several ways.
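The ambiguity of the unvocalized ductus can be illustrated with a minimal Python sketch. It assumes a small, hypothetical lexicon of fully vocalized forms and treats the short-vowel and related diacritics U+064B to U+0652 (fathatan through sukun) as the marks that are usually omitted. Stripping them yields the consonantal skeleton, and grouping the lexicon by that skeleton shows several readings collapsing onto one written form. This is not ArMExLeR's or AraMorph's implementation. The lexicon, the glosses and the function names `ductus` and `readings_by_ductus` are illustrative assumptions.

```python
from collections import defaultdict

# Short-vowel and related diacritics (harakat), U+064B (fathatan) to
# U+0652 (sukun). Assumption: these are the only marks we strip here.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}

# Hypothetical vocalized forms, written with escapes so the diacritics are explicit.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote"
KUTIBA = "\u0643\u064F\u062A\u0650\u0628\u064E"  # kutiba, "it was written"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # kutub, "books"
QALAM = "\u0642\u064E\u0644\u064E\u0645"         # qalam, "pen"

# Hypothetical mini-lexicon mapping vocalized forms to English glosses.
LEXICON = {
    KATABA: "he wrote (kataba)",
    KUTIBA: "it was written (kutiba)",
    KUTUB: "books (kutub)",
    QALAM: "pen (qalam)",
}


def ductus(word: str) -> str:
    """Return the consonantal skeleton of a word by dropping the harakat."""
    return "".join(ch for ch in word if ch not in HARAKAT)


def readings_by_ductus(lexicon: dict) -> dict:
    """Group the glosses of all vocalized forms that share the same ductus."""
    groups = defaultdict(list)
    for vocalized, gloss in lexicon.items():
        groups[ductus(vocalized)].append(gloss)
    return dict(groups)


if __name__ == "__main__":
    for skeleton, glosses in readings_by_ductus(LEXICON).items():
        print(skeleton, "->", glosses)
```

Run as a script, the sketch lists the three readings of kataba, kutiba and kutub under a single skeleton, and qalam on its own. This illustrates why a morphological analysis stage working on unvocalized text generally has to return several candidate readings for a token rather than a single one.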