Polirematiche e collocazioni dell'italiano. Uno studio linguistico e computazionale / Squillante, Luigi. - (2016), pp. Online-Ressource. [10.18442/535]
Polirematiche e collocazioni dell'italiano. Uno studio linguistico e computazionale
Luigi Squillante
2016
Abstract
In contemporary linguistics, the definition of the entities referred to as multiword expressions (MWEs) remains controversial. It is intuitively clear that some words, when they appear together, have a “special bond” in terms of meaning (e.g. black hole, mountain chain) or lexical choice (e.g. strong tea, to fill a form), unlike free combinations. Nevertheless, the great variety of features and anomalous behaviours that these expressions exhibit makes it difficult to organize them into categories, and has given rise to a large body of varied and sometimes overlapping terminology. So far, most approaches in corpus linguistics have focused on automatically extracting MWEs from corpora by means of statistical association measures, while theoretical aspects related to their definition, typology and behaviour, as they emerge from quantitative corpus-based studies, have not been widely explored, especially for languages with rich morphology and relatively free word order, such as Italian. I show that a systematic analysis of the empirical behaviour of Italian MWEs in large corpora, with respect to several parameters such as syntactic and semantic variation, helps to outline a subcategorization of the expressions into homogeneous sets, which approximately correspond to what are intuitively known as multiword units (“polirematiche” in the Italian lexicographic tradition) and lexical collocations. These results can be obtained with a purpose-built tool (whose methodology is fully explained in my work) that automatically investigates the empirical features of MWEs once a large corpus and a list of expressions are provided.