A data compression approach to monolingual GIRT task: an agnostic point of view

Alderduccio, D.; Bordoni, L.; Loreto, Vittorio

doi:10.1007/978-3-540-30222-3_38

In this paper we apply a data-compression IR method to the GIRT social science database, focusing on the monolingual task in German and English. For this purpose we use a recently proposed general scheme for context recognition and context classification of strings of characters (in particular texts) or other coded information. The key point of the method is the computation of a suitable measure of remoteness (or similarity) between two strings of characters. This measure reflects the distance between the structures present in the two strings, i.e. between the distributions of elements in the two compared sequences. The hypothesis is that this information-theoretic measure of remoteness between two sequences could reflect their semantic distance. It is worth stressing the generality and versatility of our information-theoretic method, which applies to any corpus of character strings, whatever the type of coding used (e.g. the language).
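The abstract does not state the formula used for the measure of remoteness, so the following is only a minimal sketch of the general idea, not the authors' method. It swaps in the normalized compression distance (NCD) computed with Python's `zlib` as a stand-in measure. The function names `compressed_size`, `ncd` and `rank_by_remoteness` are hypothetical and chosen for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data (maximum level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    Assumption: NCD is used here as a stand-in for the paper's measure
    of remoteness, which the abstract does not specify.
    Smaller values mean the two strings share more structure.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_by_remoteness(query: str, documents: list[str]) -> list[str]:
    """Return the documents ordered from least to most remote from the query."""
    return sorted(documents, key=lambda doc: ncd(query, doc))
```

Nothing in this sketch depends on the language of the text, since the compressor only sees bytes; that mirrors the "agnostic" claim of the abstract. On short strings the compressor's fixed overhead dominates, so the values are only indicative.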

A data compression approach to monolingual GIRT task: an agnostic point of view / Alderduccio, D.; Bordoni, L.; Loreto, V. - Electronic. - 3237 (2004), pp. 391-400. (CLEF 2003, Trondheim, Norway, 21-22 August 2003) [10.1007/978-3-540-30222-3_38].

Publication year: 2004
Conference name: CLEF 2003
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/450262
