Data compression and learning in time sequences analysis

Puglisi, A.; Benedetto, Dario; Caglioti, Emanuele; Loreto, Vittorio; Vulpiani, Angelo

doi:10.1016/s0167-2789(03)00047-2

Motivated by the problem of the definition of a distance between two sequences of characters, we investigate the so-called learning process of a typical sequential data compression schemes. We focus on the problem of how a compression algorithm optimizes its features at the interface between two different sequences A and B while zipping the sequence A + B obtained by simply appending B after A. We show the existence of a scaling function (the "learning function") which rules the way in which the compression algorithm learns a sequence B after having compressed a sequence A. In particular it turns out that there exists a cross-over length for the sequence B, which depends on the relative entropy between A and B, below which the compression algorithm does not learn the sequence B (measuring in this way the cross-entropy between A and B) and above which it starts learning B, i.e. optimizing the compression using the specific features of B. We check the scaling on three main classes of systems: Bernoulli schemes, Markovian sequences and the symbolic dynamic generated by a nontrivial chaotic system (the Lozi map). As a last application of the method we present the results of a recognition experiment, namely recognize which dynamical systems produced a given time sequence. We finally point out the potentiality of these results for segmentation purposes, i.e. the identification of homogeneous sub-sequences in heterogeneous sequences (with applications in various fields from genetic to time-series analysis). (C) 2003 Elsevier Science B.V. All rights reserved.

Data compression and learning in time sequences analysis / A., Puglisi; Benedetto, Dario; Caglioti, Emanuele; Loreto, Vittorio; Vulpiani, Angelo. - In: PHYSICA D-NONLINEAR PHENOMENA. - ISSN 0167-2789. - 180:1-2(2003), pp. 92-107. [10.1016/s0167-2789(03)00047-2]

Data compression and learning in time sequences analysis

A. Puglisi;BENEDETTO, Dario;CAGLIOTI, Emanuele;LORETO, Vittorio;VULPIANI, Angelo

2003

Abstract

Motivated by the problem of the definition of a distance between two sequences of characters, we investigate the so-called learning process of a typical sequential data compression schemes. We focus on the problem of how a compression algorithm optimizes its features at the interface between two different sequences A and B while zipping the sequence A + B obtained by simply appending B after A. We show the existence of a scaling function (the "learning function") which rules the way in which the compression algorithm learns a sequence B after having compressed a sequence A. In particular it turns out that there exists a cross-over length for the sequence B, which depends on the relative entropy between A and B, below which the compression algorithm does not learn the sequence B (measuring in this way the cross-entropy between A and B) and above which it starts learning B, i.e. optimizing the compression using the specific features of B. We check the scaling on three main classes of systems: Bernoulli schemes, Markovian sequences and the symbolic dynamic generated by a nontrivial chaotic system (the Lozi map). As a last application of the method we present the results of a recognition experiment, namely recognize which dynamical systems produced a given time sequence. We finally point out the potentiality of these results for segmentation purposes, i.e. the identification of homogeneous sub-sequences in heterogeneous sequences (with applications in various fields from genetic to time-series analysis). (C) 2003 Elsevier Science B.V. All rights reserved.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2003
			
	Parole chiave
	
				characterization of complexity; compression algorithm; time sequence analysis
			
	Tipologia
	
				01 Pubblicazione su rivista::01a Articolo in rivista
			
	Citazione
	
				Data compression and learning in time sequences analysis / A., Puglisi; Benedetto, Dario; Caglioti, Emanuele; Loreto, Vittorio; Vulpiani, Angelo. - In: PHYSICA D-NONLINEAR PHENOMENA. - ISSN 0167-2789. - 180:1-2(2003), pp. 92-107. [10.1016/s0167-2789(03)00047-2]
			
	Appartiene alla tipologia:
	
				01a Articolo in rivista

File allegati a questo prodotto

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/253341

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

33

32

Catalogo dei prodotti della ricerca