Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences

López-Oriona, A; Vilar, JA; D'Urso, P

2023

Abstract

Two novel distances between categorical time series are introduced. Both measure discrepancies between extracted features describing the underlying serial dependence patterns. One distance is based on well-known association measures, namely Cramér's v and Cohen's κ. The other relies on the so-called binarization of a categorical process, which indicates the presence of each category by means of a canonical vector. Binarization is used to construct a set of innovative association measures that make it possible to identify different types of serial dependence. The metrics are used to perform crisp and fuzzy clustering of nominal series. The proposed approaches are able to group together series generated from similar stochastic processes, achieve accurate results with series coming from a broad range of models, and are computationally efficient. Extensive simulation studies show that both the hard and soft clustering algorithms outperform several alternative procedures proposed in the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques.

© 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
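To make the idea of "features describing serial dependence" concrete, here is a minimal sketch under stated assumptions. It uses the standard plug-in estimators of lagged Cramér's v and Cohen's κ from the categorical time series literature, computed through the binarized (one-hot) series, where the lag-k joint probabilities p_ij(k) = P(X_t = i, X_{t-k} = j) are the average of Y_t Y_{t-k}^T. The feature layout (v at lags 1..L followed by κ at lags 1..L), the default maximum lag and the squared Euclidean distance are hypothetical choices for illustration; this is not the authors' exact distance, and the binarization-based association measures proposed in the article are not reproduced. All function names are illustrative.

```python
import math

def binarize(series, categories):
    """Binarization: replace each observation by the canonical vector e_j of its category."""
    index = {c: j for j, c in enumerate(categories)}
    return [[1 if index[x] == j else 0 for j in range(len(categories))] for x in series]

def marginals(Y):
    """Estimated marginal probabilities p_j (sample mean of the binarized vectors)."""
    T, r = len(Y), len(Y[0])
    return [sum(y[j] for y in Y) / T for j in range(r)]

def lagged_joint(Y, k):
    """Estimated p_ij(k) = P(X_t = i, X_{t-k} = j), i.e. the mean of Y_t Y_{t-k}^T."""
    r, n = len(Y[0]), len(Y) - k
    counts = [[0] * r for _ in range(r)]
    for t in range(k, len(Y)):
        for i in range(r):
            for j in range(r):
                counts[i][j] += Y[t][i] * Y[t - k][j]
    return [[c / n for c in row] for row in counts]

def cramers_v(Y, k):
    """Plug-in Cramer's v at lag k (0 = no association); assumes at least two categories."""
    p, P, r = marginals(Y), lagged_joint(Y, k), len(Y[0])
    chi = sum((P[i][j] - p[i] * p[j]) ** 2 / (p[i] * p[j])
              for i in range(r) for j in range(r) if p[i] * p[j] > 0)
    return math.sqrt(chi / (r - 1))

def cohens_kappa(Y, k):
    """Plug-in Cohen's kappa at lag k (> 0: persistence, < 0: switching)."""
    p, P = marginals(Y), lagged_joint(Y, k)
    s = sum(pj ** 2 for pj in p)
    if s >= 1.0:  # degenerate (constant) series: kappa set to 0 by assumption
        return 0.0
    return (sum(P[j][j] for j in range(len(p))) - s) / (1.0 - s)

def features(series, categories, max_lag):
    """Hypothetical feature vector: v(1..L) followed by kappa(1..L)."""
    Y = binarize(series, categories)
    lags = range(1, max_lag + 1)
    return [cramers_v(Y, k) for k in lags] + [cohens_kappa(Y, k) for k in lags]

def feature_distance(x, y, categories, max_lag=3):
    """Squared Euclidean distance between the feature vectors (an assumed choice)."""
    fx, fy = features(x, categories, max_lag), features(y, categories, max_lag)
    return sum((a - b) ** 2 for a, b in zip(fx, fy))
```

For example, the strictly alternating series `list("AB" * 50)` with categories `["A", "B"]` has a lag-1 κ of exactly -1 (pure switching), whereas a series made of long runs such as `list("AAAAABBBBB" * 10)` has a positive lag-1 κ. A matrix of such pairwise distances could then be passed to any standard hard or fuzzy clustering routine; that step is not shown here.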

Year of publication
2023

Keywords
Categorical time series; Association measures; Hard clustering; Fuzzy clustering; Biological sequences

Type
01 Journal publication::01a Journal article

Citation
Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences / López-Oriona, A; Vilar, JA; D'Urso, P. - In: INFORMATION SCIENCES. - ISSN 0020-0255. - 624:(2023), pp. 467-492. [10.1016/j.ins.2022.12.065]

Belongs to type
01a Journal article

Files attached to this product

File	Size	Format
1-s2.0-S0020025522015602-main.pdf (access restricted to archive managers; type: publisher's version, i.e. the published version with the publisher's layout; license: All rights reserved)	1.7 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1689744
