Information granules filtering for inexact sequential pattern mining by evolutionary computation / Maiorino, Enrico; Possemato, Francesca; Modugno, Valerio; Rizzi, Antonello. - PRINT. - (2014), pp. 104-111. (Paper presented at the International Conference on Evolutionary Computation Theory and Applications - ECTA 2014, held in Rome, Italy).

Information granules filtering for inexact sequential pattern mining by evolutionary computation

Maiorino, Enrico; Possemato, Francesca; Modugno, Valerio; Rizzi, Antonello
2014

Abstract

The rapid development of techniques for communicating and storing information of all kinds has created a need for new methods to analyze and interpret large quantities of data. One of the most important problems in sequential data analysis is frequent pattern mining, which consists of finding frequent subsequences (patterns) in a sequence database in order to highlight and extract interesting knowledge from the data at hand. Real-world data is usually affected by several noise sources, which makes the analysis more challenging and calls for approximate pattern matching methods. A common procedure for identifying recurrent patterns in noisy data is based on clustering algorithms that rely on some edit distance between subsequences. In inexact mining problems, this plain approach can produce many spurious patterns because of multiple pattern matches on the same sequence excerpt. In this paper we present a method to overcome this drawback by applying an optimization-based filter that identifies the most descriptive patterns among those found by the clustering process and returns clusters that are more compact and more easily interpretable. We evaluate the mining system's performance on synthetic data with variable amounts of noise, showing that the algorithm performs well in synthesizing the retrieved patterns with acceptable information loss.
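The spurious-match problem mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the Levenshtein distance, the sliding-window search, the toy sequence, the tolerance, and the names `edit_distance` and `approximate_matches` are all assumptions chosen only for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def approximate_matches(sequence, pattern, tolerance, lengths):
    """Return sorted (start, end) spans of windows within `tolerance` edits of `pattern`."""
    spans = []
    for n in lengths:
        for start in range(len(sequence) - n + 1):
            window = sequence[start:start + n]
            if edit_distance(window, pattern) <= tolerance:
                spans.append((start, start + n))
    return sorted(spans)


# Hypothetical toy data: one true occurrence of "ABCD" at positions 2..6.
sequence = "XXABCDXX"
matches = approximate_matches(sequence, "ABCD", tolerance=1, lengths=(3, 4, 5))
print(matches)
```

With a tolerance of one edit, the single occurrence of `ABCD` is reported through five overlapping windows, which is the kind of redundancy a filtering stage would have to resolve.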
International Conference on Evolutionary Computation Theory and Applications - ECTA 2014
Evolutionary computation; Frequent subsequences extraction; Granular modeling; Inexact sequence matching; Sequence data mining
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/632583