Pattern recognition and machine learning problems are often conceived to work on metric vector spaces, where patterns are described by multi-dimensional feature vectors. However, many real-world complex systems are more conveniently modelled by complex data structures such as graphs, which are able to capture topological and semantic information amongst entities. This Thesis helps in bridging the gap between pattern recognition and graphs, with major emphasis on the hypergraphs domain. Six different strategies for solving graph-based pattern recognition problems are proposed, spanning several paradigms including kernel methods, embedding spaces and feature generation. The first two techniques map a graph towards a vector space by means of the spectral density of the Laplacian matrix and by means of topological invariants called the Betti numbers, respectively. Two additional techniques, according to the Granular Computing paradigm, map a graph towards a vector space by means of symbolic histograms. In a first case, simplices extracted from the simplicial complexes evaluated over the underlying graph are considered as candidate pivotal substructures for synthesising the symbolic histograms; in a second case, each path along a graph can be assigned a score that consider its specificity and sensitivity with respect to one of the problem-related classes and its inclusion in the candidate pivotal substructures is strictly related to its score. The final two techniques fall under the kernel methods umbrella: the first defines novel hypergraph kernels on the top of the simplicial complexes, the latter embraces a multiple kernel paradigm to exploit multiple graph representations simultaneously. These techniques are tested on real-world problems related to three biological case studies, namely the solubility prediction and enzymatic properties discrimination in protein networks and the analysis of metabolic networks. Further, the most cutting-edge techniques are also tested on well-known benchmark datasets for graph classification and compared against current approaches in graph-based pattern recognition.

Pattern recognition techniques for modelling complex systems in non-metric domains / Martino, Alessio. - (2020 Feb 18).

Pattern recognition techniques for modelling complex systems in non-metric domains

MARTINO, ALESSIO
18/02/2020

Abstract

Pattern recognition and machine learning problems are often conceived to work on metric vector spaces, where patterns are described by multi-dimensional feature vectors. However, many real-world complex systems are more conveniently modelled by complex data structures such as graphs, which are able to capture topological and semantic information amongst entities. This Thesis helps in bridging the gap between pattern recognition and graphs, with major emphasis on the hypergraphs domain. Six different strategies for solving graph-based pattern recognition problems are proposed, spanning several paradigms including kernel methods, embedding spaces and feature generation. The first two techniques map a graph towards a vector space by means of the spectral density of the Laplacian matrix and by means of topological invariants called the Betti numbers, respectively. Two additional techniques, according to the Granular Computing paradigm, map a graph towards a vector space by means of symbolic histograms. In a first case, simplices extracted from the simplicial complexes evaluated over the underlying graph are considered as candidate pivotal substructures for synthesising the symbolic histograms; in a second case, each path along a graph can be assigned a score that consider its specificity and sensitivity with respect to one of the problem-related classes and its inclusion in the candidate pivotal substructures is strictly related to its score. The final two techniques fall under the kernel methods umbrella: the first defines novel hypergraph kernels on the top of the simplicial complexes, the latter embraces a multiple kernel paradigm to exploit multiple graph representations simultaneously. These techniques are tested on real-world problems related to three biological case studies, namely the solubility prediction and enzymatic properties discrimination in protein networks and the analysis of metabolic networks. Further, the most cutting-edge techniques are also tested on well-known benchmark datasets for graph classification and compared against current approaches in graph-based pattern recognition.
18-feb-2020
File allegati a questo prodotto
File Dimensione Formato  
Tesi_dottorato_Martino.pdf

accesso aperto

Tipologia: Tesi di dottorato
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 19.72 MB
Formato Adobe PDF
19.72 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1364044
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact