Fuzzy clustering for complex data structures / Bombelli, Ilaria. - (2024 Jan 23).
Fuzzy clustering for complex data structures
BOMBELLI, ILARIA
23/01/2024
Abstract
Multidimensional phenomena are often represented by complex data structures. With the rapid growth in data availability and complexity, new methodologies are needed to handle this kind of data. Among complex data structures, particular attention has been devoted to three-dimensional (three-way) data and network data, since data from many applications take these forms. Among analysis methods, cluster analysis is one of the most popular and successful tools for data exploration and characterization. However, existing methodologies for describing and analyzing such complex data adopt a hard approach to clustering, in which each unit is assigned to exactly one cluster. Many applications instead call for a fuzzy approach, in which each unit belongs to every cluster with a membership degree between 0 and 1, because it produces results that are easier to interpret and closer to reality. This thesis proposes new methodologies for applying fuzzy clustering to complex data structures such as three-way data and network data. The fuzzy approach proves highly useful in the simulations and real-world applications discussed throughout the chapters. The first chapter introduces the notion of complex data structures and frames the problem, highlighting the rationale behind the proposed methodologies through theoretical discussion and real-world examples. The second chapter introduces the terminology used throughout the thesis and defines the basic concepts. The third to sixth chapters present four research works. The first work considers three-way three-mode data, i.e., a data array made up of several units-by-variables matrices, each referring to a specific occasion (usually time). Applying hierarchical clustering to each units-by-variables matrix yields a set of hierarchies (dendrograms).
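To make this first step concrete, the sketch below builds one hierarchy per occasion from a three-way array and represents each dendrogram by its cophenetic (ultrametric) matrix, whose entry (i, j) is the height at which units i and j are first merged. It assumes Euclidean distances and average linkage (UPGMA); these choices, and the function names `pairwise_euclidean`, `average_linkage_ultrametric` and `hierarchies_from_three_way`, are illustrative assumptions rather than the specific procedure used in the thesis.

```python
import numpy as np


def pairwise_euclidean(A):
    """Euclidean distance matrix between the rows (units) of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def average_linkage_ultrametric(D):
    """Average-linkage (UPGMA) clustering of the distance matrix D.

    Returns the cophenetic (ultrametric) matrix U, where U[i, j] is the
    height at which units i and j are first merged into the same cluster.
    """
    n = D.shape[0]
    clusters = {i: [i] for i in range(n)}  # cluster id -> member units
    dist = {(i, j): D[i, j] for i in range(n) for j in range(i + 1, n)}
    U = np.zeros((n, n))
    next_id = n
    while len(clusters) > 1:
        (a, b), h = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        for i in clusters[a]:
            for j in clusters[b]:
                U[i, j] = U[j, i] = h  # merge height
        merged = clusters.pop(a) + clusters.pop(b)
        # Forget every distance that involves one of the two merged clusters.
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        # Average linkage: mean of all original distances between members.
        for c, members in clusters.items():
            dist[(c, next_id)] = D[np.ix_(merged, members)].mean()
        clusters[next_id] = merged
        next_id += 1
    return U


def hierarchies_from_three_way(X):
    """X has shape (units, variables, occasions).

    Returns an array of shape (occasions, units, units) holding one
    ultrametric matrix, i.e. one dendrogram, per occasion.
    """
    return np.stack([
        average_linkage_ultrametric(pairwise_euclidean(X[:, :, k]))
        for k in range(X.shape[2])
    ])


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3, 4))  # toy data: 6 units, 3 variables, 4 occasions
H = hierarchies_from_three_way(X)
print(H.shape)  # (4, 6, 6)
```

Encoding each dendrogram as an ultrametric matrix gives every hierarchy the same numeric form, so the set of hierarchies can then be compared and clustered like ordinary objects.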
The proposed methodology obtains a fuzzy partition of the resulting set of hierarchies and simultaneously identifies a consensus hierarchy within each class of the partition. The second work can be considered an extension of the first. Given a set of hierarchies, the proposed methodology obtains a fuzzy partition of them and identifies a parsimonious consensus dendrogram within each class of the partition. The notion of parsimony is discussed extensively in the corresponding chapter; here it suffices to recall that a parsimonious dendrogram gives a clear and direct picture of how units aggregate into clusters, highlighting only the most important aggregations and removing misleading ones. The third work proposes a method that obtains a fuzzy partition of a three-way three-mode data array, together with a consensus matrix for each class of the partition, and simultaneously reduces the dimensionality of the variables in the consensus matrices by applying a disjoint second-order factor analysis. The motivation and theoretical background are discussed in the corresponding chapter. The fourth work focuses on applying different fuzzy clustering techniques to a set of networks. The main issue in this kind of problem is how to represent networks so that they can be given as input to clustering algorithms; several network representations based on probability distributions and graph embedding techniques are presented and discussed. The last chapter summarizes the main contents of the thesis, recalling the methodological proposals and emphasizing their relevance and contribution, especially their strength in real-world applications. Finally, it stresses the need for a fuzzy approach to clustering and highlights its main advantage.
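The idea of a fuzzy partition of a set of hierarchies can be illustrated with standard fuzzy c-means (Bezdek) applied to the vectorized ultrametric matrices. This is only a simplified stand-in for the models described above: here the class prototypes are plain membership-weighted averages and are not constrained to be ultrametric, so they are neither consensus hierarchies nor parsimonious dendrograms. The function names, the fuzzifier `m = 2` and the toy hierarchies are illustrative assumptions. The same routine could, as an assumption, take other vector representations as rows (for instance, network representations such as discretized degree distributions).

```python
import numpy as np


def upper_triangle_vectors(mats):
    """Flatten each (n x n) symmetric matrix to its strict upper triangle."""
    iu = np.triu_indices(mats.shape[1], k=1)
    return np.stack([A[iu] for A in mats])


def fuzzy_c_means(Y, c, m=2.0, max_iter=300, tol=1e-9, seed=0):
    """Standard fuzzy c-means on the rows of Y.

    Returns (M, P): M[i, g] is the membership degree of object i in class g
    (rows sum to 1) and P[g] is the prototype of class g.
    """
    rng = np.random.default_rng(seed)
    M = rng.random((Y.shape[0], c))
    M /= M.sum(axis=1, keepdims=True)  # random fuzzy partition to start
    for _ in range(max_iter):
        W = M ** m
        P = (W.T @ Y) / W.sum(axis=0)[:, None]  # membership-weighted means
        d2 = ((Y[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
        d2 = np.fmax(d2, 1e-12)  # guard against zero distances
        inv = d2 ** (-1.0 / (m - 1.0))
        M_new = inv / inv.sum(axis=1, keepdims=True)  # membership update
        converged = np.abs(M_new - M).max() < tol
        M = M_new
        if converged:
            break
    W = M ** m
    P = (W.T @ Y) / W.sum(axis=0)[:, None]  # prototypes for the final memberships
    return M, P


# Toy set of dendrograms on 3 units, written as ultrametric matrices.
A = np.array([[0, 1, 4], [1, 0, 4], [4, 4, 0]], dtype=float)  # units 0 and 1 merge first
B = np.array([[0, 4, 4], [4, 0, 1], [4, 1, 0]], dtype=float)  # units 1 and 2 merge first
hierarchies = np.stack([A, 1.1 * A, 0.9 * A, B, 1.1 * B, 0.9 * B])

M, P = fuzzy_c_means(upper_triangle_vectors(hierarchies), c=2)
print(M.round(2))  # one row of membership degrees per hierarchy
```

In this toy example the first three hierarchies share one topology and the last three share another, so each group is expected to receive high membership in a different class, while every hierarchy still keeps a nonzero degree of membership in both.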
File: Tesi_dottorato_Bombelli.pdf (open access)
Note: PhD thesis
Type: PhD thesis
License: All rights reserved
Size: 4.12 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.