In this thesis we study subgraph enumeration problems. We provide an extensive literature review of subgraph enumeration, considering different problems associated to cliques, clique relaxations, and other kind of subgraphs. We then devise algorithms for the problems of enumerating k-cliques (i.e., complete subgraphs on k nodes) and one of their relaxations, called k-diamonds (i.e., cliques of size k with one missing edge). For the first problem we present simple and fast multicore parallel algorithms for counting the number of k-cliques in large undirected graphs, for any small constant k >= 4. Clique counting is important in a variety of network analytics applications. Differently from existing solutions, which mainly target distributed memory settings (e.g., MapReduce), the proposed algorithms work on off-the-shelf shared-memory multicore platforms. The effectiveness of our approaches is assessed through an extensive experimental analysis on a variety of real-world graphs, considering different clique sizes and scalability on different numbers of cores. The experimental results show that the proposed parallel algorithms largely outperform the running times of the highly optimized sequential solution and gracefully scale to non-trivial values of k even on medium/large graphs. For instance, computing hundreds of billions of cliques for rather demanding Web graphs and social networks requires about 15 minutes on a 32-core machine. Moreover, the running times of the multicore algorithm are competitive – and in some cases much faster than – the state-of-the-art distributed solutions based on MapReduce. As a by-product of the experimental analysis, we also compute the exact number of k-cliques with at most 20 nodes in many real-world networks from the SNAP repository. For the second problem, we first devise a sequential algorithm for counting the number of k-diamonds in large undirected graphs, for any small constant k >= 4. The algorithm can compute the number of k-diamonds using O(sqrt(m)) extra work with respect to the clique-counting problem. A parallel extension of the sequential algorithm is then proposed, developing a MapReduce-based approach. This algorithm achieves the same local and total space usage of the state-of-the-art MapReduce algorithm for k-cliques, and uses O(sqrt(m)) extra local and global work.

Enumerating cliques and their relaxations: sequential and parallel algorithms / LEON GARCIA, Renan. - (2019 Feb 28).

Enumerating cliques and their relaxations: sequential and parallel algorithms

LEON GARCIA, RENAN
28/02/2019

Abstract

In this thesis we study subgraph enumeration problems. We provide an extensive literature review of subgraph enumeration, considering different problems associated to cliques, clique relaxations, and other kind of subgraphs. We then devise algorithms for the problems of enumerating k-cliques (i.e., complete subgraphs on k nodes) and one of their relaxations, called k-diamonds (i.e., cliques of size k with one missing edge). For the first problem we present simple and fast multicore parallel algorithms for counting the number of k-cliques in large undirected graphs, for any small constant k >= 4. Clique counting is important in a variety of network analytics applications. Differently from existing solutions, which mainly target distributed memory settings (e.g., MapReduce), the proposed algorithms work on off-the-shelf shared-memory multicore platforms. The effectiveness of our approaches is assessed through an extensive experimental analysis on a variety of real-world graphs, considering different clique sizes and scalability on different numbers of cores. The experimental results show that the proposed parallel algorithms largely outperform the running times of the highly optimized sequential solution and gracefully scale to non-trivial values of k even on medium/large graphs. For instance, computing hundreds of billions of cliques for rather demanding Web graphs and social networks requires about 15 minutes on a 32-core machine. Moreover, the running times of the multicore algorithm are competitive – and in some cases much faster than – the state-of-the-art distributed solutions based on MapReduce. As a by-product of the experimental analysis, we also compute the exact number of k-cliques with at most 20 nodes in many real-world networks from the SNAP repository. For the second problem, we first devise a sequential algorithm for counting the number of k-diamonds in large undirected graphs, for any small constant k >= 4. The algorithm can compute the number of k-diamonds using O(sqrt(m)) extra work with respect to the clique-counting problem. A parallel extension of the sequential algorithm is then proposed, developing a MapReduce-based approach. This algorithm achieves the same local and total space usage of the state-of-the-art MapReduce algorithm for k-cliques, and uses O(sqrt(m)) extra local and global work.
28-feb-2019
File allegati a questo prodotto
File Dimensione Formato  
Tesi_dottorato_Renan.pdf

accesso aperto

Tipologia: Tesi di dottorato
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 4.11 MB
Formato Adobe PDF
4.11 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1283982
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact