Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets.

Motif counting beyond five nodes / Bressan, Marco; Chierichetti, Flavio; Kumar, Ravi; Leucci, Stefano; Panconesi, Alessandro. - In: ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA. - ISSN 1556-4681. - 12:4(2018), pp. 1-25. [10.1145/3186586]

Motif counting beyond five nodes

Bressan, Marco
;
Chierichetti, Flavio;Leucci, Stefano;Panconesi, Alessandro
2018

Abstract

Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets.
2018
Graph enumeration, Graph algorithms, Data mining, Random walks and Markov chains
01 Pubblicazione su rivista::01a Articolo in rivista
Motif counting beyond five nodes / Bressan, Marco; Chierichetti, Flavio; Kumar, Ravi; Leucci, Stefano; Panconesi, Alessandro. - In: ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA. - ISSN 1556-4681. - 12:4(2018), pp. 1-25. [10.1145/3186586]
File allegati a questo prodotto
File Dimensione Formato  
Bressan_Motif_2018.pdf

accesso aperto

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.71 MB
Formato Adobe PDF
1.71 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1131921
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 36
  • ???jsp.display-item.citation.isi??? 27
social impact