We present two space bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm does not make any assumptions on the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and length 2 paths in the graph and expected update time O(log |V |·(1+s ·|V |/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computation of clustering and transitivity coefficient, and discovery of frequent patterns in large graphs. We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8, 000 nodes and 40, 000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we run experiments with parameter s = 1, 000, 10, 000, 100, 000, 1, 000, 000 to evaluate running time and approximation guarantee. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm was varying significantly and even for s = 1, 000, 000 we had more than 10% deviation for more than half of the instances. The second algorithm performed much better and even for s = 10, 000 we had an average deviation of less than 6% (taken over all but the largest instance for which we could not compute the number of triangles exactly). Copyright 2006 ACM.

Counting triangles in data streams / Luciana S., Buriol; Gereon, Frahling; Leonardi, Stefano; MARCHETTI SPACCAMELA, Alberto; Christian, Sohler. - (2006), pp. 253-262. (Intervento presentato al convegno 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006 tenutosi a Chicago; United States nel 26 June 2006 through 28 June 2006) [10.1145/1142351.1142388].

Counting triangles in data streams

LEONARDI, Stefano;MARCHETTI SPACCAMELA, Alberto;
2006

Abstract

We present two space bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm does not make any assumptions on the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and length 2 paths in the graph and expected update time O(log |V |·(1+s ·|V |/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computation of clustering and transitivity coefficient, and discovery of frequent patterns in large graphs. We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8, 000 nodes and 40, 000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we run experiments with parameter s = 1, 000, 10, 000, 100, 000, 1, 000, 000 to evaluate running time and approximation guarantee. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm was varying significantly and even for s = 1, 000, 000 we had more than 10% deviation for more than half of the instances. The second algorithm performed much better and even for s = 10, 000 we had an average deviation of less than 6% (taken over all but the largest instance for which we could not compute the number of triangles exactly). Copyright 2006 ACM.
2006
25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006
graph algorithms; network analysis; streaming algorithms
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Counting triangles in data streams / Luciana S., Buriol; Gereon, Frahling; Leonardi, Stefano; MARCHETTI SPACCAMELA, Alberto; Christian, Sohler. - (2006), pp. 253-262. (Intervento presentato al convegno 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006 tenutosi a Chicago; United States nel 26 June 2006 through 28 June 2006) [10.1145/1142351.1142388].
File allegati a questo prodotto
File Dimensione Formato  
VE_2006_11573-241312.pdf

solo gestori archivio

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 239.51 kB
Formato Adobe PDF
239.51 kB Adobe PDF   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/241312
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 182
  • ???jsp.display-item.citation.isi??? ND
social impact