Catalogo dei prodotti della ricerca

We present two space bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm does not make any assumptions on the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and length 2 paths in the graph and expected update time O(log |V |·(1+s ·|V |/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computation of clustering and transitivity coefficient, and discovery of frequent patterns in large graphs. We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8, 000 nodes and 40, 000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we run experiments with parameter s = 1, 000, 10, 000, 100, 000, 1, 000, 000 to evaluate running time and approximation guarantee. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm was varying significantly and even for s = 1, 000, 000 we had more than 10% deviation for more than half of the instances. The second algorithm performed much better and even for s = 10, 000 we had an average deviation of less than 6% (taken over all but the largest instance for which we could not compute the number of triangles exactly). Copyright 2006 ACM.

Counting triangles in data streams / Luciana S., B., Gereon, F., Leonardi, S., MARCHETTI SPACCAMELA, A., Christian, S.. - (2006), pp. 253-262. (25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006 Chicago; United States 26 June 2006 through 28 June 2006) [10.1145/1142351.1142388].

Counting triangles in data streams

Luciana S. Buriol;Gereon Frahling;LEONARDI, Stefano;MARCHETTI SPACCAMELA, Alberto;Christian Sohler

2006

Abstract

We present two space bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm does not make any assumptions on the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and length 2 paths in the graph and expected update time O(log |V |·(1+s ·|V |/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computation of clustering and transitivity coefficient, and discovery of frequent patterns in large graphs. We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8, 000 nodes and 40, 000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we run experiments with parameter s = 1, 000, 10, 000, 100, 000, 1, 000, 000 to evaluate running time and approximation guarantee. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm was varying significantly and even for s = 1, 000, 000 we had more than 10% deviation for more than half of the instances. The second algorithm performed much better and even for s = 10, 000 we had an average deviation of less than 6% (taken over all but the largest instance for which we could not compute the number of triangles exactly). Copyright 2006 ACM.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2006
			
	Nome convegno
	
				25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006
			
	Parole chiave
	
				graph algorithms; network analysis; streaming algorithms
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				Counting triangles in data streams / Luciana S., B., Gereon, F., Leonardi, S., MARCHETTI SPACCAMELA, A., Christian, S.. - (2006), pp. 253-262. (25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006 Chicago; United States 26 June 2006 through 28 June 2006) [10.1145/1142351.1142388].
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

File	Dimensione	Formato
VE_2006_11573-241312.pdf solo gestori archivio Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 239.51 kB Formato Adobe PDF Contatta l'autore	239.51 kB	Adobe PDF	Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/241312

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

201

ND

social impact