Estimating clustering indexes in data streams

Buriol, Luciana S.; Frahling, Gereon; Leonardi, Stefano; Sohler, Christian

doi:10.1007/978-3-540-75520-3_55

We present random sampling algorithms that with probability at least 1 - δ compute a (1 ± ε)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges. The space used by our algorithm to estimate the clustering coefficient is inversely related to the clustering coefficient of the network itself. The space used by our algorithm to compute the number K 3,3 of bipartite cliques is proportional to the ratio between the number of K1,3 and K3,3 in the graph. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size. Therefore they provide a basic tool to analyze the structure of dense clusters in large graphs and have many applications in the discovery of web communities, the analysis of the structure of large social networks and the probing of frequent patterns in large graphs. We implemented both algorithms and evaluated their performance on networks from different application domains and of different size; The largest instance is a webgraph consisting of more than 135 million nodes and 1 billion edges. Both algorithms compute accurate results in reasonable time on the tested instances. © Springer-Verlag Berlin Heidelberg 2007.

Estimating clustering indexes in data streams / Luciana S., Buriol; Gereon, Frahling; Leonardi, Stefano; Christian, Sohler. - 4698 LNCS:(2007), pp. 618-632. ( 15th Annual European Symposium on Algorithms, ESA 2007 Eilat; Israel 8 October 2007 through 10 October 2007) [10.1007/978-3-540-75520-3_55].

Estimating clustering indexes in data streams

Luciana S. Buriol;Gereon Frahling;LEONARDI, Stefano;Christian Sohler

2007

Abstract

We present random sampling algorithms that with probability at least 1 - δ compute a (1 ± ε)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges. The space used by our algorithm to estimate the clustering coefficient is inversely related to the clustering coefficient of the network itself. The space used by our algorithm to compute the number K 3,3 of bipartite cliques is proportional to the ratio between the number of K1,3 and K3,3 in the graph. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size. Therefore they provide a basic tool to analyze the structure of dense clusters in large graphs and have many applications in the discovery of web communities, the analysis of the structure of large social networks and the probing of frequent patterns in large graphs. We implemented both algorithms and evaluated their performance on networks from different application domains and of different size; The largest instance is a webgraph consisting of more than 135 million nodes and 1 billion edges. Both algorithms compute accurate results in reasonable time on the tested instances. © Springer-Verlag Berlin Heidelberg 2007.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2007
			
	Nome convegno
	
				15th Annual European Symposium on Algorithms, ESA 2007
			
	Parole chiave
	
				Data streams; Subgraphs; Web communities
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				Estimating clustering indexes in data streams / Luciana S., Buriol; Gereon, Frahling; Leonardi, Stefano; Christian, Sohler. - 4698 LNCS:(2007), pp. 618-632. ( 15th Annual European Symposium on Algorithms, ESA 2007 Eilat; Israel 8 October 2007 through 10 October 2007) [10.1007/978-3-540-75520-3_55].
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/60800

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

19

15

Catalogo dei prodotti della ricerca