Mining large networks with subgraph counting

Bordino, Ilaria; Donato, Debora; Gionis, Aristides; Leonardi, Stefano

doi:10.1109/icdm.2008.109

The problem of mining frequent patterns in networks has many applications, including analysis of complex networks, clustering of graphs, finding communities in social networks, and indexing of graphical and biological databases. Despite this wealth of applications, the current state of the art lacks algorithmic tools for counting the number of subgraphs contained in a large network. In this paper we develop data-stream algorithms that approximate the number of all subgraphs of three and four vertices in directed and undirected networks. We use the frequency of occurrence of all subgraphs to prove their significance in order to characterize different kinds of networks: we achieve very good precision in clustering networks with similar structure. The significance of our method is supported by the fact that such high precision cannot be achieved when performing clustering based on simpler topological properties, such as degree, assortativity, and eigenvector distributions. We have also tested our techniques using swap randomization. © 2008 IEEE.

Mining large networks with subgraph counting / Bordino, I., Debora, D., Aristides, G., Leonardi, S.. - (2008), pp. 737-742. (8th IEEE International Conference on Data Mining Pisa; Italy DEC 15-19, 2008) [10.1109/icdm.2008.109].

Mining large networks with subgraph counting

BORDINO, ILARIA;Debora Donato;Aristides Gionis;LEONARDI, Stefano

2008

Abstract

The problem of mining frequent patterns in networks has many applications, including analysis of complex networks, clustering of graphs, finding communities in social networks, and indexing of graphical and biological databases. Despite this wealth of applications, the current state of the art lacks algorithmic tools for counting the number of subgraphs contained in a large network. In this paper we develop data-stream algorithms that approximate the number of all subgraphs of three and four vertices in directed and undirected networks. We use the frequency of occurrence of all subgraphs to prove their significance in order to characterize different kinds of networks: we achieve very good precision in clustering networks with similar structure. The significance of our method is supported by the fact that such high precision cannot be achieved when performing clustering based on simpler topological properties, such as degree, assortativity, and eigenvector distributions. We have also tested our techniques using swap randomization. © 2008 IEEE.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2008
			
	Nome convegno
	
				8th IEEE International Conference on Data Mining
			
	Parole chiave
	
				Algorithmic tools; Biological database; Complex networks
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				Mining large networks with subgraph counting / Bordino, I., Debora, D., Aristides, G., Leonardi, S.. - (2008), pp. 737-742. (8th IEEE International Conference on Data Mining Pisa; Italy DEC 15-19, 2008) [10.1109/icdm.2008.109].
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

File	Dimensione	Formato
VE_2008_11573-366584.pdf solo gestori archivio Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 237.95 kB Formato Adobe PDF Contatta l'autore	237.95 kB	Adobe PDF	Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/366584

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

55

37

Catalogo dei prodotti della ricerca