Homophily is the principle whereby “similarity breeds connections.” We give a quantitative formulation of this principle within networks. Given a network and a labeled partition of its vertices, the vector indexed by each class of the partition, whose entries are the number of edges of the subgraphs induced by the corresponding classes, is viewed as the observed outcome of the random vector described by picking labeled partitions at random among labeled partitions whose classes have the same cardinalities as the given one. This is the recently introduced random coloring model for network homophily. In this perspective, the value of any homophily score Θ, namely, a nondecreasing real-valued function in the sizes of subgraphs induced by the classes of the partition, evaluated at the observed outcome, can be thought of as the observed value of a random variable. Consequently, according to the score Θ, the input network is homophillic at the significance level α whenever the one-sided tail probability of observing a value of Θ at least as extreme as the observed one is smaller than α. Since, as we show, even approximating α is an NP-hard problem, we resort to classical tails inequality to bound α from above. These upper bounds, obtained by specializing Θ, yield a class of quantifiers of network homophily. Computing the upper bounds requires the knowledge of the covariance matrix of the random vector, which was not previously known within the random coloring model. In this paper we close this gap. Interestingly, the matrix depends on the input partition only through the cardinalities of its classes and depends on the network only through its degrees. Furthermore all the covariances have the same sign, and this sign is a graph invariant. Plugging this structure into the bounds yields a meaningful, easy to compute class of indices for measuring network homophily. As demonstrated in real-world network applications, these indices are effective and reliable, and may lead to discoveries that cannot be captured by the current state of the art.

Network homophily via tail inequalities / Apollonio, Nicola; Franciosa, Paolo G.; Santoni, Daniele. - In: PHYSICAL REVIEW. E. - ISSN 2470-0045. - 108:5(2023), pp. 1-11. [10.1103/PhysRevE.108.054130]

Network homophily via tail inequalities

Paolo G. Franciosa
Membro del Collaboration Group
;
2023

Abstract

Homophily is the principle whereby “similarity breeds connections.” We give a quantitative formulation of this principle within networks. Given a network and a labeled partition of its vertices, the vector indexed by each class of the partition, whose entries are the number of edges of the subgraphs induced by the corresponding classes, is viewed as the observed outcome of the random vector described by picking labeled partitions at random among labeled partitions whose classes have the same cardinalities as the given one. This is the recently introduced random coloring model for network homophily. In this perspective, the value of any homophily score Θ, namely, a nondecreasing real-valued function in the sizes of subgraphs induced by the classes of the partition, evaluated at the observed outcome, can be thought of as the observed value of a random variable. Consequently, according to the score Θ, the input network is homophillic at the significance level α whenever the one-sided tail probability of observing a value of Θ at least as extreme as the observed one is smaller than α. Since, as we show, even approximating α is an NP-hard problem, we resort to classical tails inequality to bound α from above. These upper bounds, obtained by specializing Θ, yield a class of quantifiers of network homophily. Computing the upper bounds requires the knowledge of the covariance matrix of the random vector, which was not previously known within the random coloring model. In this paper we close this gap. Interestingly, the matrix depends on the input partition only through the cardinalities of its classes and depends on the network only through its degrees. Furthermore all the covariances have the same sign, and this sign is a graph invariant. Plugging this structure into the bounds yields a meaningful, easy to compute class of indices for measuring network homophily. As demonstrated in real-world network applications, these indices are effective and reliable, and may lead to discoveries that cannot be captured by the current state of the art.
2023
network homophily; Mahalanobis norm; tail inequalities; graph partitioning; graph invariant; over-dispersed degree distributions
01 Pubblicazione su rivista::01a Articolo in rivista
Network homophily via tail inequalities / Apollonio, Nicola; Franciosa, Paolo G.; Santoni, Daniele. - In: PHYSICAL REVIEW. E. - ISSN 2470-0045. - 108:5(2023), pp. 1-11. [10.1103/PhysRevE.108.054130]
File allegati a questo prodotto
File Dimensione Formato  
Apollonio_network-homophily_2023.pdf

accesso aperto

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 297.2 kB
Formato Adobe PDF
297.2 kB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1696572
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact