We propose a rigorous and efficient method for evaluating homophily and heterophily in edge-weighted networks. In a network whose nodes are partitioned into classes, homophily (resp., heterophily) is the tendency of edges to join nodes in the same class (resp., in different classes). Assuming a suitable null model, we provide a closed formula for the z-score of the total weight of homophilic/heterophilic edges for each class/pair of classes. The z-score directly measures how much this weight deviates from its expected value under the null model. We also propose a global homophily measure that scores, at a glance, how strongly the set of all classes tends to be homophilic. The proposed statistics can be computed even for very large networks because, as we show, they can be evaluated efficiently in a data streaming setting. For a network with n nodes and m edges, our algorithm needs only O(n) internal memory, optimal O(m) worst-case time, and a single scan of the m input edges, in any order. Experimental results are reported on ten Protein-Protein Interaction networks, measuring homophily with respect to protein functional classes.
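To make the streaming setting concrete, the following is a minimal sketch, not the authors' algorithm. It assumes edges arrive as hypothetical `(u, v, weight)` triples and that a node-to-class map is already in memory. It accumulates only the observed intra-class weight per class in one pass; the paper's null model and closed-form z-score are not reproduced here, so `z_score` is the generic definition and expects the expected value and variance to be supplied from that null model.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge format: (endpoint, endpoint, weight).
Edge = Tuple[Hashable, Hashable, float]


def stream_homophilic_weight(
    edges: Iterable[Edge], node_class: Dict[Hashable, Hashable]
) -> Tuple[Dict[Hashable, float], float]:
    """Scan the edges once, in any order.

    Returns the observed intra-class (homophilic) weight per class and
    the total edge weight. Memory use is the node-to-class map, O(n),
    plus one counter per class.
    """
    intra = defaultdict(float)
    total = 0.0
    for u, v, w in edges:
        total += w
        cu, cv = node_class[u], node_class[v]
        if cu == cv:  # both endpoints in the same class
            intra[cu] += w
    return dict(intra), total


def z_score(observed: float, expected: float, variance: float) -> float:
    """Generic z-score; expected and variance must come from the null model."""
    return (observed - expected) / sqrt(variance)


# Toy input (made-up nodes, classes, and weights).
node_class = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
edges = [("a", "b", 2.0), ("b", "c", 1.0), ("c", "d", 0.5), ("a", "d", 3.0)]
intra, total = stream_homophilic_weight(iter(edges), node_class)
```

Passing an iterator rather than a list mirrors the single-scan constraint: each edge is seen once and then discarded, so only the per-class counters persist.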
Homophily of Large Weighted Networks in a Data Streaming Setting / Apollonio, Nicola; Franciosa, Paolo G.; Santoni, Daniele. - (2025), pp. 131-142. (Computational Intelligence Methods for Bioinformatics and Biostatistics, Padova, Italy) [10.1007/978-3-031-90714-2_10].
Homophily of Large Weighted Networks in a Data Streaming Setting
Franciosa, Paolo G.
Member of the Collaboration Group
2025
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Apollonio_homophily_2025.pdf | Archive managers only | Publisher's version (published version with the publisher's layout) | All rights reserved | 1.54 MB | Adobe PDF |
| Apollonio_homophily_2025.pdf.pdf | Open access | Post-print (version following peer review and accepted for publication) | All rights reserved | 584.05 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


