C2M3: Cycle-Consistent Multi-Model Merging / Crisostomi, Donato; Fumero, Marco; Baieri, Daniele; Bernard, Florian; Rodola, Emanuele. - (2024). (Paper presented at the Thirty-eighth Annual Conference on Neural Information Processing Systems, held in Vancouver, Canada).

C2M3: Cycle-Consistent Multi-Model Merging

Donato Crisostomi; Marco Fumero; Daniele Baieri; Emanuele Rodola
2024

Abstract

In this paper, we present a novel data-free method for merging neural networks in weight space. Our method optimizes for the permutations of network neurons while ensuring global coherence across all layers, and it outperforms recent layer-local approaches in a set of challenging scenarios. We then generalize the formulation to the N-models scenario to enforce cycle consistency of the permutations with guarantees, allowing circular compositions of permutations to be computed without accumulating error along the path. We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging homogeneous sets of models in scenarios spanning varying architectures and datasets. We finally show that, when coupled with activation renormalization, the approach yields the best results in the task.
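
The abstract does not spell out how cycle consistency is obtained, so the following is only a minimal sketch of what the property means, not the paper's algorithm. It assumes that each model's neurons are mapped to a shared reference ordering, so that every pairwise permutation is composed from two such maps; the function names and the reference-ordering construction are illustrative assumptions. Under that assumption, composing the permutations around the cycle A, B, C and back to A gives exactly the identity, whereas independently chosen pairwise permutations generally do not.

```python
import numpy as np


def random_permutation_matrix(n, rng):
    """Return an n x n permutation matrix drawn at random."""
    return np.eye(n, dtype=int)[rng.permutation(n)]


def pairwise_from_reference(p_a, p_b):
    """Permutation between models A and B, routed through a shared
    reference ordering (an assumption made for this sketch)."""
    return p_a @ p_b.T


def cycle_error(p_ab, p_bc, p_ca):
    """Sum of absolute differences between the composed cycle and identity."""
    n = p_ab.shape[0]
    composed = p_ab @ p_bc @ p_ca
    return int(np.abs(composed - np.eye(n, dtype=int)).sum())


rng = np.random.default_rng(0)
n = 6  # number of neurons in the (hypothetical) layer

# One permutation per model, each mapping into the shared reference ordering.
p_a, p_b, p_c = (random_permutation_matrix(n, rng) for _ in range(3))

# Pairwise maps built from the per-model maps compose to the identity.
consistent_error = cycle_error(
    pairwise_from_reference(p_a, p_b),
    pairwise_from_reference(p_b, p_c),
    pairwise_from_reference(p_c, p_a),
)

# Pairwise maps chosen independently carry no such guarantee.
q_ab, q_bc, q_ca = (random_permutation_matrix(n, rng) for _ in range(3))
independent_error = cycle_error(q_ab, q_bc, q_ca)

print("cycle error, built from shared reference:", consistent_error)
print("cycle error, independent pairwise maps:", independent_error)
```

The first error is zero by construction, because each product of a permutation matrix with its own transpose cancels inside the cycle; the second depends on the random draw.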
Thirty-eighth Annual Conference on Neural Information Processing Systems
model merging, neural networks, cycle consistency, Frank-Wolfe, matching
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726455