Model merging: foundations and algorithms / Crisostomi, Donato. - (2026 May 11).

Model merging: foundations and algorithms

CRISOSTOMI, DONATO
11/05/2026

Abstract

Modern deep learning typically treats models as separate artifacts: trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies an alternative paradigm, model merging: combining independently trained neural networks into a single model directly in weight space, without access to additional training data and with little or no optimization. The thesis is organized around two regimes. In the single-task setting, where models share a common objective but differ in initialization, we introduce C²M³, a cycle-consistent merging algorithm grounded in Frank-Wolfe optimization. C²M³ aligns collections of networks into a shared parameter space that serves as a reference-free aggregation point, making weight averaging meaningful without privileging any one model as the anchor. In the multi-task setting, where models are fine-tuned for distinct downstream tasks, we first develop a theoretical account of task vectors, the parameter differences between a fine-tuned model and its pretrained initialization. We show that task vectors admit a gradient-based interpretation under standard assumptions, clarifying both the success and the limits of task arithmetic. This gradient view has a direct consequence: gradients are known to exhibit low-rank structure, and task vectors inherit this property. We formalize and exploit this low-rank structure through Task Singular Vectors (TSV), a decomposition that supports both model compression and interference reduction in TSV-Merge. We then present MASS, an input-adaptive routing mechanism that uses TSV geometry to direct inference through task-relevant subspaces. Finally, we introduce MERGE³, an evolutionary merging framework that incorporates Item Response Theory to reduce evaluation costs by up to 50× while preserving solution quality. 
Taken together, these contributions place model merging on firmer theoretical and algorithmic foundations, advancing a paradigm in which learned capabilities can be composed, reused, and extended across models.
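The abstract defines task vectors as parameter differences between a fine-tuned model and its pretrained initialization, and it refers to task arithmetic and to the low-rank structure of task vectors. The minimal sketch below illustrates only those basic objects. It is not the thesis's C²M³, TSV-Merge, MASS, or MERGE³ implementation. Several assumptions are made for illustration. A model is represented as a dict of per-layer 2-D matrices, and merging is plain task arithmetic with one hypothetical scaling coefficient `alpha`. The low-rank step keeps the top-`k` singular directions of a single layer's task vector. Those directions are found by power iteration with deflation, which stands in for a library SVD so that the example has no dependencies. The helper names `task_vector`, `task_arithmetic`, and `low_rank` are hypothetical.

```python
# Illustrative sketch only (not the thesis's method): a "model" is assumed to be
# a dict mapping layer names to 2-D weight matrices stored as lists of lists.
import math
import random
from typing import Dict, List

Matrix = List[List[float]]
Model = Dict[str, Matrix]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    return [[s * x for x in row] for row in a]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def task_vector(finetuned: Model, pretrained: Model) -> Model:
    """Task vector: fine-tuned weights minus pretrained weights, layer by layer."""
    return {name: sub(finetuned[name], w) for name, w in pretrained.items()}


def task_arithmetic(pretrained: Model, task_vectors: List[Model], alpha: float) -> Model:
    """Plain task arithmetic: pretrained + alpha * (sum of task vectors)."""
    merged = {}
    for name, w in pretrained.items():
        total = w
        for tv in task_vectors:
            total = add(total, scale(tv[name], alpha))
        merged[name] = total
    return merged


def low_rank(a: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Rank-k approximation of one layer's task vector.

    Finds the top singular directions one at a time by power iteration on
    R^T R, then deflates (subtracts) each rank-1 component from the residual R.
    """
    rng = random.Random(seed)
    n_cols = len(a[0])
    residual = [row[:] for row in a]
    approx = [[0.0] * n_cols for _ in a]
    for _ in range(k):
        rt = [list(col) for col in zip(*residual)]  # transpose of the residual
        v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        for _ in range(iters):
            w = matvec(rt, matvec(residual, v))  # (R^T R) v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                return approx  # residual is zero: no further rank to recover
            v = [x / norm for x in w]  # v approximates the top right singular vector
        u = matvec(residual, v)  # equals sigma times the left singular vector
        component = [[ui * vj for vj in v] for ui in u]  # rank-1 term sigma * u v^T
        approx = add(approx, component)
        residual = sub(residual, component)
    return approx
```

For example, with an identity "pretrained" layer and two fine-tuned layers that each change a different diagonal entry, `task_arithmetic(pre, [tau_a, tau_b], alpha=0.5)` adds half of each task vector to the pretrained weights. Calling `low_rank` on an exactly rank-1 task vector with `k=1` recovers that task vector. A practical implementation would use a linear-algebra library's SVD instead of the power iteration used here.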
11-May-2026
LIÒ, PIETRO
Files attached to this item
Tesi_dottorato_Crisostomi.pdf (18.88 MB, Adobe PDF, open access)
Note: full thesis
Type: Doctoral thesis
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1767973