Model merging: foundations and algorithms / Crisostomi, Donato. - (2026 May 11).
Model merging: foundations and algorithms
CRISOSTOMI, DONATO
11/05/2026
Abstract
Modern deep learning typically treats models as separate artifacts: trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies an alternative paradigm, model merging: combining independently trained neural networks into a single model directly in weight space, without access to additional training data and with little or no optimization. The thesis is organized around two regimes. In the single-task setting, where models share a common objective but differ in initialization, we introduce C²M³, a cycle-consistent merging algorithm grounded in Frank-Wolfe optimization. C²M³ aligns collections of networks into a shared parameter space that serves as a reference-free aggregation point, making weight averaging meaningful without privileging any one model as the anchor. In the multi-task setting, where models are fine-tuned for distinct downstream tasks, we first develop a theoretical account of task vectors, the parameter differences between a fine-tuned model and its pretrained initialization. We show that task vectors admit a gradient-based interpretation under standard assumptions, clarifying both the success and the limits of task arithmetic. This gradient view has a direct consequence: gradients are known to exhibit low-rank structure, and task vectors inherit this property. We formalize and exploit this low-rank structure through Task Singular Vectors (TSV), a decomposition that supports both model compression and interference reduction in TSV-Merge. We then present MASS, an input-adaptive routing mechanism that uses TSV geometry to direct inference through task-relevant subspaces. Finally, we introduce MERGE³, an evolutionary merging framework that incorporates Item Response Theory to reduce evaluation costs by up to 50× while preserving solution quality. 
Taken together, these contributions place model merging on firmer theoretical and algorithmic foundations, advancing a paradigm in which learned capabilities can be composed, reused, and extended across models.

| File | Size | Format |
|---|---|---|
| Tesi_dottorato_Crisostomi.pdf (open access). Note: full thesis. Type: doctoral thesis. License: Creative Commons | 18.88 MB | Adobe PDF |


