Model merging: foundations and algorithms / Crisostomi, Donato. - (2026 May 11).

Model merging: foundations and algorithms

CRISOSTOMI, DONATO

11/05/2026

Abstract

Modern deep learning typically treats models as separate artifacts: trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies an alternative paradigm, model merging: combining independently trained neural networks into a single model directly in weight space, without access to additional training data and with little or no optimization. The thesis is organized around two regimes. In the single-task setting, where models share a common objective but differ in initialization, we introduce C²M³, a cycle-consistent merging algorithm grounded in Frank-Wolfe optimization. C²M³ aligns collections of networks into a shared parameter space that serves as a reference-free aggregation point, making weight averaging meaningful without privileging any one model as the anchor. In the multi-task setting, where models are fine-tuned for distinct downstream tasks, we first develop a theoretical account of task vectors, the parameter differences between a fine-tuned model and its pretrained initialization. We show that task vectors admit a gradient-based interpretation under standard assumptions, clarifying both the success and the limits of task arithmetic. This gradient view has a direct consequence: gradients are known to exhibit low-rank structure, and task vectors inherit this property. We formalize and exploit this low-rank structure through Task Singular Vectors (TSV), a decomposition that supports both model compression and interference reduction in TSV-Merge. We then present MASS, an input-adaptive routing mechanism that uses TSV geometry to direct inference through task-relevant subspaces. Finally, we introduce MERGE³, an evolutionary merging framework that incorporates Item Response Theory to reduce evaluation costs by up to 50× while preserving solution quality. 
Taken together, these contributions place model merging on firmer theoretical and algorithmic foundations, advancing a paradigm in which learned capabilities can be composed, reused, and extended across models.
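
As a minimal illustration of the task-vector definition above (the parameter difference between a fine-tuned model and its pretrained initialization), the sketch below computes task vectors for toy weights and recombines them using the common formulation of task arithmetic, which the abstract names but does not spell out. The `Weights` type, the function names, the scaling factor `alpha`, and all numbers are hypothetical choices for illustration and are not taken from the thesis.

```python
from typing import Dict, List

# Hypothetical representation: each layer name maps to a flat list of parameters.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, pretrained: Weights) -> Weights:
    """Return the per-layer difference: fine-tuned weights minus pretrained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge_by_task_arithmetic(
    pretrained: Weights, task_vectors: List[Weights], alpha: float = 1.0
) -> Weights:
    """Add the scaled sum of task vectors back onto the pretrained weights."""
    merged: Weights = {}
    for name, base in pretrained.items():
        # Sum the task vectors coordinate by coordinate for this layer.
        summed = [sum(tv[name][i] for tv in task_vectors) for i in range(len(base))]
        merged[name] = [b + alpha * s for b, s in zip(base, summed)]
    return merged


# Toy example with made-up numbers (assumption: one layer, three parameters).
pretrained = {"layer": [1.0, 2.0, 3.0]}
finetuned_a = {"layer": [1.5, 2.0, 3.0]}  # "task A" changed the first parameter
finetuned_b = {"layer": [1.0, 2.0, 2.0]}  # "task B" changed the last parameter

tv_a = task_vector(finetuned_a, pretrained)
tv_b = task_vector(finetuned_b, pretrained)
merged = merge_by_task_arithmetic(pretrained, [tv_a, tv_b], alpha=1.0)
```

In this toy case the two task vectors touch different coordinates, so their sum carries both changes without conflict. When task vectors overlap, their contributions can interfere, which is the situation the interference-reduction methods mentioned in the abstract are meant to address.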

Full record

Defense date: 11 May 2026
External supervisors: LIO', PIETRO
Document type: 07a Doctoral thesis

Files attached to this record

File	Size	Format
Tesi_dottorato_Crisostomi.pdf (open access; note: full thesis; type: doctoral thesis; license: Creative Commons)	18.88 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1767973
