The rapid advancement of Artificial Intelligence, particularly through deep learning and Large Language Models, has yielded unprecedented performance across diverse domains. However, this success has come at the cost of increasingly demanding computational requirements and diminished transparency. Modern AI systems rely heavily on over-parameterized architectures that, while powerful, are inefficient for deployment in resource-constrained environments and opaque in their decision-making processes, raising critical concerns for adoption in high-stakes applications. This thesis addresses the urgent need to reconcile the capabilities of state-of-the-art AI with the practical demands of real-world deployment by developing systems that are simultaneously adaptive, efficient, and interpretable. Through three interconnected research themes, we advance the foundations of AI architectures. First, we establish a comprehensive framework for adaptive computation, introducing mechanisms that enable neural networks to dynamically allocate computational resources based on input complexity and available resources. Our contributions include Adaptive Computation Modules for granular per-token efficiency, adaptive layer selection for accelerated fine-tuning of Vision Transformers, and adaptive semantic token selection strategies for edge intelligence systems. Second, we tackle critical efficiency bottlenecks in Large Language Model deployment, specifically addressing the memory constraints imposed by the Key-Value cache. We propose multiple novel compression strategies—including L2 norm-based approaches, QFilters that exploit query-key geometrical relationships, and Expected Attention methods that leverage future query distributions—enabling practical deployment of LLMs with extended context windows. Third, we advance interpretable AI through mechanistic analysis of model internals and domain-specific applications of explainable AI. 
Our work encompasses steering knowledge selection behaviors in LLMs, analyzing residual streams under knowledge conflicts, and deploying interpretable models in high-stakes domains including high-energy physics and archaeological classification, demonstrating how transparency enables both trust and scientific insight. Collectively, this body of work demonstrates that efficiency and interpretability need not be traded against performance. By developing adaptive architectures, addressing specific deployment bottlenecks, and opening the black box of neural networks, this thesis provides both theoretical frameworks and practical solutions that move AI systems closer to widespread, trustworthy deployment across diverse real-world applications.
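The L2 norm-based KV cache compression mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name, array shapes, and the `keep_ratio` parameter are invented for the example. The sketch assumes the observation underlying that line of work, namely that cached keys with low L2 norm tend to attract high attention, so the lowest-norm entries are retained and the rest evicted.

```python
import numpy as np

def compress_kv_cache(keys, values, keep_ratio=0.5):
    """Illustrative L2 norm-based KV cache pruning (hypothetical API).

    keys, values: arrays of shape (seq_len, head_dim) for one attention head.
    Keeps the `keep_ratio` fraction of positions whose key vectors have the
    lowest L2 norm, preserving their original order, and evicts the rest.
    """
    seq_len = keys.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    norms = np.linalg.norm(keys, axis=-1)           # one scalar norm per cached position
    keep_idx = np.sort(np.argsort(norms)[:n_keep])  # lowest-norm positions, original order
    return keys[keep_idx], values[keep_idx], keep_idx
```

In a real deployment this scoring would run per head inside the attention layer and the eviction budget would depend on the memory target; here the point is only that the score is a cheap function of the keys alone, requiring no extra forward passes.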
Adaptive and efficient neural architectures: from adaptive computation to interpretable AI systems / Devoto, Alessio. - (2026 Jan 28).
Adaptive and efficient neural architectures: from adaptive computation to interpretable AI systems
DEVOTO, ALESSIO
28/01/2026
Abstract
| File | Note | Size | Format |
|---|---|---|---|
| Tesi_dottorato_Devoto.pdf (open access) | Complete thesis; type: doctoral thesis; license: Creative Commons | 49.52 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


