Targeted and Automatic Deep Neural Networks Optimization for Edge Computing / Giovannesi, Luca; Proietti Mattia, Gabriele; Beraldi, Roberto. - 203:(2024), pp. 57-68. (Paper presented at the International Conference on Advanced Information Networking and Applications (was ICOIN), held in Kitakyushu, Japan) [10.1007/978-3-031-57931-8_6].

Targeted and Automatic Deep Neural Networks Optimization for Edge Computing

Giovannesi, Luca (first author; Software)
Proietti Mattia, Gabriele (second author; Methodology)
Beraldi, Roberto (last author; Supervision)
2024

Abstract

Deep neural networks (DNNs), commonly used for complex tasks such as image and language processing, are increasingly targeted for deployment on Internet of Things (IoT) devices. These devices operate under tight resource constraints: limited computational power, limited memory, slow processors, and restricted energy budgets. Optimizing DNN models is therefore crucial to minimize memory usage and computation time. Traditional optimization methods, however, require skilled professionals to tune hyperparameters manually in order to balance efficiency against accuracy. This paper introduces a solution for identifying optimal hyperparameters, focusing on the application of pruning, clustering, and quantization. Initial empirical analyses were conducted to understand the relationships among model size, accuracy, pruning rate, and the number of clusters. Building on these findings, we developed a framework based on two algorithms: one that discovers the optimal pruning and one that determines the optimal number of clusters. By combining these algorithms with the best quantization configuration, our tool provides an optimization procedure that reduces both model size and inference time. The optimized models achieve results comparable to, and in some cases better than, those of more complex state-of-the-art approaches. On ResNet50, the framework reduced the model size by 6.35x and achieved a speedup of 2.91x while sacrificing only 0.87% of the original accuracy.
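
The three techniques named in the abstract can be illustrated on a single weight tensor. The following is a minimal NumPy sketch and not the authors' framework: the function names (`magnitude_prune`, `cluster_weights`, `quantize_int8`), the magnitude-based pruning criterion, the 1-D k-means clustering, the symmetric int8 scheme, and the example values (pruning rate 0.5, 16 clusters) are all assumptions chosen for illustration. This record does not describe the paper's two search algorithms, so they are not reproduced here.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, rate: float) -> np.ndarray:
    """Zero out the fraction `rate` of weights with the smallest magnitude."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    k = int(rate * weights.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Indices of the k smallest-magnitude entries of the flattened tensor.
    drop = np.argpartition(np.abs(weights).ravel(), k - 1)[:k]
    pruned = weights.copy().ravel()
    pruned[drop] = 0.0
    return pruned.reshape(weights.shape)


def cluster_weights(weights: np.ndarray, n_clusters: int, iters: int = 20):
    """1-D k-means: map each weight to the nearest of `n_clusters` centroids."""
    flat = weights.ravel()
    # Assumed initialization: centroids evenly spaced over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(iters):
        labels = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            members = flat[labels == c]
            if members.size:  # leave empty clusters where they are
                centroids[c] = members.mean()
    labels = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[labels].reshape(weights.shape), centroids


def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization; returns (codes, scale)."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return codes, scale


# Illustrative pipeline on random weights (values are arbitrary assumptions).
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
pruned = magnitude_prune(w, rate=0.5)
clustered, centroids = cluster_weights(pruned, n_clusters=16)
codes, scale = quantize_int8(clustered)
dequantized = codes.astype(np.float32) * scale  # approximate reconstruction
```

In this sketch the pruning rate and the number of clusters are fixed by hand; these are the two hyperparameters that, according to the abstract, the framework's algorithms select automatically.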
International Conference on Advanced Information Networking and Applications (was ICOIN)
Deep Neural Networks; DNN Acceleration; DNN Compression; Edge Computing
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1709793