Multi-objective autotuning of MobileNets across the full software/hardware stack

Lokhmotov, Anton; Chunosov, Nikolay; Vella, Flavio; Fursin, Grigori

doi:10.1145/3229762.3229767

We present a customizable Collective Knowledge workflow to study the execution time vs. accuracy trade-offs for the MobileNets CNN family. We use this workflow to evaluate MobileNets on Arm Cortex CPUs using TensorFlow and on Arm Mali GPUs using several versions of the Arm Compute Library. Our optimizations for the Arm Bifrost GPU architecture reduce the execution time by a factor of 2 to 3, and the optimized configurations lie on a Pareto-optimal frontier. We also highlight the challenge of maintaining accuracy when deploying CNN models across diverse platforms. We make all the workflow components (models, programs, scripts, etc.) publicly available to encourage further exploration by the community.

Multi-objective autotuning of MobileNets across the full software/hardware stack / Lokhmotov, Anton; Chunosov, Nikolay; Vella, Flavio; Fursin, Grigori. - (2018). (Paper presented at the conference 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning, ReQuEST@ASPLOS 2018, held in Williamsburg, US) [10.1145/3229762.3229767].

Publication year: 2018
Conference name: 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning, ReQuEST@ASPLOS 2018
Keywords: Machine Learning; Deep Learning; Inference; Parallel Computing; GPU Programming
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1213544
