The number of IoT devices has greatly increased over the years, so that they have invaded the electronic market. IoT describe a device-to-device communication without human interface. A large class of these devices are battery powered, and the energy consumption inside them is considered critical. Today’s embedded IoT systems interface multiple peripherals such as sensors that perform continuous monitoring of the environment around it, and actuators that are controlled by the embedded systems. Also, they interface wireless devices for data transmissions. A part of their job includes some basic pre-processing of the data before transmitting it over those wireless networks. Such pre-processing “on the edge of the network” minimizes the data to be transmitted over the wireless channels, and only transmits the desired outputs. In front of the increase demand to support pre-processing, such as computer vision and voice recognition, on small embedded systems on the edge of the network, they cannot completely satisfy those demands due to their little performance In this study we demonstrate the performance and energy efficiency of interleaved multithreaded architectures, which can be used in an embedded system on the edge of the IoT interfacing multiple sensors and peripherals, each serviced by a different hardware thread. We show the optimal pipeline organization to use in such architectures, and we finally demonstrate how these architectures can be exploited to easily improve instruction level parallelism by integrating a convolutional neural networking accelerator that can perform very fast vector arithmetic operations, and finally benchmarking this accelerator by running a custom implementation of the VGG16 convolutional neural network. The microprocessors presented are a part of a family of processing cores called Klessydra. The Klessydra microprocessors were written such that they have a pinout that are 100 percent identical with Riscy cores from PULPino SoC. The subset of the Klessydra cores presented in this thesis is called the Klessydra-T. The letter ‘T’ indicating that the cores are multithreaded, the Klessydra-T subset has two main implementations used throughout this thesis, they are Klessydra-T03 and Klessydra-T13. T03 and T13 for short. The processor cores have been tested with the Modelsim / Questasim simulators. The cores have been synthesized on the 7-series FPGAs from Xilinx with the Vivado Synthesis tool. Synthesis and Post-synthesis simulations have been made. Dynamic Power estimations were calculated by Vivado from the power report generated by Modelsim after having simulated a post-synthesis Vivado netlist. FPGA synthesis was chosen as our target implementation, as they provide high reconfigurability, which allows the user to easily customize their own accelerator and make it adapt accordingly to their specific applications. In our assessment throughout this thesis we nominated the T03 interleaved multithreaded processor as our optimal and most balanced pipeline organization. The T03 core had many advantages over other architectures, however it was only suitable to be used in control applications. T13 solves this problem by implementing superscalar hardware accelerators. A hybrid implementation of the hardware accelerator targeting thread level parallelism and slight data level parallelism was the approach yielding the highest performance and still maintaining a relatively low energy consumption for energy critical environments.

Energy-efficient digital electronic systems design for edge-computing applications, through innovative RISC-V compliant processors / Cheikh, Abdallah. - (2020 Feb 18).

Energy-efficient digital electronic systems design for edge-computing applications, through innovative RISC-V compliant processors

CHEIKH, ABDALLAH
18/02/2020

Abstract

The number of IoT devices has greatly increased over the years, so that they have invaded the electronic market. IoT describe a device-to-device communication without human interface. A large class of these devices are battery powered, and the energy consumption inside them is considered critical. Today’s embedded IoT systems interface multiple peripherals such as sensors that perform continuous monitoring of the environment around it, and actuators that are controlled by the embedded systems. Also, they interface wireless devices for data transmissions. A part of their job includes some basic pre-processing of the data before transmitting it over those wireless networks. Such pre-processing “on the edge of the network” minimizes the data to be transmitted over the wireless channels, and only transmits the desired outputs. In front of the increase demand to support pre-processing, such as computer vision and voice recognition, on small embedded systems on the edge of the network, they cannot completely satisfy those demands due to their little performance In this study we demonstrate the performance and energy efficiency of interleaved multithreaded architectures, which can be used in an embedded system on the edge of the IoT interfacing multiple sensors and peripherals, each serviced by a different hardware thread. We show the optimal pipeline organization to use in such architectures, and we finally demonstrate how these architectures can be exploited to easily improve instruction level parallelism by integrating a convolutional neural networking accelerator that can perform very fast vector arithmetic operations, and finally benchmarking this accelerator by running a custom implementation of the VGG16 convolutional neural network. The microprocessors presented are a part of a family of processing cores called Klessydra. The Klessydra microprocessors were written such that they have a pinout that are 100 percent identical with Riscy cores from PULPino SoC. The subset of the Klessydra cores presented in this thesis is called the Klessydra-T. The letter ‘T’ indicating that the cores are multithreaded, the Klessydra-T subset has two main implementations used throughout this thesis, they are Klessydra-T03 and Klessydra-T13. T03 and T13 for short. The processor cores have been tested with the Modelsim / Questasim simulators. The cores have been synthesized on the 7-series FPGAs from Xilinx with the Vivado Synthesis tool. Synthesis and Post-synthesis simulations have been made. Dynamic Power estimations were calculated by Vivado from the power report generated by Modelsim after having simulated a post-synthesis Vivado netlist. FPGA synthesis was chosen as our target implementation, as they provide high reconfigurability, which allows the user to easily customize their own accelerator and make it adapt accordingly to their specific applications. In our assessment throughout this thesis we nominated the T03 interleaved multithreaded processor as our optimal and most balanced pipeline organization. The T03 core had many advantages over other architectures, however it was only suitable to be used in control applications. T13 solves this problem by implementing superscalar hardware accelerators. A hybrid implementation of the hardware accelerator targeting thread level parallelism and slight data level parallelism was the approach yielding the highest performance and still maintaining a relatively low energy consumption for energy critical environments.
18-feb-2020
File allegati a questo prodotto
File Dimensione Formato  
Tes_dottorato_Cheikh.pdf

accesso aperto

Tipologia: Tesi di dottorato
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 5.36 MB
Formato Adobe PDF
5.36 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1363198
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact