The low probability of single event upsets (SEUs) within particular satellite orbits makes commercial off-the-shelf (COTS) electronic components a viable solution for space system implementation, provided that design-level fault-tolerance techniques are introduced at the expense of some performance, energy, and area penalty. This paper illustrates the design and validation of a novel RISC-V dual-core architecture based on a computing paradigm that we refer to as full/partial heterogeneous multi-core protection. The approach relies on a small, low-performance, fully fault-tolerant core (LP core) coupled with a high-performance, partially fault-tolerant core (HP core). The computing paradigm assumes that the failure-exposed HP core executes computation-intensive routines for relatively short periods of time, making failures statistically unlikely, while the fully fault-tolerant LP core handles critical control tasks and manages the failure recovery of the HP core. Depending on the application, the share of execution time spent in the LP core ranges from 11.4% to 91.3%, and the share spent in the HP core ranges from 8.7% to 88.6%. In the proposed study, both cores belong to the RISC-V compliant Klessydra core family. The dual-core architecture also includes a watchdog timer, controlled by the LP core, that monitors the non-protected HP core, and a context-switch FIFO that speeds up the transfer of code and data between the two cores during failure recovery. A dedicated run-time software environment coordinates the execution of tasks on the HP core in a resilient fashion. The dual-core processor has been validated through extensive RTL simulations running in a UVM-based fault-injection environment, which emulates SEUs at various rates. Experimental results illustrate the benefits and limits of a heterogeneous architecture with different levels of protection and performance. The failure probability, assuming an SEU fault occurrence, can be reduced by a factor between 10X and 30X with respect to the non-protected architecture, leading to an average failure rate of up to 4.00E-06 per second, compared with 1.80E-05 per second in the non-protected architecture.
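As a rough illustration of the two average failure rates quoted above, the following sketch compares them and converts each into a probability of at least one failure over a given time window. The constant-rate (Poisson) model, the function names, and the one-hour window are assumptions made here for illustration; they are not taken from the paper. The sketch also does not attempt to reproduce the 10X to 30X reduction factor, which refers to failure probability given an SEU fault occurrence and is a different quantity from the ratio of the two average rates.

```python
import math

# Average failure rates quoted in the abstract (failures per second).
RATE_PROTECTED = 4.00e-06    # upper value quoted for the dual-core architecture
RATE_UNPROTECTED = 1.80e-05  # non-protected architecture


def rate_ratio(unprotected: float, protected: float) -> float:
    """Ratio of the two average failure rates."""
    return unprotected / protected


def failure_probability(rate_per_s: float, duration_s: float) -> float:
    """P(at least one failure in duration_s).

    ASSUMPTION: failures arrive at a constant rate (Poisson process).
    This model is not stated in the source; it is used only for illustration.
    """
    return 1.0 - math.exp(-rate_per_s * duration_s)


if __name__ == "__main__":
    window_s = 3600.0  # hypothetical one-hour window
    print(f"rate ratio: {rate_ratio(RATE_UNPROTECTED, RATE_PROTECTED):.2f}")
    print(f"protected:   {failure_probability(RATE_PROTECTED, window_s):.4f}")
    print(f"unprotected: {failure_probability(RATE_UNPROTECTED, window_s):.4f}")
```

Under these assumptions, the ratio of the two quoted rates is 4.5, and the gap between the two probabilities widens as the time window grows.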

A RISC-V fault-tolerant soft-processor based on full/partial heterogeneous dual-core protection / Vigli, Francesco; Barbirotta, Marcello; Cheikh, Abdallah; Menichelli, Francesco; Mastrandrea, Antonio; Olivieri, Mauro. - In: IEEE ACCESS. - ISSN 2169-3536. - 12:(2024), pp. 30495-30506. [10.1109/access.2024.3366806]

A RISC-V fault-tolerant soft-processor based on full/partial heterogeneous dual-core protection

Vigli, Francesco; Barbirotta, Marcello; Cheikh, Abdallah; Menichelli, Francesco; Mastrandrea, Antonio; Olivieri, Mauro
2024

processor architecture; fault-tolerance; multi-core; RISC-V; interleaved multi-threading; heterogeneous computing; single event effects
01 Journal publication::01a Journal article
Files attached to this item
File: Vigli_RISC-V_2024.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 4.74 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1722550