Great effort has been devoted to the design of optimized checkpointing strategies for optimistic parallel discrete event simulators. Less work, by contrast, has addressed the execution mode of the individual checkpoint operation. Specifically, checkpoint operations are typically charged to the CPU, which freezes the simulation application while checkpointing is in progress; that is, the execution mode of the checkpointing protocol is typically synchronous. In this paper we focus on improving the execution mode and present a software architecture, designed for Myrinet-based Networks of Workstations (NOWs), that avoids application freezing during checkpoint operations, thus moving execution towards an asynchronous mode. This is done by charging checkpoint operations to a hardware component distinct from the CPU, namely a DMA engine. However, totally asynchronous checkpointing could suffer from data inconsistency whenever the content of a state buffer is accessed for further modification while a checkpoint operation involving it has not yet completed. To avoid this, the architecture includes functionality for resynchronization on demand. We have used this functionality to implement an execution mode of the checkpointing protocol that we refer to as semi-asynchronous. Based on the results of an experimental study, we argue that the semi-asynchronous mode can be an effective solution for almost completely removing the delay associated with checkpoint operations from the completion time of the simulation.
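The interplay between asynchronous checkpointing and resynchronization on demand can be illustrated with a small sketch. This is not the architecture described in the paper: a single worker thread stands in for the DMA engine, and all names (`SemiAsyncCheckpointer`, `checkpoint`, `resync`, `run_events`) are hypothetical, chosen only for illustration. The sketch shows that a checkpoint request returns immediately, and that the application waits for the copy to finish only when it is about to modify the state buffer.

```python
from concurrent.futures import ThreadPoolExecutor


class SemiAsyncCheckpointer:
    """Toy model: a worker thread plays the role of the DMA engine (assumption)."""

    def __init__(self):
        # One worker models one copy engine serving requests in order.
        self._engine = ThreadPoolExecutor(max_workers=1)
        self._pending = None  # Future for the in-flight copy, if any
        self.log = []         # saved (timestamp, snapshot) pairs

    def checkpoint(self, timestamp, state):
        """Start copying `state` and return at once (the asynchronous part)."""
        self.resync()  # this sketch allows one outstanding copy per buffer
        self._pending = self._engine.submit(self._copy, timestamp, state)

    def _copy(self, timestamp, state):
        # Runs on the worker thread; bytes() takes an immutable snapshot.
        self.log.append((timestamp, bytes(state)))

    def resync(self):
        """Resynchronization on demand: block until the in-flight copy is done."""
        if self._pending is not None:
            self._pending.result()
            self._pending = None

    def shutdown(self):
        self.resync()
        self._engine.shutdown()


def run_events(n_events):
    """Process n_events (fewer than 256) toy events, checkpointing before each."""
    state = bytearray(4)
    cp = SemiAsyncCheckpointer()
    for ts in range(n_events):
        cp.checkpoint(ts, state)  # does not freeze the application
        # Work that does not touch `state` could overlap with the copy here.
        cp.resync()               # required before modifying the buffer
        state[0] = ts + 1
    cp.shutdown()
    return cp.log, state
```

Dropping the `cp.resync()` call before the write would correspond to the totally asynchronous mode, where the snapshot might capture a partially updated buffer.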

Semi-asynchronous checkpointing for optimistic simulation on a Myrinet based NOW / Quaglia, Francesco; Santoro, A. - (2001), pp. 56-63. (Paper presented at the 15th Workshop on Parallel and Distributed Simulation (PADS 2001), held in Lake Arrowhead, CA, May 15-18, 2001) [10.1145/375658.375675].

Semi-asynchronous checkpointing for optimistic simulation on a Myrinet based NOW

Quaglia, Francesco
2001

ISBN: 9780769511047
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/61710