Recently a Checkpointing and Communication Library (CCL) for optimistic simulation on Myrinet based Network of Workstations (NOWs) has been presented. CCL offloads checkpoint operations from the CPU by charging them to a programmable DMA engine on the Myrinet network card. CCL includes also functionalities for freezing the simulation application on demand, which can be used for data consistency maintenance (for example when a state buffer needs to be accessed for further modifications while a DMA based checkpoint operation involving it is still in progress). Programming the DMA to perform a checkpoint operation by transferring large data blocks in a single burst allows the latency of any checkpoint operation to be kept low. This reduces the probability for application freezing to really occur On the other hand, transferring large data blocks in a single burst might cause negative interference on communication since that DMA (and other circuitry) cannot be used for communication functionalities until the currently executed data transfer is not yet completed. In this paper we present a detailed identification of the effects of the burst length, from which we outline a set of relevant phenomena to take into account in order to determine a compile time suited value for the burst length itself. We also report measures quantifying these phenomena for the case of a PC cluster. Actually, the data indicate that communication functionalities do not suffer from the use of non-minimal burst lengths for checkpoint operations, thus pointing out how, if well tuned, CCL provides highly effective, CPU offloaded, checkpointing functionalities.

Tuning of the checkpointing and communication library for optimistic simulation on Myrinet based NOWs / F., Quaglia; A., Santoro; Ciciani, Bruno. - (2001), pp. 241-248. (Intervento presentato al convegno 9th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2001) tenutosi a Cincinnati, OH nel 15 August 2001 through 18 August 2001) [10.1109/MASCOT.2001.948874].

Tuning of the checkpointing and communication library for optimistic simulation on Myrinet based NOWs

CICIANI, Bruno
2001

Abstract

Recently a Checkpointing and Communication Library (CCL) for optimistic simulation on Myrinet based Network of Workstations (NOWs) has been presented. CCL offloads checkpoint operations from the CPU by charging them to a programmable DMA engine on the Myrinet network card. CCL includes also functionalities for freezing the simulation application on demand, which can be used for data consistency maintenance (for example when a state buffer needs to be accessed for further modifications while a DMA based checkpoint operation involving it is still in progress). Programming the DMA to perform a checkpoint operation by transferring large data blocks in a single burst allows the latency of any checkpoint operation to be kept low. This reduces the probability for application freezing to really occur On the other hand, transferring large data blocks in a single burst might cause negative interference on communication since that DMA (and other circuitry) cannot be used for communication functionalities until the currently executed data transfer is not yet completed. In this paper we present a detailed identification of the effects of the burst length, from which we outline a set of relevant phenomena to take into account in order to determine a compile time suited value for the burst length itself. We also report measures quantifying these phenomena for the case of a PC cluster. Actually, the data indicate that communication functionalities do not suffer from the use of non-minimal burst lengths for checkpoint operations, thus pointing out how, if well tuned, CCL provides highly effective, CPU offloaded, checkpointing functionalities.
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/61387
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 9
  • ???jsp.display-item.citation.isi??? 7
social impact