CCL (Checkpointing and Communication Library) is a recently developed software library supporting optimistic parallel simulation on Myrinet-based clusters. Beyond classical low-latency message delivery, the library implements CPU-offloaded, semi-asynchronous checkpointing based on the data transfer capabilities of a programmable DMA engine on board Myrinet network cards. The latest version of CCL (v2.4), designed for M2M-PCI32C Myrinet cards, supports only monoprogrammed semi-asynchronous checkpoints. This forces the CPU and DMA activities to re-synchronize whenever the simulation application must issue a new checkpoint request while the DMA engine is still carrying out the previous one. In this paper we present CCL v3.0, which exploits hardware features of the more advanced M3M-PCI64C Myrinet cards to support multiprogrammed semi-asynchronous checkpoints. The multiprogrammed approach allows a higher degree of concurrency between checkpointing and the other simulation-specific operations carried out by the CPU, which improves performance. We also report an evaluation of those benefits for a personal communication system simulation application.
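The difference between the monoprogrammed and multiprogrammed schemes can be illustrated with a toy timing model. The sketch below is not CCL code and does not use the CCL API: the function name `cpu_stall_time`, its parameters, and the FIFO service assumption are all hypothetical, and times are in arbitrary units. It only counts how long the CPU waits before it can issue a checkpoint request when at most `max_outstanding` requests may be pending on the DMA engine (1 corresponds to the monoprogrammed case).

```python
from collections import deque


def cpu_stall_time(work_times, dma_time, max_outstanding):
    """Total time the CPU waits before it can issue checkpoint requests.

    Hypothetical toy model (not the CCL implementation):
      work_times      -- CPU work done before each checkpoint request
      dma_time        -- time the DMA engine needs for one checkpoint
      max_outstanding -- requests allowed to be pending at once
                         (1 models monoprogrammed checkpoints)
    The DMA engine is assumed to serve requests one at a time, in FIFO order.
    """
    now = 0.0            # current CPU time
    dma_free_at = 0.0    # time at which the DMA engine becomes idle
    pending = deque()    # completion times of unfinished requests (increasing)
    stall = 0.0

    for work in work_times:
        now += work
        # Forget requests the DMA engine has already completed.
        while pending and pending[0] <= now:
            pending.popleft()
        # No free slot: the CPU re-synchronizes with the oldest request.
        if len(pending) >= max_outstanding:
            done_at = pending.popleft()
            stall += done_at - now
            now = done_at
        # Issue the request; it starts once the DMA engine is idle.
        start = max(now, dma_free_at)
        dma_free_at = start + dma_time
        pending.append(dma_free_at)

    return stall
```

Under these assumptions, with `work_times=[1, 1, 1, 5]` and `dma_time=3`, the model yields a stall of 4.0 for `max_outstanding=1`, 1.0 for `max_outstanding=2`, and 0.0 for `max_outstanding=4`. The numbers are illustrative only and say nothing about the measured performance reported in the paper.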
CCL v3.0: Multiprogrammed semi-asynchronous checkpoints / Quaglia, Francesco; Santoro, A. - (2003), pp. 21-28. (Paper presented at the 17th Workshop on Parallel and Distributed Simulation, held in San Diego, CA, June 10-13, 2003) [10.1109/pads.2003.1207417].
CCL v3.0: Multiprogrammed semi-asynchronous checkpoints
QUAGLIA, Francesco;
2003