In this paper we present the status of the 3rd generation design of the APEnet board (V5) built upon the 28nm Altera Stratix V FPGA; it features a PCIe Gen3 x8 interface and enhanced embedded transceivers with a maximum capability of 12.5Gbps each. The network architecture is designed in accordance to the Remote DMA paradigm. The APEnet+ V5 prototype is built upon the Stratix V DevKit with the addition of a proprietary, third party IP core implementing multi-DMA engines. Support for zero-copy communication is assured by the possibility of DMA-accessing either host and GPU memory, offloading the CPU from the chore of data copying. The current implementation plateaus to a bandwidth for memory read of 4.8GB/s. Here we describe the hardware optimization to the memory write process which relies on the use of two independent DMA engines and an improved TLB.
Latest generation interconnect technologies in APEnet+ networking infrastructure / Ammendola, R.; Biagioni, A.; Cretaro, P.; Frezza, O.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Paolucci, P. S.; Pastorelli, E.; Rossetti, D.; Simula, F.; Vicini, P.. - In: JOURNAL OF PHYSICS. CONFERENCE SERIES. - ISSN 1742-6588. - 898:8(2017). (Intervento presentato al convegno 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016 tenutosi a San Francisco; USA) [10.1088/1742-6596/898/8/082035].
Latest generation interconnect technologies in APEnet+ networking infrastructure
Cretaro P.;Lonardo A.;Pastorelli E.;
2017
Abstract
In this paper we present the status of the 3rd generation design of the APEnet board (V5) built upon the 28nm Altera Stratix V FPGA; it features a PCIe Gen3 x8 interface and enhanced embedded transceivers with a maximum capability of 12.5Gbps each. The network architecture is designed in accordance to the Remote DMA paradigm. The APEnet+ V5 prototype is built upon the Stratix V DevKit with the addition of a proprietary, third party IP core implementing multi-DMA engines. Support for zero-copy communication is assured by the possibility of DMA-accessing either host and GPU memory, offloading the CPU from the chore of data copying. The current implementation plateaus to a bandwidth for memory read of 4.8GB/s. Here we describe the hardware optimization to the memory write process which relies on the use of two independent DMA engines and an improved TLB.File | Dimensione | Formato | |
---|---|---|---|
Ammendola_Latest-generation_2017.pdf
accesso aperto
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Creative commons
Dimensione
797.36 kB
Formato
Adobe PDF
|
797.36 kB | Adobe PDF |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.