
Latest generation interconnect technologies in APEnet+ networking infrastructure / Ammendola, R.; Biagioni, A.; Cretaro, P.; Frezza, O.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Paolucci, P. S.; Pastorelli, E.; Rossetti, D.; Simula, F.; Vicini, P. - In: JOURNAL OF PHYSICS. CONFERENCE SERIES. - ISSN 1742-6588. - 898:8 (2017). (Paper presented at the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, held in San Francisco, USA) [10.1088/1742-6596/898/8/082035].

Latest generation interconnect technologies in APEnet+ networking infrastructure

Cretaro P.; Lonardo A.; Pastorelli E.
2017

Abstract

In this paper we present the status of the third-generation design of the APEnet board (V5), built upon the 28 nm Altera Stratix V FPGA; it features a PCIe Gen3 x8 interface and enhanced embedded transceivers with a maximum capability of 12.5 Gbps each. The network architecture is designed in accordance with the Remote DMA paradigm. The APEnet+ V5 prototype is built upon the Stratix V DevKit with the addition of a proprietary third-party IP core implementing multiple DMA engines. Support for zero-copy communication is ensured by the ability to DMA-access either host or GPU memory, offloading the CPU from the chore of data copying. The current implementation plateaus at a memory-read bandwidth of 4.8 GB/s. Here we describe the hardware optimizations to the memory write process, which rely on the use of two independent DMA engines and an improved TLB.
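
To make the zero-copy usage model mentioned in the abstract concrete, the sketch below registers a GPU-resident buffer with the network device and issues a remote write without staging the payload through host memory. It is a minimal illustration of the flow only: the apenet_* names, their signatures and the send_gpu_buffer helper are hypothetical assumptions introduced for this example, not the actual APEnet+ software interface; only the CUDA runtime calls are real.

/*
 * Sketch of a zero-copy RDMA PUT from GPU memory (assumed interface).
 * The apenet_* calls are HYPOTHETICAL placeholders for illustration;
 * they do not reproduce the real APEnet+ driver API.
 */
#include <stddef.h>
#include <stdint.h>
#include <cuda_runtime.h>

/* ---- hypothetical APEnet+-style interface (assumed, not real) ---- */
typedef struct apenet_ctx apenet_ctx_t;                    /* opaque device handle */
typedef struct { uint64_t vaddr; size_t len; } apenet_mr_t; /* registered memory region */

apenet_ctx_t *apenet_open(int board_id);
int  apenet_reg_gpu_buf(apenet_ctx_t *ctx, void *gpu_ptr, size_t len,
                        apenet_mr_t *mr);                  /* pin + map for DMA */
int  apenet_rdma_put(apenet_ctx_t *ctx, const apenet_mr_t *src,
                     int dest_node, uint64_t dest_vaddr);  /* remote write */
void apenet_close(apenet_ctx_t *ctx);

/* Send one GPU-resident buffer to a remote node with no host-side copy. */
int send_gpu_buffer(int dest_node, uint64_t dest_vaddr, size_t len)
{
    void *gpu_buf = NULL;
    if (cudaMalloc(&gpu_buf, len) != cudaSuccess)
        return -1;

    apenet_ctx_t *ctx = apenet_open(0);
    if (!ctx)
        return -1;

    /* Register the GPU buffer: the board's DMA engines can then read it
     * directly over PCIe, so the CPU never touches the payload. */
    apenet_mr_t mr;
    if (apenet_reg_gpu_buf(ctx, gpu_buf, len, &mr) != 0)
        return -1;

    /* RDMA PUT: the destination virtual address is resolved on the
     * receiving board (TLB lookup) and written by its DMA engine. */
    int rc = apenet_rdma_put(ctx, &mr, dest_node, dest_vaddr);

    apenet_close(ctx);
    cudaFree(gpu_buf);
    return rc;
}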
2017
22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016
high performance computing; interconnect; network; GPU
04 Conference proceedings publication::04c Conference paper published in a journal
Files attached to this product
File: Ammendola_Latest-generation_2017.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 797.36 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1340366
Citations
  • Scopus: 1