[BF2/BF3] Can the DPU Arm Directly Access GPU Memory on the Same Host?

Hello everyone,

I am developing on the BlueField-2/3 DPU platform and want the DPU’s Arm cores to directly initiate reads and writes to the memory of a GPU installed in the same host, with the data path bypassing the host CPU entirely.

My goal is to use the DPU Arm as a co-processor that can directly operate on data in GPU memory, aiming to achieve the lowest possible communication latency.

I have already reviewed the following documentation:

  • The NVIDIA DOCA DMA and GPUNetIO library documentation.
  • Technical documents related to GPUDirect.

However, I haven’t been able to find a clear programming sample or an authoritative guide on how to initiate direct access to local GPU memory from the DPU’s Arm cores.

Additionally, I came across the paper “Conspirator: SmartNIC-Aided Control Plane for Distributed ML Workloads” (https://www.usenix.org/system/files/atc24-xiao.pdf), which mentions a “SNIC DMA to GPU” technical path. This makes me more confident that direct communication between the DPU and a local GPU over the PCIe bus is theoretically possible.

Therefore, I would like to ask the community and official experts:

  1. On the BlueField-2/3 platform, is it possible for the DPU Arm cores to perform DMA operations directly on the memory of a GPU on the same host?
  2. If so, is there any official sample code, tutorial, or detailed documentation that could guide this implementation?
  3. What key DOCA libraries, APIs, or driver configurations are required to achieve this?

Hi,

I checked with the relevant engineers: accessing GPU memory from the DPU Arm cores is possible, but there is no sample application demonstrating it at the moment.
It is also not yet described in the official NVIDIA documentation.

Best Regards,
Anatoly

Thank you for your response.

Could you share a concise example showing which APIs to use? I would also like to know whether there are specific requirements on the physical placement of the GPU and the BlueField, for example whether they must sit under a single PCIe bridge, as is the case with GPUNetIO.
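
On the placement question, here is a minimal sketch of how one might check, from the host, whether the GPU and the BlueField share an upstream PCIe bridge. It relies only on the standard Linux sysfs layout, where /sys/bus/pci/devices/<address> is a symlink into the device tree. The function names and the PCI addresses are hypothetical placeholders, not part of DOCA or any NVIDIA tool. It only reports topology: it does not show that peer-to-peer access is actually enabled (ACS and IOMMU settings also matter), and nothing in this thread confirms that such placement is required for Arm-initiated access.

```python
import os
import re
import sys
from typing import Optional

# PCI address as it appears in sysfs: domain:bus:device.function
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def sysfs_path(bdf: str) -> str:
    """Resolve a PCI address to its full sysfs device path (Linux only)."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", bdf))


def closest_common_bridge(path_a: str, path_b: str) -> Optional[str]:
    """Return the nearest PCI bridge that is upstream of both devices.

    Both arguments are resolved sysfs paths. The result is None when the
    devices only meet at the root complex or sit under different ones.
    """
    # Drop the last component: it is the device itself, not an upstream bridge.
    parts_a = path_a.rstrip("/").split("/")[:-1]
    parts_b = path_b.rstrip("/").split("/")[:-1]
    common = None
    for a, b in zip(parts_a, parts_b):
        if a != b:
            break
        if BDF_RE.match(a):
            common = a  # deepest shared bridge seen so far
    return common


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 pcie_common.py <gpu address> <dpu address>
    gpu, dpu = sys.argv[1], sys.argv[2]
    bridge = closest_common_bridge(sysfs_path(gpu), sysfs_path(dpu))
    if bridge is None:
        print("No shared bridge below the root complex")
    else:
        print(f"Closest shared upstream bridge: {bridge}")
```

The same tree can be inspected by eye with `lspci -tv`. A result of None means traffic between the two devices would have to cross the root complex.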