DGX Spark GPUDirect RDMA

The DGX Spark SoC has a unified memory architecture.

For performance reasons, specifically for CUDA contexts associated with the iGPU, the system memory returned by the pinned device memory allocators (e.g. cudaMalloc) cannot be coherently accessed by the CPU complex or by I/O peripherals such as PCI Express devices.

As a result, GPUDirect RDMA is not supported, and the direct I/O mechanisms built on it, for example nvidia-peermem (for DOCA-Host), dma-buf, and GDRCopy, do not work.

A compliant application should programmatically query the relevant platform capabilities, e.g. CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED (related to the nv-p2p kernel APIs) or CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED (related to dma-buf), and use an appropriate fallback when they are not available.
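As an illustration of such a capability check, the following sketch queries both attributes through the CUDA driver API and decides whether a fallback is needed. The names DirectIoCaps, needs_host_fallback, and query_direct_io_caps are hypothetical and exist only in this example. The sketch assumes a CUDA toolkit recent enough to define both enumerators and linking against libcuda; the __has_include guard is only there so that the selection logic also builds on hosts without the toolkit. Treating a failed query as "not supported" is a conservative assumption of the sketch.

```cpp
// Use the CUDA driver API when its header is present (link with -lcuda).
#if defined(__has_include)
#  if __has_include(<cuda.h>)
#    include <cuda.h>
#    define SKETCH_HAVE_CUDA 1
#  endif
#endif

// Hypothetical container for the two capabilities named in the text.
struct DirectIoCaps {
    bool nv_p2p;   // CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED
    bool dma_buf;  // CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED
};

// A fallback is required when neither direct I/O mechanism is available.
bool needs_host_fallback(const DirectIoCaps& caps) {
    return !caps.nv_p2p && !caps.dma_buf;
}

#ifdef SKETCH_HAVE_CUDA
// Query both attributes for one device ordinal.
// Any driver error is reported as "not supported" (assumption of this sketch).
DirectIoCaps query_direct_io_caps(int device_ordinal) {
    DirectIoCaps caps{false, false};
    CUdevice dev;
    if (cuInit(0) != CUDA_SUCCESS ||
        cuDeviceGet(&dev, device_ordinal) != CUDA_SUCCESS) {
        return caps;
    }
    int value = 0;
    if (cuDeviceGetAttribute(&value,
            CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED, dev) == CUDA_SUCCESS) {
        caps.nv_p2p = (value != 0);
    }
    value = 0;
    if (cuDeviceGetAttribute(&value,
            CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, dev) == CUDA_SUCCESS) {
        caps.dma_buf = (value != 0);
    }
    return caps;
}
#endif
```

An application tied to a single mechanism should test the flag that matches that mechanism rather than the combined result; on DGX Spark both are expected to report no support, so the fallback path is taken.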

For example, for Linux RDMA applications based on the libibverbs library, we suggest allocating the communication buffers with the cudaHostAlloc API and registering them with the ibv_reg_mr function.
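The suggested fallback could look roughly like the sketch below. HostCommBuffer, alloc_and_register, and release are hypothetical helper names, and the protection domain is assumed to have been created by the application beforehand (for example with ibv_alloc_pd). The cudaHostAllocDefault flag and the access flags are illustrative choices, not requirements stated here. The __has_include guard only keeps the sketch buildable on hosts that lack the CUDA or rdma-core headers, in which case allocation always fails; a real build links against libcudart and libibverbs.

```cpp
#include <cstddef>

#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>) && __has_include(<infiniband/verbs.h>)
#    include <cuda_runtime.h>
#    include <infiniband/verbs.h>
#    define SKETCH_HAVE_CUDA_AND_VERBS 1
#  endif
#endif

// Forward declarations so the signatures compile without the verbs header.
struct ibv_pd;
struct ibv_mr;

// Hypothetical handle pairing the pinned host buffer with its memory region.
struct HostCommBuffer {
    void*   ptr = nullptr;
    ibv_mr* mr  = nullptr;
};

// Allocate pinned host memory with cudaHostAlloc and register it with the NIC.
// Returns false on invalid arguments or on any allocation/registration error.
bool alloc_and_register(ibv_pd* pd, std::size_t bytes, HostCommBuffer* out) {
    if (pd == nullptr || bytes == 0 || out == nullptr) {
        return false;
    }
#ifdef SKETCH_HAVE_CUDA_AND_VERBS
    void* ptr = nullptr;
    if (cudaHostAlloc(&ptr, bytes, cudaHostAllocDefault) != cudaSuccess) {
        return false;
    }
    // Access flags are an example; request only what the application needs.
    ibv_mr* mr = ibv_reg_mr(pd, ptr, bytes,
                            IBV_ACCESS_LOCAL_WRITE |
                            IBV_ACCESS_REMOTE_READ |
                            IBV_ACCESS_REMOTE_WRITE);
    if (mr == nullptr) {
        cudaFreeHost(ptr);  // do not leak the buffer if registration fails
        return false;
    }
    out->ptr = ptr;
    out->mr  = mr;
    return true;
#else
    return false;  // headers unavailable at build time
#endif
}

// Deregister before freeing, then reset the handle.
void release(HostCommBuffer* buf) {
    if (buf == nullptr) {
        return;
    }
#ifdef SKETCH_HAVE_CUDA_AND_VERBS
    if (buf->mr != nullptr)  ibv_dereg_mr(buf->mr);
    if (buf->ptr != nullptr) cudaFreeHost(buf->ptr);
#endif
    buf->mr  = nullptr;
    buf->ptr = nullptr;
}
```

The memory region is deregistered before the buffer is freed so that the NIC never holds a registration for memory that has already been released.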