GPUDirect available on Ubuntu 18.04?

user4391 · October 19, 2021, 8:51am

On the GDS download page, it looks like Ubuntu 18.04 is available.

But the GDS documentation says only Ubuntu 20.04 is supported.
Actually, I don't completely understand. If I want to use GDS on Ubuntu, can I follow the "On DGX OS" section of the NVIDIA GPUDirect Storage Installation and Troubleshooting Guide?

Also, the documentation says an MLNX_OFED version greater than 5.3 is required to use NVMe direct. But this page says that 18.04 only supports 4.15:
https://docs.mellanox.com/display/OFED510660/General+Support+in+MLNX_OFED

So, can I use GDS with Ubuntu 18.04, or do I have to upgrade to 20.04?
And are CUDA 11 and an RTX 3090 okay to use with NVMe?

rs277 · October 19, 2021, 7:05pm

The GPUDirect Release Notes state, “The RTX series of GPUs supports only compatibility mode.”

user4391 · October 19, 2021, 7:10pm

Oh, I got it. Anyway, how does compatibility mode differ from GDS mode?

njuffa · October 19, 2021, 7:24pm

From the documentation:

To learn more about Compatibility Mode, refer to cuFile Compatibility Mode

The above links to:

https://docs.nvidia.com/gpudirect-storage/api-reference-guide/topics/cufile-compatibility.html

The cuFile library provides a mechanism for cuFile reads and writes to use compatibility mode using POSIX pread and pwrite APIs respectively to system memory and copying to GPU memory. The behavior of compatibility mode with cuFile APIs is determined by the following configuration parameters.

This tells me that there are no direct DMA transfers in compatibility mode.

user4391 · October 20, 2021, 2:11am

Thanks, I truly understand now.
Is there no way to use GPUDirect with an RTX 3090 right now?

njuffa · October 20, 2021, 2:37am

Yes, you can use GDS with RTX 3090 in compatibility mode.

My (possibly incorrect) personal take is that GPUDirect with RDMA is a feature needed in, and supplied for, high-end professional HPC systems. Thus the initial focus on Tesla-based systems and the limitation to certain Linux environments, because that is what one finds in supercomputer systems.

The function of the compatibility mode seems to be that it allows software prototyping on developer machines using less ambitious hardware configurations; i.e. it is not something one would necessarily want to deploy in production systems.

user4391 · October 20, 2021, 2:44am

Is there no other way to transfer data between NVMe and the GPU without using GDS? Thanks a lot for the answers.

njuffa · October 20, 2021, 2:58am

If you have any storage technology that can get data into system memory (this would typically use some sort of file system), you can then copy that data from system memory to the GPU.
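As a rough illustration of that staged path, here is a minimal Python sketch that reads a file into system memory in chunks with POSIX pread and hands each chunk on for the copy to the GPU. The function name `stage_file_to_device` and the `copy_to_device` callback are hypothetical, introduced only for this example; in a real program the callback would wrap an actual host-to-device copy (for example cudaMemcpy through whatever CUDA binding is in use). This is not how cuFile is implemented, just the general shape of the data flow.

```python
import os

def stage_file_to_device(path, copy_to_device, chunk_size=4 * 1024 * 1024):
    """Read `path` through system memory and pass each chunk to `copy_to_device`.

    `copy_to_device(chunk, offset)` is a hypothetical callback standing in for
    a real host-to-device copy. Returns the total number of bytes read.
    """
    fd = os.open(path, os.O_RDONLY)
    offset = 0
    try:
        while True:
            # pread reads at an explicit offset and returns b"" at end of file.
            chunk = os.pread(fd, chunk_size, offset)
            if not chunk:
                break
            copy_to_device(chunk, offset)
            offset += len(chunk)
    finally:
        os.close(fd)
    return offset
```

A callback that writes each chunk into a preallocated buffer at `offset` is enough to try the function without a GPU. Overlapping the reads with the device copies (double buffering) is a common refinement, left out here for brevity.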

Depending on the performance of the host system, the impact on throughput could be minor, because the primary limiting factor would be the throughput of the mass storage device (say, 7 GB/sec on a PCIe x4 connection), followed by the throughput of the PCIe connection between host system and GPU (say, 22 GB/sec in the case of PCIe gen 4 x16). Compare that to the system memory bandwidth of about 80 GB/sec for a reasonable host system with four DDR4 channels and it is clear that the host system is the least restrictive component.
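The arithmetic behind that comparison can be written out as a short Python sketch. The three figures are the illustrative ones from the post, not measurements, and the names `stages_gb_per_s` and `bottleneck` are made up for this example. It treats the path as a pipeline whose sustained throughput is bounded by its slowest stage.

```python
# Illustrative per-stage throughput in GB/sec, taken from the post above.
stages_gb_per_s = {
    "NVMe SSD (PCIe x4)": 7.0,
    "system memory (four DDR4 channels)": 80.0,
    "host-to-GPU link (PCIe gen 4 x16)": 22.0,
}

def bottleneck(stages):
    """Return (name, rate) of the slowest stage, which bounds the pipeline."""
    name = min(stages, key=stages.get)
    return name, stages[name]

name, rate = bottleneck(stages_gb_per_s)
print(f"Limiting stage: {name} at {rate} GB/sec")
```

Staged data is written into system memory and then read back out, so the memory budget available to the transfer is arguably closer to half the quoted figure. Under that assumption it is still well above both the SSD and the PCIe link, which is the point of the comparison.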

The main advantage of storage attached via GPUDirect is a reduction in latency; a secondary advantage is that it lowers the CPU load on the host system (which may not be a huge advantage given today's 32-core CPUs).