The cuFile library provides a mechanism for cuFile reads and writes to use compatibility mode using POSIX pread and pwrite APIs respectively to system memory and copying to GPU memory. The behavior of compatibility mode with cuFile APIs is determined by the following configuration parameters.
This tells me that there are no direct DMA transfers in compatibility mode.
Yes, you can use GDS with RTX 3090 in compatibility mode.
My (possibly incorrect) personal take is that GPUDirect with RDMA is a feature needed in, and supplied for, high-end professional HPC systems. Thus the initial focus on Tesla-based systems and the limitation to certain Linux environments, because that is what one finds in supercomputer systems.
The function of the compatibility mode seems to be that it allows software prototyping on developer machines using less ambitious hardware configurations; i.e. it is not something one would necessarily want to deploy in production systems.
If you have any storage technology that can get data into system memory (this would typically use some sort of file system), you can then copy that data from system memory to the GPU.
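To make that staged path concrete, here is a minimal sketch of the general pattern: pread into system memory, then hand each chunk to a copy step. This is not how cuFile implements compatibility mode internally. The names `staged_read` and `copy_to_device` are hypothetical; `copy_to_device` stands in for whatever host-to-device copy your GPU library provides.

```python
import os

def staged_read(path, copy_to_device, chunk_size=1 << 20):
    """Read a file through system memory, forwarding each chunk to a copy step.

    copy_to_device(chunk, offset) is a hypothetical callback standing in for
    a host-to-device copy; it is not part of any NVIDIA API.
    Returns the total number of bytes read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            # POSIX pread: the data lands in system memory first.
            chunk = os.pread(fd, chunk_size, offset)
            if not chunk:
                break  # end of file
            # Second hop: system memory to the destination buffer.
            copy_to_device(chunk, offset)
            offset += len(chunk)
        return offset
    finally:
        os.close(fd)
```

For a quick check without a GPU, the callback can simply write into a bytearray that plays the role of device memory.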
Depending on the performance of the host system, the impact on throughput could be minor, because the primary limiting factor would be the throughput of the mass storage device (say, 7 GB/sec on a PCIe gen 4 x4 connection), followed by the throughput of the PCIe connection between host system and GPU (say, 22 GB/sec in the case of PCIe gen 4 x16). Compare that to the system memory bandwidth of about 80 GB/sec for a reasonable host system with four DDR4 channels and it is clear that the host system is the least restrictive component.
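As a back-of-the-envelope check of that reasoning, the sketch below picks the slowest stage of the pipeline using the rough figures quoted above. The number of passes the data makes through system memory (3) is purely an illustrative assumption, and the helper name `bottleneck` is made up for this example.

```python
def bottleneck(stages):
    """Return (name, GB/sec) of the slowest stage in a serial pipeline."""
    return min(stages.items(), key=lambda kv: kv[1])

# Rough figures from the discussion above, in GB/sec.
STORAGE = 7.0         # mass storage on a PCIe x4 connection
HOST_TO_GPU = 22.0    # PCIe gen 4 x16 between host and GPU
SYSTEM_MEMORY = 80.0  # four DDR4 channels

# Assumption for illustration only: the staged path touches system
# memory several times, so divide its bandwidth by the number of passes.
MEMORY_PASSES = 3

stages = {
    "mass storage": STORAGE,
    "host-to-GPU PCIe link": HOST_TO_GPU,
    "system memory": SYSTEM_MEMORY / MEMORY_PASSES,
}

name, rate = bottleneck(stages)
print(f"{name} limits throughput to about {rate:g} GB/sec")
```

Even with the pessimistic divide-by-three on memory bandwidth, the storage device remains the slowest stage under these numbers.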
The main advantage of storage attached via GPUDirect is a reduction in latency; a secondary advantage is that it lowers the CPU load on the host system (which may not be a huge advantage given today’s 32-core CPUs).