nvidia_p2p_get_pages() failing with error code -22

I am setting up NVIDIA GPUDirect Storage (GDS) with the following hardware configuration:

Ubuntu 22.04
CUDA 12.1
NVIDIA driver 530.30.2
MLNX driver - 5.8.0
NVIDIA GeForce RTX 3090
Samsung 980 DC NVMe drive.
IOMMU is disabled
PCIe BAR resized to match the VRAM size

GDS was installed and verified successfully:

 GDS release version: 1.6.1.9
 nvidia_fs version:  2.15 libcufile version: 2.12
 Platform: x86_64
 ============
 ENVIRONMENT:
 ============
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe               : Supported
 NVMeOF             : Unsupported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Unsupported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 BeeGFS             : Unsupported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported
 --Mellanox PeerDirect : Disabled
 --rdma library        : Not Loaded (libcufile_rdma.so)
 --rdma devices        : Not configured
 --rdma_device_status  : Up: 0 Down: 0
 =====================
 CUFILE CONFIGURATION:
 =====================
 properties.use_compat_mode : true
 properties.force_compat_mode : false
 properties.gds_rdma_write_support : true
 properties.use_poll_mode : false
 properties.poll_mode_max_size_kb : 4
 properties.max_batch_io_size : 128
 properties.max_batch_io_timeout_msecs : 5
 properties.max_direct_io_size_kb : 1024
 properties.max_device_cache_size_kb : 131072
 properties.max_device_pinned_mem_size_kb : 18014398509481980
 properties.posix_pool_slab_size_kb : 4 1024 16384 
 properties.posix_pool_slab_count : 128 64 32 
 properties.rdma_peer_affinity_policy : RoundRobin
 properties.rdma_dynamic_routing : 0
 fs.generic.posix_unaligned_writes : false
 fs.lustre.posix_gds_min_kb: 0
 fs.beegfs.posix_gds_min_kb: 0
 fs.weka.rdma_write_support: false
 fs.gpfs.gds_write_support: false
 profile.nvtx : false
 profile.cufile_stats : 0
 miscellaneous.api_check_aggressive : false
 execution.max_io_threads : 0
 execution.max_io_queue_depth : 128
 execution.parallel_io : false
 execution.min_io_threshold_size_kb : 1024
 execution.max_request_parallelism : 0
 =========
 GPU INFO:
 =========
 GPU index 0 NVIDIA GeForce RTX 3090 bar:1 bar size (MiB):32768, IOMMU State: Disabled
 ==============
 PLATFORM INFO:
 ==============
 Found ACS enabled for switch 0000:00:02.1
 IOMMU: disabled
 Platform verification succeeded

But when I ran the bundled test benchmarks, they failed as shown below:

./gdsio_verify -f /media/nvme/write-test -d 0 -n 1 -s 1G 

warn: error opening log file: Permission denied, logging will be disabled
gpu index :0,file :/media/nvme/write-test, gpu buffer alignment :0, gpu buffer offset :0, gpu devptr offset :0, file offset :0, io_requested :1073741824, io_chunk_size :1073741824, bufregister :true, sync :1, nr ios :1, 
fsync :0, 
Batch mode: 0
cuFileRead returned error(ret=-1, step_size=1073741824, bytes_left=1073741824)
buffer deregister failed :device pointer lookup failure

Checking the dmesg logs, I found:

nvidia-fs:nvfs_pin_gpu_pages:1292 Error ret -22 invoking nvidia_p2p_get_pages
                va_start=0x7f6792900000/va_end=0x7f67929fffff/rounded_size=0x100000/gpu_buf_length=0x100000

Digging through some articles, I found that GPUDirect RDMA is supported only on Tesla/Quadro-class GPUs. I am curious what prevents the RTX 3090 from supporting this: is something missing in the hardware, or in a driver module?

Hey @utkrishtp,

Yes and yes. There are significant differences in both the hardware design of server GPUs and the necessary software. And as far as I am aware, the underlying system architecture of the server hosting one of the HPC GPUs also needs to be compatible in order to use GPUDirect Storage.

I am sorry, but right now your RTX 3090 in a normal desktop setup is not officially supported.
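One note in case it helps: your gdscheck output shows properties.use_compat_mode : true, which means the cuFile library is allowed to fall back to regular POSIX I/O when the true GDS path is unavailable. If you just want the cuFile API calls to succeed on this machine, you can force that fallback in /etc/cufile.json. A minimal sketch (the key names below are my recollection of the cufile.json schema; please verify them against the cufile.json shipped with your GDS install):

```json
{
    "properties": {
        "allow_compat_mode": true,
        "force_compat_mode": true
    }
}
```

Forced compat mode gives up the zero-copy path, so throughput will be plain POSIX, but it is a useful sanity check that everything above the nvidia-fs layer works.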