Why is my NVMe storage unsupported in GDS? (Ubuntu 22.04 + Tesla V100)

Hello everyone,

I've run into an issue with GPUDirect Storage (GDS): the GDS check reports NVMe storage as unsupported:

NVMe               : Unsupported

My environment and configuration are as follows:

Operating System: Ubuntu 22.04 (originally on kernel 6.8.0-87-generic)
GPU: NVIDIA Tesla V100 (32GB); a Tesla P100 (16GB) is also installed
CUDA Version: 12.4
NVIDIA Driver Version: 550.54.14
nvidia-fs Version: 2.17.4
MLNX_OFED Version: MLNX_OFED_LINUX-24.10-3.2.5.0
Kernel Version: 5.15.0-97-generic (switched to 5.15 kernel)
IOMMU: Disabled
PCIe Topology: GPU and NVMe storage devices are on the same PCIe root complex.
NVMe Model: Intel Optane SSD 900P
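To re-check the environment details above after each change (kernel switch, driver reinstall), a small script can read them directly from the running system. This is a minimal sketch, not an official GDS tool. It assumes the standard Linux /proc/cmdline and /proc/modules files and that nvidia-fs shows up as nvidia_fs in lsmod. The function names are hypothetical.

```python
import os


def read_text(path: str) -> str:
    """Return the contents of `path`, or an empty string if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


def iommu_params(cmdline: str) -> dict:
    """Collect IOMMU-related kernel parameters (e.g. intel_iommu=off) from /proc/cmdline text."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in ("intel_iommu", "amd_iommu", "iommu"):
            params[key] = value
    return params


def module_loaded(proc_modules: str, name: str) -> bool:
    """True if `name` is the first field of some /proc/modules line.

    Drivers built into the kernel (not loadable modules) never appear here.
    """
    return any(
        line.split()[0] == name
        for line in proc_modules.splitlines()
        if line.strip()
    )


def main() -> None:
    print("running kernel :", os.uname().release)
    print("IOMMU params   :", iommu_params(read_text("/proc/cmdline")) or "none on cmdline")
    modules = read_text("/proc/modules")
    # nvidia_fs is the loaded name of nvidia-fs.ko; nvme/nvme_core are the Linux NVMe drivers.
    for mod in ("nvidia_fs", "nvme", "nvme_core"):
        print(f"{mod:<10} loaded: {module_loaded(modules, mod)}")


if __name__ == "__main__":
    main()
```

An empty IOMMU result only means no IOMMU parameter is on the kernel command line. The effective state also depends on BIOS settings and kernel defaults, so it should be read alongside the "IOMMU State" line in the gdscheck output.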

Although my system meets the GDS requirements and the NVMe device is properly connected, running ./gdscheck.py -p still reports NVMe as Unsupported. The exact output is:

$ ./gdscheck.py -p
GDS release version: 1.9.0.20
nvidia_fs version: 2.17
libcufile version: 2.12
Platform: x86_64

=========== ENVIRONMENT: ===========

===================== DRIVER CONFIGURATION: =====================
NVMe               : Unsupported
NVMeOF             : Unsupported
SCSI               : Unsupported
ScaleFlux CSD      : Unsupported
NVMesh             : Unsupported
DDN EXAScaler      : Unsupported
IBM Spectrum Scale : Unsupported
NFS                : Unsupported
BeeGFS             : Unsupported
WekaFS             : Unsupported
Userspace RDMA     : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library        : Not Loaded (libcufile_rdma.so)
--rdma devices        : Not configured
--rdma_device_status  : Up: 0 Down: 0

==================== CUFILE CONFIGURATION: =====================
properties.use_compat_mode                : true
properties.force_compat_mode              : false
properties.gds_rdma_write_support        : true
properties.use_poll_mode                 : false
properties.poll_mode_max_size_kb         : 4
properties.max_batch_io_size             : 128
properties.max_batch_io_timeout_msecs   : 5
properties.max_direct_io_size_kb         : 1024
properties.max_device_cache_size_kb      : 131072
properties.max_device_pinned_mem_size_kb : 18014398509481980
properties.posix_pool_slab_size_kb      : 4 1024 16384
properties.posix_pool_slab_count        : 128 64 32
properties.rdma_peer_affinity_policy    : RoundRobin
properties.rdma_dynamic_routing         : 0
fs.generic.posix_unaligned_writes       : false
fs.lustre.posix_gds_min_kb              : 0
fs.beegfs.posix_gds_min_kb             : 0
fs.weka.rdma_write_support              : false
fs.gpfs.gds_write_support              : false
profile.nvtx                            : false
profile.cufile_stats                   : 0
miscellaneous.api_check_aggressive     : false
execution.max_io_threads               : 0
execution.max_io_queue_depth           : 128
execution.parallel_io                  : false
execution.min_io_threshold_size_kb     : 1024
execution.max_request_parallelism      : 0
properties.force_odirect_mode          : false
properties.prefer_iouring              : false

=========== GPU INFO: ============
GPU index 0 Tesla V100-PCIE-32GB bar:1 bar size (MiB):32768 supports GDS, IOMMU State: Disabled
GPU index 1 Tesla P100-PCIE-16GB bar:1 bar size (MiB):16384 supports GDS, IOMMU State: Disabled

============= PLATFORM INFO: ==============
IOMMU: disabled
Nvidia Driver Info Status: Supported only on (nvidia-fs version <= 2.17.4)
Cuda Driver Version Installed: 12040
Platform: SYS-2028GR-TR, Arch: x86_64 (Linux 5.15.0-97-generic)
Platform verification succeeded
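To compare this output across configuration changes (for example, before and after the kernel switch), a small parser can pull out the status lines. This is a sketch that assumes the "Name : Supported/Unsupported" layout shown above. The function names are hypothetical. It also extracts properties.use_compat_mode, because my understanding of the cuFile documentation is that with compat mode enabled, cuFile falls back to POSIX I/O through host memory when the GDS path is unavailable, rather than failing outright.

```python
import re

# A top-level "Name : Supported|Unsupported" line, e.g. "NVMe               : Unsupported".
# Sub-items starting with "--" (e.g. "--rdma library") are excluded by the (?!--) lookahead.
STATUS_LINE = re.compile(
    r"^(?!--)(?P<name>[^:]+?)\s*:\s*(?P<status>Supported|Unsupported)\s*$"
)


def driver_status(gdscheck_output: str) -> dict:
    """Map each driver/filesystem name to 'Supported' or 'Unsupported'."""
    status = {}
    for line in gdscheck_output.splitlines():
        m = STATUS_LINE.match(line.strip())
        if m:
            status[m.group("name")] = m.group("status")
    return status


def compat_mode_enabled(gdscheck_output: str) -> bool:
    """Return the value of properties.use_compat_mode (False if the key is absent)."""
    for line in gdscheck_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "properties.use_compat_mode":
            return value.strip() == "true"
    return False


def summarize(gdscheck_output: str) -> str:
    """One-line summary: NVMe status, all unsupported entries, compat mode flag."""
    status = driver_status(gdscheck_output)
    unsupported = sorted(n for n, s in status.items() if s == "Unsupported")
    return (
        f"NVMe: {status.get('NVMe', 'not listed')}; "
        f"unsupported: {', '.join(unsupported) or 'none'}; "
        f"compat mode: {compat_mode_enabled(gdscheck_output)}"
    )
```

Usage: save the output with ./gdscheck.py -p > gdscheck.txt, then call print(summarize(open("gdscheck.txt").read())).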


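When I run a GPU-to-NVMe transfer test with gdsio, CPU utilization is very high, nearly identical to the GPU-to-CPU-to-NVMe test. This seems odd, since GDS is supposed to transfer data directly between GPU memory and storage without staging it through a CPU bounce buffer.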
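To compare the two gdsio runs with a number rather than an impression, system-wide CPU utilization can be sampled from /proc/stat while each test is running. This is a minimal sketch of the standard /proc/stat delta method, nothing GDS-specific. The function names are hypothetical.

```python
import time


def cpu_times(stat_text: str) -> list:
    """Return the jiffy counters of the aggregate 'cpu' line in /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def busy_fraction(before: list, after: list) -> float:
    """Share of CPU time spent neither idle nor in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    # Fields 0..7 are user, nice, system, idle, iowait, irq, softirq, steal.
    # guest/guest_nice (fields 8, 9) are already counted in user/nice, so skip them.
    total = sum(deltas[:8])
    idle = deltas[3] + deltas[4]
    return 0.0 if total == 0 else (total - idle) / total


def sample_busy(interval_s: float = 5.0) -> float:
    """Measure the system-wide CPU busy fraction over `interval_s` seconds."""
    with open("/proc/stat") as f:
        before = cpu_times(f.read())
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = cpu_times(f.read())
    return busy_fraction(before, after)
```

Call sample_busy() from a second terminal during the GPU-direct run and again during the CPU-staged run. If both report similar busy fractions, that is consistent with the GPU-direct test not actually taking the direct path.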

Why does the NVMe field show as Unsupported in the gdscheck output?

Is a specific driver, configuration, or patch missing that prevents the NVMe device from supporting GDS?

Are there specific hardware configurations or version requirements for enabling NVMe support in GDS?