Hi there,
I’m looking for assistance with setting up GPUDirect Storage (GDS) to access a remote NVMe storage device via RDMA. I have a remote NVMe-over-RDMA block device (/dev/nvme4n1), and I’m trying to use gdsio to perform read/write operations to it.
Hardware Setup
- CPU: INTEL(R) XEON(R) GOLD 6526Y, 64 cores
- GPU: NVIDIA A100-SXM4-40GB
- SSD: SAMSUNG MZQL21T9HCJR-00A07 (remote)
Software Setup
- Ubuntu 22.04, Linux kernel 5.15.0
- MLNX_OFED: MLNX_OFED_LINUX-24.10-1.1.4.0-ubuntu22.04-x86_64
- CUDA 12.6
- GDS release version: 1.11.1.6
- nvidia_fs version: 2.22
- libcufile version: 2.12
- Platform: x86_64
- NVIDIA driver: 560.35.05
- IOMMU: disabled
Output of gdscheck:
$ sudo gdscheck -p
GDS release version: 1.11.1.6
nvidia_fs version: 2.22
libcufile version: 2.12
Platform: x86_64
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe : Supported
NVMeOF : Supported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Supported
Userspace RDMA : Supported
--Mellanox PeerDirect : Enabled
--rdma library : Loaded (libcufile_rdma.so)
--rdma devices : Configured
--rdma_device_status : Up: 1 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 4
execution.max_io_queue_depth : 128
execution.parallel_io : true
execution.min_io_threshold_size_kb : 8192
execution.max_request_parallelism : 4
properties.force_odirect_mode : false
properties.prefer_iouring : false
=========
GPU INFO:
=========
GPU index 0 NVIDIA A100-SXM4-40GB bar:1 bar size (MiB):65536 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed: 12060
Platform: R283-S93-AAF1-000, Arch: x86_64(Linux 5.15.134+release+2.10.0r8-amd64)
Platform verification succeeded
Mount Setup
$ df -Th | grep nvme4n1
/dev/nvme4n1 ext4 1.8T 153G 1.5T 10% /mnt/remote
$ findmnt -o TARGET,FSTYPE,OPTIONS,SOURCE /mnt/remote
TARGET FSTYPE OPTIONS SOURCE
/mnt/remote ext4 rw,relatime,stripe=32,data=ordered /dev/nvme4n1
Output of stat
$ stat /mnt/remote/
File: /mnt/remote/
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 10305h/66309d Inode: 2 Links: 3
Access: (0755/drwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2025-05-08 20:12:36.544308291 +0000
Modify: 2025-05-08 18:46:39.982090301 +0000
Change: 2025-05-08 18:46:39.982090301 +0000
Birth: 2025-05-08 17:49:35.000000000 +0000
Topology
$ nvidia-smi topo -m
GPU0 NIC0 NIC1 NIC2 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X SYS SYS NODE 16-31,48-63 1 N/A
NIC0 SYS X PIX SYS
NIC1 SYS PIX X SYS
NIC2 NODE SYS SYS X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
I used mlx5_2 to establish the RDMA connection between the local and remote nodes, since it is the NIC closest to GPU0 (NODE rather than SYS).
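Since only mlx5_2 is intended to carry traffic, cuFile can be restricted to that NIC via `properties.rdma_dev_addr_list` in /etc/cufile.json (this mainly affects cuFile's userspace RDMA path; the IP address below is a placeholder for mlx5_2's address). A minimal fragment of an otherwise-default cufile.json:

```json
{
    "properties": {
        "rdma_dev_addr_list": ["192.168.0.12"]
    }
}
```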
Observations
When I run the following command:
$ sudo gdsio -f /mnt/remote/32GFile -d 0 -i 4K -w 1 -x 0 -I 2 -T 10 -k 42
file register error: internal error filename :/mnt/remote/32GFile
This issue does not happen when using a local NVMe SSD (e.g., /dev/nvme0n1) mounted in the same way. It only fails on the NVMe-over-RDMA device.
There are also error messages in cuFile.log:
08-05-2025 20:38:46:605 [pid=610313 tid=610313] ERROR cufio-udev:67 udev property not found: ID_FS_USAGE nvme4n1
08-05-2025 20:38:46:605 [pid=610313 tid=610313] ERROR cufio-fs:742 error getting volume attributes error for device: dev_no: 259:5
08-05-2025 20:38:46:605 [pid=610313 tid=610313] NOTICE cufio:293 cuFileHandleRegister GDS not supported or disabled by config, using cuFile posix read/write with compat mode enabled
08-05-2025 20:38:46:605 [pid=610313 tid=610313] ERROR cufio-udev:67 udev property not found: ID_FS_USAGE nvme4n1
08-05-2025 20:38:46:605 [pid=610313 tid=610313] ERROR cufio-fs:742 error getting volume attributes error for device: dev_no: 259:5
08-05-2025 20:38:46:605 [pid=610313 tid=610313] ERROR cufio-obj:215 unable to get volume attributes for fd 56
08-05-2025 20:38:46:605 [pid=610313 tid=610313] ERROR cufio:311 cuFileHandleRegister error, failed to allocate file object
08-05-2025 20:38:46:605 [pid=610313 tid=610313] ERROR cufio:339 cuFileHandleRegister error: internal error
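The first error suggests cuFile could not find the `ID_FS_USAGE` property for nvme4n1 in the udev database, which is how it resolves volume attributes before registering a handle. A quick way to see what udev currently records for the device (device path is the one from my setup; substitute yours):

```shell
# Hypothetical device path; substitute your NVMe-oF namespace.
DEV=${DEV:-/dev/nvme4n1}

# cuFile resolves filesystem metadata (ID_FS_USAGE, ID_FS_TYPE, ...) from
# the udev database; capture what udev currently records for the device:
PROPS=$(udevadm info --query=property --name="$DEV" 2>/dev/null | grep '^ID_FS_' || true)

if [ -z "$PROPS" ]; then
    echo "no ID_FS_* properties recorded for $DEV"
    # Re-run the udev rules (which invoke the blkid builtin) so the
    # properties get populated, then retry gdsio:
    #   sudo udevadm trigger --action=change "$DEV"
    #   sudo udevadm settle
else
    printf '%s\n' "$PROPS"
fi
```

On my system this prints nothing for the NVMe-oF namespace but shows the expected `ID_FS_*` keys for the local NVMe device, which matches the log output above.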
I believe GPUDirect Storage supports NVMe-oF, but I'm not sure how to resolve this issue. I haven't modified the cuFile.json configuration; it's currently using all default settings.
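To gather more detail on why handle registration falls back to compat mode, the cuFile log level can be raised from the default ERROR in /etc/cufile.json (fragment of an otherwise-default file; valid levels include ERROR, WARN, INFO, DEBUG, TRACE):

```json
{
    "logging": {
        "level": "TRACE"
    }
}
```

I can post the TRACE-level cuFile.log from a failing run if that helps.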