I want to use the GPUDirect Storage (GDS) feature. When I run the sample cufile_sample_001 (from MagnumIO/gds/samples in the NVIDIA/MagnumIO GitHub repository) with the command sudo ./cufile_sample_001 /mnt/nvme/test.txt CUDA:0, I get the following errors:
08-03-2024 01:29:42:0 [pid=21748 tid=21748] ERROR 0:501 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
08-03-2024 01:29:42:0 [pid=21748 tid=21748] ERROR 0:515 map failed
08-03-2024 01:29:42:0 [pid=21748 tid=21748] ERROR cufio-obj:129 error allocating nvfs handle, size: 131072
08-03-2024 01:29:42:0 [pid=21748 tid=21748] ERROR cufio_core:1589 cuFileBufRegister error, object allocation failed
08-03-2024 01:29:42:0 [pid=21748 tid=21748] ERROR cufio_core:1667 cuFileBufRegister error cufile success
08-03-2024 01:29:42:1 [pid=21748 tid=21748] ERROR 0:501 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
08-03-2024 01:29:42:1 [pid=21748 tid=21748] ERROR 0:515 map failed
08-03-2024 01:29:42:1 [pid=21748 tid=21748] ERROR 0:829 Buffer map failed for PCI-Group: 0 GPU: 0
08-03-2024 01:29:42:1 [pid=21748 tid=21748] ERROR 0:957 Failed to obtain bounce buffer from domain: 0 GPU: 0
08-03-2024 01:29:42:1 [pid=21748 tid=21748] ERROR 0:1234 failed to get bounce buffer for PCI group 0 GPU 0
08-03-2024 01:29:42:1 [pid=21748 tid=21748] ERROR cufio:145 cuFileBufDeregister error, object for device pointer is not registered
08-03-2024 01:29:42:1 [pid=21748 tid=21748] ERROR cufio:171 cuFileBufDeregister error: device pointer lookup failure
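To make the failure order easier to follow, the log lines above can be split into timestamp, call site, and message with a small script. This is only a sketch based on the line format shown in this post. The names LOG_LINE, IOCTL_RETURN, and parse_line are my own and not part of any NVIDIA tool. It also assumes that ioctl_return is a negated Linux errno value, which would make -22 correspond to EINVAL.

```python
import errno
import re

# Pattern inferred from the log lines in this post (an assumption, not an
# official format): "<date> <time> [pid=N tid=N] LEVEL <site>:<line> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+) \[pid=(?P<pid>\d+) tid=(?P<tid>\d+)\] "
    r"(?P<level>[A-Z]+)\s+(?P<site>\S+:\d+) (?P<msg>.*)$"
)
IOCTL_RETURN = re.compile(r"ioctl_return: (-?\d+)")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    r = IOCTL_RETURN.search(rec["msg"])
    # Assumption: ioctl_return holds a negated errno (e.g. -22 -> EINVAL).
    rec["errno_name"] = errno.errorcode.get(-int(r.group(1))) if r else None
    return rec


SAMPLE = (
    "08-03-2024 01:29:42:0 [pid=21748 tid=21748] ERROR 0:501 "
    "nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1"
)

if __name__ == "__main__":
    rec = parse_line(SAMPLE)
    print(rec["site"], rec["errno_name"], rec["msg"])
```

If that assumption holds, the first failure is the nvidia-fs MAP ioctl rejecting its arguments, and the later cuFileBufRegister and bounce-buffer errors follow from it.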
ofed_info -s
MLNX_OFED_LINUX-5.8-4.1.5.0:
#python3 /usr/local/cuda/gds/tools/gdscheck.py -p
GDS release version: 1.8.0.34
nvidia_fs version: 2.17 libcufile version: 2.12
Platform: x86_64
ENVIRONMENT:
=====================
DRIVER CONFIGURATION:
NVMe : Supported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
CUFILE CONFIGURATION:
properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 4
execution.max_io_queue_depth : 128
execution.parallel_io : true
execution.min_io_threshold_size_kb : 8192
execution.max_request_parallelism : 4
properties.force_odirect_mode : false
properties.prefer_iouring : false
GPU INFO:
GPU index 0 NVIDIA GeForce RTX 3070 bar:1 bar size (MiB):256 supports GDS, IOMMU State: Disabled
PLATFORM INFO:
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed: 12030
Platform: SYS-7049GP-TRT, Arch: x86_64(Linux 5.15.0-100-generic)
Platform verification succeeded
Cuda Toolkit 12.3
nvidia-fs driver version 2.17.5
GPU: NVIDIA GeForce RTX 3070
To enable GeForce support, I ran the following command, as described in NVIDIA's CUDA Toolkit installation instructions:
echo "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" | sudo tee /etc/modprobe.d/nvidia-gsp.conf
mount | grep ext4 | grep nvme
/dev/nvme1n1 on /mnt/nvme type ext4 (rw,relatime,data=ordered)
I have been stuck on this problem for a few days and have not been able to solve it. Any help would be appreciated. Let me know if you need any other information.