cuFileHandleRegister returned an 'internal error' error when using GPUDirect Storage technology on BeeGFS

I configured GDS and BeeGFS according to NVIDIA's official documentation. The verification script reports that BeeGFS is supported, but when I write files to the BeeGFS mount, cuFileHandleRegister returns error code 5030, which means "internal error".

I can write successfully to the NVMe device using the same method. Here is my environment information and the steps I ran.

Can anyone help me? Thank you very much!
@sougupta

[root@orcafs19141 samples]# df -hT
Filesystem                 Type      Size  Used Avail Use% Mounted on
devtmpfs                   devtmpfs   63G     0   63G   0% /dev
tmpfs                      tmpfs      63G     0   63G   0% /dev/shm
tmpfs                      tmpfs      63G   34M   63G   1% /run
tmpfs                      tmpfs      63G     0   63G   0% /sys/fs/cgroup
/dev/mapper/cl_orcafs-root xfs        70G   39G   32G  56% /
/dev/sda1                  xfs      1014M  268M  747M  27% /boot
tmpfs                      tmpfs      13G     0   13G   0% /run/user/0
/dev/nvme0n1               ext4      916G  140M  870G   1% /mnt/nvme
orcafs_nodev               beegfs    2.8T   20G  2.8T   1% /mnt/orcafs

[root@orcafs19141 samples]# /usr/local/cuda-12.1/gds/tools/gdscheck.py -p
 GDS release version: 1.6.1.9
 nvidia_fs version:  2.15 libcufile version: 2.12
 Platform: x86_64
 ============
 ENVIRONMENT:
 ============
 CUFILE_ENV_PATH_JSON : /root/workspace/GDS/cufile.json
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe               : Supported
 NVMeOF             : Supported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Unsupported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 BeeGFS             : Supported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported
 --Mellanox PeerDirect : Disabled
 --rdma library        : Not Loaded (libcufile_rdma.so)
 --rdma devices        : Not configured
 --rdma_device_status  : Up: 0 Down: 0
 =====================
 CUFILE CONFIGURATION:
 =====================
 properties.use_compat_mode : true
 properties.force_compat_mode : false
 properties.gds_rdma_write_support : true
 properties.use_poll_mode : false
 properties.poll_mode_max_size_kb : 4
 properties.max_batch_io_size : 128
 properties.max_batch_io_timeout_msecs : 5
 properties.max_direct_io_size_kb : 16384
 properties.max_device_cache_size_kb : 131072
 properties.max_device_pinned_mem_size_kb : 33554432
 properties.posix_pool_slab_size_kb : 4 1024 16384
 properties.posix_pool_slab_count : 128 64 32
 properties.rdma_peer_affinity_policy : RoundRobin
 properties.rdma_dynamic_routing : 0
 fs.generic.posix_unaligned_writes : false
 fs.lustre.posix_gds_min_kb: 0
 fs.beegfs.posix_gds_min_kb: 4
 fs.beegfs.rdma_dev_addr_list : 192.168.20.141 192.168.20.142
 fs.weka.rdma_write_support: false
 fs.gpfs.gds_write_support: false
 profile.nvtx : false
 profile.cufile_stats : 0
 miscellaneous.api_check_aggressive : false
 execution.max_io_threads : 0
 execution.max_io_queue_depth : 128
 execution.parallel_io : false
 execution.min_io_threshold_size_kb : 8192
 execution.max_request_parallelism : 0
 =========
 GPU INFO:
 =========
 GPU index 0 Tesla P4 bar:1 bar size (MiB):256 supports GDS, IOMMU State: Disabled
 ==============
 PLATFORM INFO:
 ==============
 IOMMU: disabled
 Platform verification succeeded

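For reference, the fs.beegfs values reported by gdscheck above come from my cufile.json. A trimmed fragment (the IP addresses are from my setup; adjust for yours) looks roughly like this:

```json
{
    "fs": {
        "beegfs": {
            "posix_gds_min_kb": 4,
            "rdma_dev_addr_list": ["192.168.20.141", "192.168.20.142"]
        }
    }
}
```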

[root@orcafs19141 samples]# /usr/local/cuda-12.1/gds/samples/cufile_sample_001 /mnt/nvme/testGPUx 0
opening file /mnt/nvme/testGPUx
registering device memory of size :131072
writing from device memory
deregistering device memory

[root@orcafs19141 samples]# /usr/local/cuda-12.1/gds/samples/cufile_sample_001 /mnt/orcafs/data/testGPUx 0
opening file /mnt/orcafs/data/testGPUx
file register error:internal error
file register error code: 5030
cat cufile.log

 12-05-2023 10:50:16:462 [pid=339589 tid=339589] ERROR  cufio-fs:322 error creating udev_device for block device dev_no: 0:46
 12-05-2023 10:50:16:462 [pid=339589 tid=339589] ERROR  cufio-fs:742 error getting volume attributes error for device: dev_no: 0:46
 12-05-2023 10:50:16:462 [pid=339589 tid=339589] DEBUG  cufio:1137 cuFile DIO status for file descriptor 45 DirectIO not supported
 12-05-2023 10:50:16:462 [pid=339589 tid=339589] NOTICE  cufio:1546 cuFileHandleRegister GDS not supported or disabled by config, using cuFile posix read/write with compat mode enabled
 12-05-2023 10:50:16:463 [pid=339589 tid=339589] ERROR  cufio-fs:322 error creating udev_device for block device dev_no: 0:46
 12-05-2023 10:50:16:463 [pid=339589 tid=339589] ERROR  cufio-fs:742 error getting volume attributes error for device: dev_no: 0:46
 12-05-2023 10:50:16:463 [pid=339589 tid=339589] ERROR  cufio-obj:177 unable to get volume attributes for fd 45
 12-05-2023 10:50:16:463 [pid=339589 tid=339589] ERROR  cufio:1564 cuFileHandleRegister error, failed to allocate file object
 12-05-2023 10:50:16:463 [pid=339589 tid=339589] ERROR  cufio:1592 cuFileHandleRegister error: internal error

In fact, I am trying to add GDS support to my own distributed file system, so I masqueraded it as BeeGFS. I have found the answer to this question: I needed to change my file system's magic number to match the BeeGFS magic that nvidia-fs checks for. That problem is now fixed. The current situation is that my system can read and write files with GDS enabled, but when the device memory is freed (by calling cudaFree()), the CPU gets stuck, the system becomes abnormal, and it cannot recover without a reboot. The kernel output is as follows. Can someone provide some help? Thanks!

"watchdog: BUG: soft lockup - CPU#32 stuck for 22s! [cufile_sample_0:61865]"

The system call stack is as follows:

[Wed May 17 20:34:57 2023] rcu: INFO: rcu_sched self-detected stall on CPU
[Wed May 17 20:34:57 2023] rcu: 35-....: (59929 ticks this GP) idle=3fe/1/0x4000000000000002 softirq=8880/8889 fqs=14990
[Wed May 17 20:34:57 2023] (t=60000 jiffies g=126349 q=14148)
[Wed May 17 20:34:57 2023] NMI backtrace for cpu 35
[Wed May 17 20:34:57 2023] CPU: 35 PID: 21428 Comm: cufile_sample_0 Kdump: loaded Tainted: P OEL --------- -t - 4.18.0-240.el8.x86_64 #1
[Wed May 17 20:34:57 2023] Hardware name: Supermicro SSG-2028R-NR48N/X10DSC+, BIOS 3.0a 02/09/2018
[Wed May 17 20:34:57 2023] Call Trace:
[Wed May 17 20:34:57 2023] <IRQ>
[Wed May 17 20:34:57 2023] dump_stack+0x5c/0x80
[Wed May 17 20:34:57 2023] nmi_cpu_backtrace.cold.6+0x13/0x4e
[Wed May 17 20:34:57 2023] ? lapic_can_unplug_cpu.cold.28+0x37/0x37
[Wed May 17 20:34:57 2023] nmi_trigger_cpumask_backtrace+0xde/0xe0
[Wed May 17 20:34:57 2023] rcu_dump_cpu_stacks+0x9c/0xca
[Wed May 17 20:34:57 2023] rcu_sched_clock_irq.cold.70+0x1b4/0x3b8
[Wed May 17 20:34:57 2023] ? tick_sched_do_timer+0x60/0x60
[Wed May 17 20:34:57 2023] ? tick_sched_do_timer+0x60/0x60
[Wed May 17 20:34:57 2023] update_process_times+0x24/0x50
[Wed May 17 20:34:57 2023] tick_sched_handle+0x22/0x60
[Wed May 17 20:34:57 2023] tick_sched_timer+0x37/0x70
[Wed May 17 20:34:57 2023] __hrtimer_run_queues+0x100/0x280
[Wed May 17 20:34:57 2023] hrtimer_interrupt+0x100/0x220
[Wed May 17 20:34:57 2023] smp_apic_timer_interrupt+0x6a/0x130
[Wed May 17 20:34:57 2023] apic_timer_interrupt+0xf/0x20
[Wed May 17 20:34:57 2023] </IRQ>
[Wed May 17 20:34:57 2023] RIP: 0010:nvfs_get_pages_free_callback+0x106/0x1e0 [nvidia_fs]
[Wed May 17 20:34:57 2023] Code: 47 20 00 00 00 00 4c 89 ff e8 e6 6f 59 df 48 85 db 74 4c 49 89 df 49 83 ef 18 74 43 48 8b 7b e8 48 8b 03 48 85 ff 75 a5 0f 0b <8b> 45 60 83 f8 08 75 f8 bf e3 53 00 00 e8 e8 1e bb df 48 8b 44 24
[Wed May 17 20:34:57 2023] RSP: 0018:ffff980acbeb3b48 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
[Wed May 17 20:34:57 2023] RAX: 0000000000000005 RBX: ffff8bd77088b888 RCX: 0000000000000005
[Wed May 17 20:34:57 2023] RDX: 0000000000000006 RSI: 0000000000000001 RDI: ffff8bd7b22c3440
[Wed May 17 20:34:57 2023] RBP: ffff8bd7b22c3400 R08: 000000000000087a R09: 0000000000000007
[Wed May 17 20:34:57 2023] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8bd7b22c3440
[Wed May 17 20:34:57 2023] R13: ffff8bd7b24ffcc0 R14: ffff8bd77088b898 R15: ffff8bd74a0dad08
[Wed May 17 20:34:57 2023] ? nvfs_get_pages_free_callback+0x51/0x1e0 [nvidia_fs]
[Wed May 17 20:34:57 2023] ? os_acquire_spinlock+0xe/0x20 [nvidia]
[Wed May 17 20:34:57 2023] ? _nv040575rm+0x10/0x20 [nvidia]
[Wed May 17 20:34:57 2023] nv_p2p_mem_info_free_callback+0x15/0x30 [nvidia]
[Wed May 17 20:34:57 2023] _nv000082rm+0x59/0x130 [nvidia]
[Wed May 17 20:34:57 2023] ? _nv041401rm+0x1be/0x1d0 [nvidia]
[Wed May 17 20:34:57 2023] ? _nv043322rm+0x1f1/0x300 [nvidia]
[Wed May 17 20:34:57 2023] ? _nv012571rm+0x3dc/0x650 [nvidia]
[Wed May 17 20:34:57 2023] ? _nv041543rm+0x69/0xd0 [nvidia]
[Wed May 17 20:34:57 2023] ? _nv011145rm+0x86/0xa0 [nvidia]
[Wed May 17 20:34:57 2023] ? _nv000707rm+0x871/0xdb0 [nvidia]
[Wed May 17 20:34:57 2023] ? rm_ioctl+0x58/0xb0 [nvidia]
[Wed May 17 20:34:57 2023] ? nvidia_ioctl+0x1e7/0x7e0 [nvidia]
[Wed May 17 20:34:57 2023] ? nvidia_frontend_unlocked_ioctl+0x3a/0x50 [nvidia]
[Wed May 17 20:34:57 2023] ? do_vfs_ioctl+0xa4/0x640
[Wed May 17 20:34:57 2023] ? syscall_trace_enter+0x1d3/0x2c0
[Wed May 17 20:34:57 2023] ? ksys_ioctl+0x60/0x90
[Wed May 17 20:34:57 2023] ? __x64_sys_ioctl+0x16/0x20
[Wed May 17 20:34:57 2023] ? do_syscall_64+0x5b/0x1a0
[Wed May 17 20:34:57 2023] ? entry_SYSCALL_64_after_hwframe+0x65/0xca

You may want to try the latest nvidia-fs on GitHub.
It would be ideal to reach out to NVIDIA if you want to integrate with GDS.

Do you mean that if I want to integrate GDS into my file system, this only involves nvidia-fs, not CUDA? If so, I'll just look into nvidia-fs. I have actually done some of this work already, just not all of it. The relevant repository I researched is the following:
GitHub - NVIDIA/gds-nvidia-fs: NVIDIA GPUDirect Storage Driver.

I found a way to solve my problem: I enabled the rdma_dynamic_routing option in /etc/cufile.json (set it to true), and now the program can read and write normally and the hang no longer occurs. But I don't know why. My version information is as follows; can anyone provide some guidance? Thank you.
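For anyone else hitting this, the change amounts to flipping one property in /etc/cufile.json. A trimmed fragment (only the key I changed is shown; the rest of my file matches the gdscheck output earlier in the thread):

```json
{
    "properties": {
        "rdma_dynamic_routing": true
    }
}
```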

  GDS release version: 1.6.1.9
  nvidia_fs version: 2.15 libcufile version: 2.12
  Platform: x86_64

hi, Fanyuanli
How did you do this integration work? We are also trying the same thing to support GDS in our in-house distributed file system.
We also mocked it as a BeeGFS fs type, and cuFileHandleRegister and cuFileBufRegister now succeed. cuFileWrite also succeeds, but it uses cuFile IO mode: POSIX, so the IO is not actually in GDS mode. I wonder how you fixed this problem.

I would appreciate it if you could kindly provide some advice.
I am looking forward to your reply.
My email is jimhuaang@gmail.com.

Thanks a lot!

Yes, I have fixed my issue. It was because I had missed part of the GDS support that BeeGFS implements. If you work through it carefully, you can succeed.