I was able to build the nvidia-fs kernel module (gds-nvidia-fs-2.17.0) on RHEL-9.2 (5.14.0-284.11.1.el9_2.x86_64) with nvidia-driver (525.89.02). The make worked fine, and insmod nvidia-fs.ko didn’t throw any errors:
[192745.286125] nvidia_fs: Initializing nvfs driver module
[192745.286136] nvidia_fs: registered correctly with major number 510
But when writing a file via the gdsio utility to storage (VAST), which has an rpcrdma driver installed, the throughput was not what I expected, and dmesg shows:
[Sat Sep 9 20:24:26 2023] nvidia-fs:write IO failed :-512
[Sat Sep 9 20:24:26 2023] nvidia-fs:write IO failed :-512
[Sat Sep 9 20:24:26 2023] nvidia-fs:write IO failed :-512
[Sat Sep 9 20:24:26 2023] nvidia-fs:write IO failed :-512
[Sat Sep 9 20:24:58 2023] nvidia-fs:write IO failed :-512
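For anyone reading the log: as far as I can tell, -512 is not a regular userspace errno but the kernel-internal ERESTARTSYS from include/linux/errno.h, which normally means a system call was interrupted by a signal and should be restarted. I can't say whether that is the root cause here. Below is a minimal sketch for tallying these messages by operation and error name. The regex assumes the exact message format shown above, and the function names (decode, summarize) are my own, not part of any GDS tool.

```python
import errno
import re
from collections import Counter

# Kernel-internal codes from include/linux/errno.h.
# They are not present in Python's errno module, so they are listed by hand.
KERNEL_INTERNAL = {
    512: "ERESTARTSYS",
    513: "ERESTARTNOINTR",
    514: "ERESTARTNOHAND",
    515: "ENOIOCTLCMD",
    516: "ERESTART_RESTARTBLOCK",
}

# Assumed message format, taken from the dmesg lines above.
PATTERN = re.compile(r"nvidia-fs:(\w+) IO failed :(-?\d+)")


def decode(code):
    """Map a (possibly negative) kernel return code to a symbolic name."""
    code = abs(code)
    return KERNEL_INTERNAL.get(code) or errno.errorcode.get(code, "UNKNOWN")


def summarize(lines):
    """Count nvidia-fs IO failures per (operation, error name)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), decode(int(match.group(2))))] += 1
    return counts


# Sample input copied from the dmesg output above.
sample = [
    "[Sat Sep 9 20:24:26 2023] nvidia-fs:write IO failed :-512",
    "[Sat Sep 9 20:24:26 2023] nvidia-fs:write IO failed :-512",
    "[Sat Sep 9 20:24:26 2023] nvidia-fs:write IO failed :-512",
    "[Sat Sep 9 20:24:26 2023] nvidia-fs:write IO failed :-512",
    "[Sat Sep 9 20:24:58 2023] nvidia-fs:write IO failed :-512",
]

for (op, name), n in summarize(sample).items():
    print(f"{op}: {name} x{n}")
```

To run it against the live kernel log, the sample list could be replaced with the lines of dmesg output (for example, read from sys.stdin).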
FWIW, the gdscheck.py utility reports that NFS is supported:
./gdscheck.py -p
GDS release version: 1.7.2.10
nvidia_fs version: 2.17 libcufile version: 2.12
Platform: x86_64
NFS : Supported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Enabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
I am unsure how to debug this. Any leads would be really appreciated. Thank you!