gdsio on NVMe-oF (RDMA) fails with "Error: IO failed stopping traffic, fd :33 ret:-1 errno :5"

Hi all,
The NVMe over RDMA environment is set up and working; I followed this guide: "ESPCommunity".
When I run the gdsio test, I get the following errors:
root@gds:/usr/local/cuda-12.1/gds/tools# ./gdsio -D /test -d 0 -w 4 -s 1G -i 4k -x 0 -I 0 -T 60
Error: IO failed stopping traffic, fd :33 ret:-1 errno :5
io failed :ret :-1 errno :5, file offset :0, block size :4096
Error: IO failed stopping traffic, fd :34 ret:-1 errno :5
io failed :ret :-1 errno :5, file offset :0, block size :4096
Error: IO failed stopping traffic, fd :32 ret:-1 errno :5
io failed :ret :-1 errno :5, file offset :0, block size :4096
Error: IO failed stopping traffic, fd :31 ret:-1 errno :5
io failed :ret :-1 errno :5, file offset :0, block size :4096
root@gds:/usr/local/cuda-12.1/gds/tools#

When I run gdsio with -x 1 or -x 2 instead, both runs succeed.
root@gds:/usr/local/cuda-12.1/gds/tools# ./gdscheck -p
GDS release version: 1.6.0.25
nvidia_fs version: 2.15 libcufile version: 2.12
Platform: x86_64

ENVIRONMENT:

=====================
DRIVER CONFIGURATION:

NVMe : Supported
NVMeOF : Supported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Supported
Userspace RDMA : Supported
--Mellanox PeerDirect : Enabled
--rdma library : Loaded (libcufile_rdma.so)
--rdma devices : Configured
--rdma_device_status : Up: 1 Down: 0

Can anyone help me?
Thanks.

Which filesystem are you using for the path /test?

GDS p2p mode is supported on ext4 mounted with data=ordered and on XFS.
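
One way to check a path against the rule above is sketched below. This is only a rough helper, not an official GDS tool: the function names gds_fs_supported and check_gds_path are made up for illustration, and it assumes findmnt from util-linux is installed. It also only looks for data=ordered as an explicitly listed mount option, so an ext4 mount that does not list the option may be reported as a mismatch.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem type and mount options match
# the rule quoted above (XFS, or ext4 mounted with data=ordered).
gds_fs_supported() {
    fstype=$1
    opts=$2
    case "$fstype" in
        xfs)
            return 0 ;;
        ext4)
            # Wrap in commas so the option matches only as a whole token.
            case ",$opts," in
                *,data=ordered,*) return 0 ;;
            esac
            return 1 ;;
        *)
            return 1 ;;
    esac
}

# Look up the mount that backs a path with findmnt and apply the check.
check_gds_path() {
    path=$1
    fstype=$(findmnt -n -o FSTYPE --target "$path") || return 2
    opts=$(findmnt -n -o OPTIONS --target "$path") || return 2
    if gds_fs_supported "$fstype" "$opts"; then
        echo "$path: $fstype ($opts) matches the supported list"
    else
        echo "$path: $fstype ($opts) does not match the supported list"
        return 1
    fi
}
```

For example, after sourcing the snippet, run: check_gds_path /test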

Please share any errors in cufile.log or in the dmesg output related to NVMe and nvidia-fs.ko.
Also, which kernel version is being used?
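
A minimal sketch for pulling out the relevant kernel messages follows. The function name filter_gds_kernel_log and the search pattern are illustrative assumptions, not part of GDS, so adjust the pattern as needed.

```shell
#!/bin/sh
# Hypothetical filter: keep only kernel log lines that mention NVMe,
# nvidia-fs, or block-layer I/O errors. Reads from standard input.
filter_gds_kernel_log() {
    grep -E 'nvme|nvidia[-_]fs|I/O error'
}

# Example usage (reading the kernel log may require root):
#   dmesg -T | filter_gds_kernel_log
#   uname -r
```

The filter reads standard input, so it can also be run on a saved copy of the dmesg output.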

The filesystem is ext4:
root@gds:/home/dpu# nvme list
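Node          SN                    Model                                    Namespace  Usage                Format       FW Rev
------------  --------------------  ---------------------------------------  ---------  -------------------  -----------  --------
/dev/nvme0n1  PHLF820500Y11P0GGN    Dell Express Flash NVMe P4500 1.0TB SFF  1          1.00 TB / 1.00 TB    512 B + 0 B  QDV1DP13
/dev/nvme1n1  4888be95bffebf7f9865  Linux                                    1          16.49 TB / 16.49 TB  512 B + 0 B  5.15.0-4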
root@gds:/home/dpu# mount -o data=ordered /dev/nvme1n1 /test
root@gds:/home/dpu# mount | grep /test
/dev/nvme1n1 on /test type ext4 (rw,relatime,stripe=64,data=ordered)
root@gds:/home/dpu#

The kernel version and OS release are as follows:
root@gds:/home/dpu# uname -r
5.15.0-70-generic
root@gds:/home/dpu# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
root@gds:/home/dpu#

The dmesg output is as follows:
[Tue Apr 25 06:49:31 2023] nvme nvme1: Failed to map data (-22)
[Tue Apr 25 06:49:31 2023] nvme nvme1: Failed to map data (-22)
[Tue Apr 25 06:49:31 2023] nvme nvme1: Failed to map data (-22)
[Tue Apr 25 06:49:31 2023] blk_update_request: I/O error, dev nvme1c1n1, sector 66846720 op 0x0:(READ) flags 0x2000000 phys_seg 1 prio class 0
[Tue Apr 25 06:49:31 2023] blk_update_request: I/O error, dev nvme1c1n1, sector 33292288 op 0x0:(READ) flags 0x2000000 phys_seg 1 prio class 0
[Tue Apr 25 06:49:31 2023] nvidia-fs:read IO failed :-5
[Tue Apr 25 06:49:31 2023] nvme nvme1: Failed to map data (-22)
[Tue Apr 25 06:49:31 2023] blk_update_request: I/O error, dev nvme1c1n1, sector 66879488 op 0x0:(READ) flags 0x2000000 phys_seg 1 prio class 0
[Tue Apr 25 06:49:31 2023] nvidia-fs:read IO failed :-5
[Tue Apr 25 06:49:31 2023] blk_update_request: I/O error, dev nvme1c1n1, sector 66863104 op 0x0:(READ) flags 0x2000000 phys_seg 1 prio class 0
[Tue Apr 25 06:49:31 2023] nvidia-fs:read IO failed :-5
[Tue Apr 25 06:49:31 2023] nvidia-fs:read IO failed :-5

cufile.log contains the following:
25-04-2023 06:49:32:560 [pid=3061 tid=3070] ERROR 0:1529 IOCTL failed io-type 0 ret -5 expected 4096 gpu_page_offset 4096
25-04-2023 06:49:32:560 [pid=3061 tid=3068] ERROR 0:1529 IOCTL failed io-type 0 ret -5 expected 4096 gpu_page_offset 8192
25-04-2023 06:49:32:560 [pid=3061 tid=3069] ERROR 0:1529 IOCTL failed io-type 0 ret -5 expected 4096 gpu_page_offset 12288
25-04-2023 06:49:32:560 [pid=3061 tid=3067] ERROR 0:1529 IOCTL failed io-type 0 ret -5 expected 4096 gpu_page_offset 0
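
As a reading aid for the logs above: the numeric codes are standard Linux errno values, which the kernel reports negated. 5 is EIO and 22 is EINVAL. A small sketch of a lookup helper follows; errno_name is a hypothetical name, and it covers only the two codes that appear in this thread.

```shell
#!/bin/sh
# Hypothetical helper: map the errno numbers seen in these logs to their
# Linux names. Accepts either the positive or the negated form.
errno_name() {
    n=${1#-}    # strip a leading minus sign, if present
    case "$n" in
        5)  echo "EIO (Input/output error)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown ($n)" ;;
    esac
}

# Example usage:
#   errno_name -22
#   errno_name 5
```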

Thanks,
Regards
spring

For writes, the errors are as follows:
./gdsio -D /test -d 0 -w 4 -s 1G -i 4k -x 0 -I 1 -T 60
Error: IO failed stopping traffic, fd :31 ret:-1 errno :5
io failed :ret :-1 errno :5, file offset :0, block size :4096
Error: IO failed stopping traffic, fd :29 ret:-1 errno :5
io failed :ret :-1 errno :5, file offset :0, block size :4096
Error: IO failed stopping traffic, fd :30 ret:-1 errno :5
io failed :ret :-1 errno :5, file offset :0, block size :4096
Error: IO failed stopping traffic, fd :28 ret:-1 errno :5
io failed :ret :-1 errno :5, file offset :0, block size :4096

dmesg -T
[Tue Apr 25 06:54:30 2023] nvme1c1n1: I/O Cmd(0x1) @ LBA 33292288, 8 blocks, I/O Error (sct 0x0 / sc 0xf) MORE DNR
[Tue Apr 25 06:54:30 2023] nvme1c1n1: I/O Cmd(0x1) @ LBA 66863104, 8 blocks, I/O Error (sct 0x0 / sc 0xf) MORE DNR
[Tue Apr 25 06:54:30 2023] nvme1c1n1: I/O Cmd(0x1) @ LBA 66846720, 8 blocks, I/O Error (sct 0x0 / sc 0xf) MORE DNR
[Tue Apr 25 06:54:30 2023] blk_update_request: I/O error, dev nvme1c1n1, sector 66846720 op 0x1:(WRITE) flags 0x2008800 phys_seg 1 prio class 0
[Tue Apr 25 06:54:30 2023] nvme1c1n1: I/O Cmd(0x1) @ LBA 66879488, 8 blocks, I/O Error (sct 0x0 / sc 0xf) MORE DNR
[Tue Apr 25 06:54:30 2023] blk_update_request: I/O error, dev nvme1c1n1, sector 66879488 op 0x1:(WRITE) flags 0x2008800 phys_seg 1 prio class 0
[Tue Apr 25 06:54:30 2023] nvidia-fs:write IO failed :-5
[Tue Apr 25 06:54:30 2023] nvidia-fs:write IO failed :-5
[Tue Apr 25 06:54:30 2023] blk_update_request: I/O error, dev nvme1c1n1, sector 33292288 op 0x1:(WRITE) flags 0x2008800 phys_seg 1 prio class 0
[Tue Apr 25 06:54:30 2023] blk_update_request: I/O error, dev nvme1c1n1, sector 66863104 op 0x1:(WRITE) flags 0x2008800 phys_seg 1 prio class 0
[Tue Apr 25 06:54:30 2023] nvidia-fs:write IO failed :-5
[Tue Apr 25 06:54:30 2023] nvidia-fs:write IO failed :-5

cufile.log:
25-04-2023 06:54:31:470 [pid=3138 tid=3147] ERROR 0:1529 IOCTL failed io-type 1 ret -5 expected 4096 gpu_page_offset 4096
25-04-2023 06:54:31:470 [pid=3138 tid=3145] ERROR 0:1529 IOCTL failed io-type 1 ret -5 expected 4096 gpu_page_offset 8192
25-04-2023 06:54:31:477 [pid=3138 tid=3146] ERROR 0:1529 IOCTL failed io-type 1 ret -5 expected 4096 gpu_page_offset 12288
25-04-2023 06:54:31:666 [pid=3138 tid=3144] ERROR 0:1529 IOCTL failed io-type 1 ret -5 expected 4096 gpu_page_offset 0

@kmodukuri
Could you please give me some advice?
Thanks,
Regards
Spring