Hi,
I am testing GPUDirect Storage (GDS) with a locally attached NVMe SSD and an H100 GPU. My environment is as follows:
$ sudo ./gdscheck -p
GDS release version: 1.11.0.15
nvidia_fs version: 2.22 libcufile version: 2.12
Platform: x86_64
============
ENVIRONMENT:
============
CUFILE_ENV_PATH_JSON : ~/mycufile.json
=====================
DRIVER CONFIGURATION:
=====================
NVMe : Supported
NVMeOF : Supported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : false
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 2
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 4
execution.max_io_queue_depth : 128
execution.parallel_io : true
execution.min_io_threshold_size_kb : 8192
execution.max_request_parallelism : 4
properties.force_odirect_mode : false
properties.prefer_iouring : false
=========
GPU INFO:
=========
GPU index 0 NVIDIA H100 PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed: 12040
Platform: ProLiant DL380 Gen11, Arch: x86_64(Linux 5.15.0-119-generic)
Platform verification succeeded
The PCIe topology looks like this:
+-[0000:ae]-+-00.0 Intel Corporation Device 09a2
| +-00.1 Intel Corporation Device 09a4
| +-00.2 Intel Corporation Device 09a3
| +-00.4 Intel Corporation Device 0b23
| \-01.0-[af]--+-00.0 Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
| \-00.1 Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
+-[0000:9b]-+-00.0 Intel Corporation Device 09a2
| +-00.1 Intel Corporation Device 09a4
| +-00.2 Intel Corporation Device 09a3
| +-00.4 Intel Corporation Device 0b23
| \-01.0-[9c]----00.0 Cray Inc Device 0501
+-[0000:58]-+-00.0 Intel Corporation Device 09a2
| +-00.1 Intel Corporation Device 09a4
| +-00.2 Intel Corporation Device 09a3
| +-00.4 Intel Corporation Device 0b23
| +-01.0-[59]--
| +-03.0-[5a]--
| +-05.0-[5b]--
| \-07.0-[5c-5f]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller PM173Xa
+-[0000:10]-+-00.0 Intel Corporation Device 09a2
| +-00.1 Intel Corporation Device 09a4
| +-00.2 Intel Corporation Device 09a3
| +-00.4 Intel Corporation Device 0b23
| \-01.0-[11]--+-00.0 Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
| \-00.1 Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
+-[0000:88]-+-00.0 Intel Corporation Device 09a2
| +-00.1 Intel Corporation Device 09a4
| +-00.2 Intel Corporation Device 09a3
| +-00.4 Intel Corporation Device 0b23
| \-01.0-[89]----00.0 Mellanox Technologies MT2910 Family [ConnectX-7]
+-[0000:22]-+-00.0 Intel Corporation Device 09a2
| +-00.1 Intel Corporation Device 09a4
| +-00.2 Intel Corporation Device 09a3
| +-00.4 Intel Corporation Device 0b23
| \-01.0-[23]----00.0 NVIDIA Corporation Device 2331
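To make the placement in the tree above explicit: the H100 (0000:23:00.0) sits under root bus 0000:22, while the Samsung NVMe controller (0000:5c:00.0) sits under root bus 0000:58. Below is a minimal Python sketch that compares the two. The sysfs-style paths are reconstructed by me from the tree, and the function names `root_complex` and `share_root_complex` are hypothetical helpers, not part of any GDS tool. I am not claiming this placement explains the error, only documenting it.

```python
import re

# Sysfs-style device paths reconstructed from the lspci tree above (assumption:
# they follow the usual /sys/devices/pciDDDD:BB/... layout).
GPU_PATH = "/sys/devices/pci0000:22/0000:22:01.0/0000:23:00.0"
NVME_PATH = "/sys/devices/pci0000:58/0000:58:07.0/0000:5c:00.0"


def root_complex(sysfs_path: str) -> str:
    """Return the PCI root bus component (e.g. 'pci0000:58') of a sysfs path."""
    match = re.search(r"/(pci[0-9a-fA-F]{4}:[0-9a-fA-F]{2})(?:/|$)", sysfs_path)
    if match is None:
        raise ValueError(f"no PCI root bus component in {sysfs_path!r}")
    return match.group(1)


def share_root_complex(path_a: str, path_b: str) -> bool:
    """True if both devices hang off the same PCI root bus."""
    return root_complex(path_a) == root_complex(path_b)


if __name__ == "__main__":
    print("GPU root bus: ", root_complex(GPU_PATH))
    print("NVMe root bus:", root_complex(NVME_PATH))
    print("Same root bus:", share_root_complex(GPU_PATH, NVME_PATH))
```

On a live system the same check could be run on the resolved targets of the symlinks under /sys/bus/pci/devices/ instead of the hard-coded strings.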
lsmod output for the nvme and nvidia modules:
$ lsmod | grep nvme
nvmet_rdma 57344 0
nvmet 151552 1 nvmet_rdma
nvme_rdma 45056 0
nvme_fabrics 36864 1 nvme_rdma
rdma_cm 122880 3 nvme_rdma,nvmet_rdma,rdma_ucm
ib_core 434176 10 rdma_cm,ib_ipoib,nvme_rdma,nvmet_rdma,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
nvme 57344 2 nvmet,nvmet_rdma
nvme_core 143360 5 nvmet,nvme,nvme_rdma,nvme_fabrics
mlx_compat 69632 17 rdma_cm,ib_ipoib,mlxdevm,nvmet,nvme,nvme_rdma,nvmet_rdma,iw_cm,nvme_core,nvme_fabrics,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
$ lsmod | grep nvidia
nvidia_uvm 4673536 0
nvidia_fs 262144 0
nvidia_drm 94208 0
nvidia_modeset 1495040 1 nvidia_drm
nvidia 8663040 3 nvidia_uvm,nvidia_fs,nvidia_modeset
drm_kms_helper 311296 4 mgag200,nvidia_drm
drm 622592 5 drm_kms_helper,nvidia,mgag200,nvidia_drm
The NVMe SSD is set up as follows:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 63.9M 1 loop /snap/core20/2264
loop1 7:1 0 63.9M 1 loop /snap/core20/2318
loop2 7:2 0 87M 1 loop /snap/lxd/28373
loop3 7:3 0 87M 1 loop /snap/lxd/29351
loop4 7:4 0 38.7M 1 loop /snap/snapd/21465
loop5 7:5 0 38.8M 1 loop /snap/snapd/21759
sda 8:0 0 2.9T 0 disk
sdb 8:16 0 2.9T 0 disk
sdc 8:32 0 2.9T 0 disk
├─sdc1 8:33 0 1G 0 part /boot/efi
└─sdc2 8:34 0 2.9T 0 part /
sdd 8:48 0 1.5T 0 disk
├─sdd1 8:49 0 1G 0 part
├─sdd2 8:50 0 1.4T 0 part
└─sdd3 8:51 0 32G 0 part
sde 8:64 0 2.9T 0 disk
├─sde1 8:65 0 1G 0 part
├─sde2 8:66 0 2.9T 0 part
└─sde3 8:67 0 32G 0 part
nvme0n1 259:2 0 2.9T 0 disk /gds-test
$ mount | grep ext4
/dev/sdc2 on / type ext4 (rw,relatime,stripe=16)
/dev/nvme0n1 on /gds-test type ext4 (rw,relatime,stripe=32,data=ordered)
The gdsio_verify test and the gdsio benchmarks run successfully with compatibility mode set to false:
$ sudo ./gdsio_verify -f /gds-test/gdsio.0 -n 1 -m 0 -s 4k -o 0 -d 0 -t 0
gpu index :0,file :/gds-test/gdsio.0, gpu buffer alignment :0, gpu buffer offset :0, gpu devptr offset :0, file offset :0, io_requested :4096, io_chunk_size :4096, bufregister :true, sync :0, nr ios :1,
fsync :0,
Batch mode: 0
Data Verification Success
However, the cufile.log for the gdsio_verify run above contains the following ERROR. The same error shows up in all gdsio benchmark runs:
28-08-2024 17:54:52:150 [pid=33414 tid=33414] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:5c:00.0
28-08-2024 17:54:52:145 [pid=33414 tid=33414] INFO 0:324 Lib being used for urcup concurrency : libcufile_ck
28-08-2024 17:54:52:145 [pid=33414 tid=33414] INFO cufio_core:592 Loaded successfully libcufile_ck.so
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio_core:592 Loaded successfully libmount.so
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio_core:592 Loaded successfully libudev.so
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio_core:596 Using CKIT static library
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:163 nvidia_fs driver open invoked
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:408 GDS release version: 1.11.0.15
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:411 nvidia_fs version: 2.22 libcufile version: 2.12
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:415 Platform: x86_64
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:297 NVMe: driver support OK
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:303 NVMeOF: driver support OK
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:336 WekaFS: driver support OK
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:535 nvidia_fs driver version check ok
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:297 NVMe: driver support OK
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:303 NVMeOF: driver support OK
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:336 WekaFS: driver support OK
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:196 ============
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:197 ENVIRONMENT:
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:198 ============
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:200 CUFILE_ENV_PATH_JSON : /home/user/mycufile.json
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:211 =====================
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:212 DRIVER CONFIGURATION:
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:213 =====================
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:215 NVMe : Supported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:216 NVMeOF : Supported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:217 SCSI : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:218 ScaleFlux CSD : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:219 NVMesh : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:222 DDN EXAScaler : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:226 IBM Spectrum Scale : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:230 NFS : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-drv:233 BeeGFS : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] DEBUG cufio-rdma:149 No valid ip addresses specified for RDMA devices. Disabling GDS userspace RDMA access
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-rdma:1131 WekaFS : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-rdma:1133 Userspace RDMA : Unsupported
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-rdma:1141 --Mellanox PeerDirect : Disabled
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-rdma:1149 --rdma library : Not Loaded (libcufile_rdma.so)
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-rdma:1152 --rdma devices : Not configured
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio-rdma:1155 --rdma_device_status : Up: 0 Down: 0
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio_core:974 =====================
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio_core:975 CUFILE CONFIGURATION:
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO cufio_core:976 =====================
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1321 properties.use_compat_mode : false
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1323 properties.force_compat_mode : false
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1325 properties.gds_rdma_write_support : true
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1327 properties.use_poll_mode : false
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1329 properties.poll_mode_max_size_kb : 4
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1331 properties.max_batch_io_size : 128
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1333 properties.max_batch_io_timeout_msecs : 5
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1335 properties.max_direct_io_size_kb : 16384
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1337 properties.max_device_cache_size_kb : 131072
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1339 properties.max_device_pinned_mem_size_kb : 33554432
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1341 properties.posix_pool_slab_size_kb : 4 1024 16384
28-08-2024 17:54:52:146 [pid=33414 tid=33414] INFO 0:1343 properties.posix_pool_slab_count : 128 64 32
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1345 properties.rdma_peer_affinity_policy : RoundRobin
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1347 properties.rdma_dynamic_routing : 0
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1354 fs.generic.posix_unaligned_writes : false
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1357 fs.lustre.posix_gds_min_kb: 0
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1371 fs.beegfs.posix_gds_min_kb: 0
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1386 fs.weka.rdma_write_support: false
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1412 fs.gpfs.gds_write_support: false
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1425 profile.nvtx : false
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1427 profile.cufile_stats : 2
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1429 miscellaneous.api_check_aggressive : false
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1439 execution.max_io_threads : 4
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1440 execution.max_io_queue_depth : 128
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1441 execution.parallel_io : true
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1442 execution.min_io_threshold_size_kb : 8192
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1443 execution.max_request_parallelism : 4
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1444 properties.force_odirect_mode : false
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO 0:1446 properties.prefer_iouring : false
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:867 =========
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:868 GPU INFO:
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:869 =========
28-08-2024 17:54:52:147 [pid=33414 tid=33414] DEBUG cufio-plat:377 GPU BDF: 0000:23:00.0
28-08-2024 17:54:52:147 [pid=33414 tid=33414] DEBUG cufio-plat:347 Searching IOMMU entries in /sys/bus/pci/devices/0000:23:00.0/iommu
28-08-2024 17:54:52:147 [pid=33414 tid=33414] DEBUG cufio-plat:386 cuda GPU device attributes: gpu :0 model :NVIDIA H100 PCIe nvdirect :0 numa:-1 pcibridge: bar :1 barBase :35871566856204 barSize :137438953472 streamMemOps :0 dmaBufCapable:1 GDRBufCapable:1 bdf :0 : 35 : 0 : 0
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:449 GPU index 0 NVIDIA H100 PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:473 Total GPUS supported on this platform 1
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:880 ==============
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:881 PLATFORM INFO:
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:882 ==============
28-08-2024 17:54:52:147 [pid=33414 tid=33414] DEBUG cufio-udev:147 device pci path string : 0000:23:00.0->0000:22:01.0
28-08-2024 17:54:52:147 [pid=33414 tid=33414] DEBUG cufio-plat:683 GPU Dev: 0 numa_node: 0 PCI Group 0000:22:01.0
28-08-2024 17:54:52:147 [pid=33414 tid=33414] DEBUG cufio-plat:156 acs enabled bridge 0000:00:0d.0
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:591 ACS not enabled in GPU paths
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:800 cannot open scsi_mod path, skip scsi check
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:887 use_mq not detected in scsi configuration.cannot support SCSI disks!
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:771 IOMMU: disabled
28-08-2024 17:54:52:147 [pid=33414 tid=33414] INFO cufio-plat:721 Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
28-08-2024 17:54:52:150 [pid=33414 tid=33414] INFO cufio-plat:727 Cuda Driver Version Installed: 12040
28-08-2024 17:54:52:150 [pid=33414 tid=33414] INFO cufio-plat:755 Platform: ProLiant DL380 Gen11, Arch: x86_64(Linux 5.15.0-119-generic)
28-08-2024 17:54:52:150 [pid=33414 tid=33414] INFO cufio-plat:924 Platform verification succeeded
28-08-2024 17:54:52:150 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: nvme path: /sys/devices/pci0000:58/0000:58:07.0/0000:5c:00.0/nvme/nvme0
28-08-2024 17:54:52:150 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: nvme0
28-08-2024 17:54:52:150 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x144d
28-08-2024 17:54:52:150 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:5c:00.0
**28-08-2024 17:54:52:150 [pid=33414 tid=33414] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:5c:00.0**
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.0/infiniband/mlx5_0
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_0
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.0
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.1/infiniband/mlx5_1
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_1
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.1
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:88/0000:88:01.0/0000:89:00.0/infiniband/mlx5_2
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_2
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:89:00.0
**28-08-2024 17:54:52:151 [pid=33414 tid=33414] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:5c:00.0**
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_0 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_1 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:151 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_2 device link width: 16 device link speed: 5 ) numa node : 1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.0/net/ens3f0np0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens3f0np0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.1/net/ens3f1np1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens3f1np1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:88/0000:88:01.0/0000:89:00.0/net/ibs5
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ibs5
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:89:00.0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:ae/0000:ae:01.0/0000:af:00.0/net/ens15f0np0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens15f0np0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x14e4
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:af:00.0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:ae/0000:ae:01.0/0000:af:00.1/net/ens15f1np1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens15f1np1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x14e4
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:af:00.1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/virtual/net/lo
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: lo
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:555 vendor id attribute for device not found: lo
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:578 sys attribute for device not found: lo class: net sysattr: device/uevent
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens3f0np0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_0 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens3f1np1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_1 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ibs5
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_2 device link width: 16 device link speed: 5 ) numa node : 1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device ens15f0np0 device link width: 8 device link speed: 3 ) numa node : 1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device ens15f1np1 device link width: 8 device link speed: 3 ) numa node : 1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:51 bus-device-function not found in the device attribute : lo
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.0/infiniband/mlx5_0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.1/infiniband/mlx5_1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:88/0000:88:01.0/0000:89:00.0/infiniband/mlx5_2
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_2
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:89:00.0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set mlx5_0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_0 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set mlx5_1
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_1 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set mlx5_2
28-08-2024 17:54:52:152 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_2 device link width: 16 device link speed: 5 ) numa node : 1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.0/net/ens3f0np0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens3f0np0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.1/net/ens3f1np1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens3f1np1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:88/0000:88:01.0/0000:89:00.0/net/ibs5
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ibs5
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:89:00.0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:ae/0000:ae:01.0/0000:af:00.0/net/ens15f0np0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens15f0np0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x14e4
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:af:00.0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:ae/0000:ae:01.0/0000:af:00.1/net/ens15f1np1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens15f1np1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x14e4
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:af:00.1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/virtual/net/lo
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: lo
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:555 vendor id attribute for device not found: lo
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-udev:578 sys attribute for device not found: lo class: net sysattr: device/uevent
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens3f0np0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_0 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens3f1np1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_1 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ibs5
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_2 device link width: 16 device link speed: 5 ) numa node : 1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens15f0np0
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device ens15f0np0 device link width: 8 device link speed: 3 ) numa node : 1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens15f1np1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device ens15f1np1 device link width: 8 device link speed: 3 ) numa node : 1
28-08-2024 17:54:52:153 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:51 bus-device-function not found in the device attribute : lo
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.0/infiniband/mlx5_0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.1/infiniband/mlx5_1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: infiniband path: /sys/devices/pci0000:88/0000:88:01.0/0000:89:00.0/infiniband/mlx5_2
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: mlx5_2
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:89:00.0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set mlx5_0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_0 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set mlx5_1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_1 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set mlx5_2
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_2 device link width: 16 device link speed: 5 ) numa node : 1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.0/net/ens3f0np0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens3f0np0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:10/0000:10:01.0/0000:11:00.1/net/ens3f1np1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens3f1np1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:11:00.1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:88/0000:88:01.0/0000:89:00.0/net/ibs5
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ibs5
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x15b3
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:89:00.0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:ae/0000:ae:01.0/0000:af:00.0/net/ens15f0np0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens15f0np0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x14e4
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:af:00.0
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/pci0000:ae/0000:ae:01.0/0000:af:00.1/net/ens15f1np1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: ens15f1np1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:557 vendor id attribute for device found: 0x14e4
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:583 sys attribute uevent for device found: 0000:af:00.1
28-08-2024 17:54:52:154 [pid=33414 tid=33414] DEBUG cufio-udev:535 scanning sys CLASS: net path: /sys/devices/virtual/net/lo
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-udev:550 sys attribute sysname for device found: lo
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-udev:555 vendor id attribute for device not found: lo
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-udev:578 sys attribute for device not found: lo class: net sysattr: device/uevent
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens3f0np0
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_0 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens3f1np1
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_1 device link width: 8 device link speed: 3 ) numa node : 0
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ibs5
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device mlx5_2 device link width: 16 device link speed: 5 ) numa node : 1
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens15f0np0
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device ens15f0np0 device link width: 8 device link speed: 3 ) numa node : 1
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:75 device name attribute already set ens15f1np1
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:84 adding attributes for device ens15f1np1 device link width: 8 device link speed: 3 ) numa node : 1
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:51 bus-device-function not found in the device attribute : lo
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:273 printing cufile platform topology using nvfs probe:
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:283 gpu 0000:23:00.0 peers : 0000:11:00.0(8519816) 0000:11:00.1(8519816) 0000:5e:00.0(8650900) 0000:89:00.0(16842832) 0000:9c:00.0(16842848) 0000:af:00.0(16842888) 0000:af:00.1(16842888)
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:291 peer 0000:af:00.1 gpus : 0000:23:00.0(16842888)
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:291 peer 0000:af:00.0 gpus : 0000:23:00.0(16842888)
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:291 peer 0000:9c:00.0 gpus : 0000:23:00.0(16842848)
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:291 peer 0000:5e:00.0 gpus : 0000:23:00.0(8650900)
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:291 peer 0000:11:00.0 gpus : 0000:23:00.0(8519816)
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:291 peer 0000:11:00.1 gpus : 0000:23:00.0(8519816)
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-topo-nvfs:291 peer 0000:89:00.0 gpus : 0000:23:00.0(16842832)
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-drv:566 checking GPU attributes 0000:22:01.0
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-drv:595 new group 0000:22:01.0 groupid 0 size 1
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-drv:647 ngpus 1 pos 0
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG 0:294 Bounce buffers initializing... PCI-Groups 1
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG 0:305 Buffer pool initializing for GPU 0 PCI-Group 0
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG 0:227 Buffer pool initialized with 128 slots and priority: default
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG 0:315 Buffer pool setup for GPU 0 with 128 slots Caching enabled: 1 priority: default PCI-Group: 0
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG 0:345 Bounce buffers initialization complete
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-drv:684 PCI Groups initialized
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-px-pool:379 Initializing cufile POSIX pool
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG 0:227 Buffer pool initialized with 128 slots and priority: default
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-px-pool:124 POSIX buffer pool initialized for GPU 0 slab size (KiB): 4 slots: 128
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG 0:227 Buffer pool initialized with 64 slots and priority: default
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-px-pool:124 POSIX buffer pool initialized for GPU 0 slab size (KiB): 1024 slots: 64
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG 0:227 Buffer pool initialized with 32 slots and priority: default
28-08-2024 17:54:52:155 [pid=33414 tid=33414] DEBUG cufio-px-pool:124 POSIX buffer pool initialized for GPU 0 slab size (KiB): 16384 slots: 32
28-08-2024 17:54:52:155 [pid=33414 tid=33414] INFO cufio-px-pool:453 POSIX pool buffer initialization complete
28-08-2024 17:54:52:155 [pid=33414 tid=33414] INFO curdma-ldbal:510 No RDMA devices configured,skipping RDMA load balancer initialization
28-08-2024 17:54:52:357 [pid=33414 tid=33414] DEBUG cufio-stats-plugin:97 stats shared memory segment id: 16 bytes: 31280 shm_addr 0x7f2028a12000
28-08-2024 17:54:52:357 [pid=33414 tid=33414] INFO cufio_core:1041 CUFile initialization complete
28-08-2024 17:54:52:357 [pid=33414 tid=33414] DEBUG cufio:203 cuFileHandleRegister invoked
28-08-2024 17:54:52:362 [pid=33414 tid=33414] DEBUG cufio-udev:94 sysfs attribute found wwid nvme0c0n1
28-08-2024 17:54:52:362 [pid=33414 tid=33414] DEBUG cufio-udev:94 sysfs attribute found device/transport nvme0c0n1
28-08-2024 17:54:52:362 [pid=33414 tid=33414] DEBUG cufio-udev:94 sysfs attribute found model nvme0c0n1
28-08-2024 17:54:52:362 [pid=33414 tid=33414] DEBUG cufio-udev:311 detected nvme model: MO003200KYDNC wwid: eui.37304e30572021100025384500000002 xport: pcie /sys/devices/pci0000:58/0000:58:07.0/0000:5c:00.0/nvme/nvme0/nvme0c0n1
28-08-2024 17:54:52:363 [pid=33414 tid=33414] DEBUG cufio-udev:94 sysfs attribute found integrity/device_is_integrity_capable nvme0n1
28-08-2024 17:54:52:363 [pid=33414 tid=33414] DEBUG cufio-fs:284 block device nvme0n1 drive integrity check capability not present. Ok
28-08-2024 17:54:52:363 [pid=33414 tid=33414] INFO cufio-fs:357 Block dev: /dev/nvme0n1 numa node: -1 pci bridge:
28-08-2024 17:54:52:363 [pid=33414 tid=33414] INFO cufio-udev:99 sysfs attribute not found device/transport nvme0n1
28-08-2024 17:54:52:363 [pid=33414 tid=33414] DEBUG cufio-udev:94 sysfs attribute found wwid nvme0n1
28-08-2024 17:54:52:363 [pid=33414 tid=33414] DEBUG cufio-udev:94 sysfs attribute found queue/logical_block_size nvme0n1
28-08-2024 17:54:52:363 [pid=33414 tid=33414] DEBUG cufio-fs:706 vol pciGroup :
28-08-2024 17:54:52:364 [pid=33414 tid=33414] DEBUG cufio-fs:736 added volume attributes for device: dev_no: 259:2
28-08-2024 17:54:52:364 [pid=33414 tid=33414] DEBUG cufio-fs:676 Found cached Volume Attributes for device: dev_no: 259:2 isDFS: 0
28-08-2024 17:54:52:364 [pid=33414 tid=33414] DEBUG cufio-obj:200 File descriptor 3 is not associated with an XFS file system.
28-08-2024 17:54:52:364 [pid=33414 tid=33414] DEBUG cufio-obj:410 Compatibility Mode: 0 Compat Read Mode: 0 Compat Write Mode: 0 Needs RDMA: 0 Needs Unaligned Access: 0 posix_io_threshold: 0
28-08-2024 17:54:52:364 [pid=33414 tid=33414] DEBUG cufio-obj:416 Needs Kernel RDMA: 0 use_posix_for_unaligned_write: 0 gds batch enabled: 1 Posix retry on -ENOTSUPP: 0
28-08-2024 17:54:52:364 [pid=33414 tid=33414] DEBUG cufio:331 cuFileHandleRegister success
28-08-2024 17:54:52:533 [pid=33414 tid=33414] DEBUG cufio_core:1565 cuFileBufRegister invoked devPtr 0x7f1dc4600000
28-08-2024 17:54:52:533 [pid=33414 tid=33414] DEBUG cufio_core:1409 Got MemType 2 for devPtr: 0x7f1dc4600000 status: 0
28-08-2024 17:54:52:533 [pid=33414 tid=33414] DEBUG cufio_core:1602 minimum chunks needed: 1 old count: 4 chunk len: 4096
28-08-2024 17:54:52:533 [pid=33414 tid=33414] DEBUG cufio-obj:123 mapping nvinfo: 0x158fad0 size: 4096
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG 0:335 map buf 0x7f1dc4600000 Size 4096 sbuf_size 16777216 pin_gpu_memory 1
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG 0:336 map buf 0x7f1dc4600000 bounce-buffer 0 groupId 0
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG 0:1093 Total usage 0 Max Usage 33554432
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG 0:487 MAP gpu index : 0 bdf: 0 35 0 0
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio-obj:133 allocated nvinfo object, sbuf: 0x7f2028a53000
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG 0:175 nvhandle Hash add key 0x158fad0
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio_core:1679 cuFileBufRegister done
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio:203 cuFileHandleRegister invoked
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio-fs:676 Found cached Volume Attributes for device: dev_no: 259:2 isDFS: 0
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio-fs:676 Found cached Volume Attributes for device: dev_no: 259:2 isDFS: 0
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio-obj:200 File descriptor 73 is not associated with an XFS file system.
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio-obj:410 Compatibility Mode: 0 Compat Read Mode: 0 Compat Write Mode: 0 Needs RDMA: 0 Needs Unaligned Access: 0 posix_io_threshold: 0
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio-obj:416 Needs Kernel RDMA: 0 use_posix_for_unaligned_write: 0 gds batch enabled: 1 Posix retry on -ENOTSUPP: 0
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio:331 cuFileHandleRegister success
28-08-2024 17:54:52:534 [pid=33414 tid=33414] DEBUG cufio:628 cuFileReadAsync invoked
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:170 Hash Lookup nvinfo 0x158fad0 key 0x7f1dc4600000
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG cufio_core:753 io in progress = 0 nvinfo: 0x158fad0
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG cufio_core:2698 gds path taken with ODIRECT fd: 3
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:461 current cuda context present
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:472 Allocate buffer of size 1048576 on GPU 0 PCI-Group 0
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:389 Bounce buffer 139765827502080 GPU page aligned
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:481 Buffer from aligned alloc, dptr 0x7f1dc4800000 aligned_dptr 0x7f1dc4800000 size 1048576
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:335 map buf 0x7f1dc4800000 Size 1048576 sbuf_size 1048576 pin_gpu_memory 1
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:336 map buf 0x7f1dc4800000 bounce-buffer 1 groupId 0
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:1093 Total usage 4 Max Usage 33554432
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:487 MAP gpu index : 0 bdf: 0 35 0 0
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:507 Buffer allocation and map success on GPU: 0
28-08-2024 17:54:52:535 [pid=33414 tid=33414] DEBUG 0:842 Bounce-buffer allocated from PCI-Group: 0 GPU: 0
28-08-2024 17:54:52:536 [pid=33414 tid=33414] DEBUG cufio-px:144 current cuda context present
28-08-2024 17:54:52:537 [pid=33414 tid=33414] DEBUG cufio-px:174 Allocated buffer of size 8192 at 0x15b6000
28-08-2024 17:54:52:537 [pid=33414 tid=33414] DEBUG cufio_core:2288 cuFile allocated buffer for posix read/write 8192
28-08-2024 17:54:52:538 [pid=33414 tid=33414] DEBUG cufio_async:1054 sinfo: 0x15a2d10 created posix fd: 75
28-08-2024 17:54:52:538 [pid=33414 tid=33414] DEBUG cufio_async:560 async IO sinfo 0x15a2d10 op READ file args: fd,inum,gen( 3 12 3377307348 ) maj:min( 259 2
28-08-2024 17:54:52:539 [pid=33414 tid=33414] DEBUG cufio_async:882 sinfo 0x15a2d10 nr_ios := 1 index: 0 last_seq: 1 current_seq 0
28-08-2024 17:54:52:540 [pid=33414 tid=33426] DEBUG 0:396 async IO submit read 0x7f1e182f9000 0 at offset: 0 size: 4096 end-fence-value: 1 remaining_size: 0
28-08-2024 17:54:52:540 [pid=33414 tid=33426] DEBUG 0:411 async IO submit sinfo 0x15a2d10 stream 0x151a000 read on fd 3 done
28-08-2024 17:54:52:541 [pid=33414 tid=33426] DEBUG 0:182 async IO op 0x15a2d10 end_fence: 1 verifying seq: 1 / 1
28-08-2024 17:54:52:541 [pid=33414 tid=33426] DEBUG 0:226 async IO op 0x15a2d10 seq 1 / 1 completed 4096
28-08-2024 17:54:52:541 [pid=33414 tid=33426] DEBUG 0:238 async IO op 0x15a2d10 size 4096 completed result 4096
28-08-2024 17:54:52:545 [pid=33414 tid=33414] DEBUG cufio:745 cuFileWriteAsync invoked
28-08-2024 17:54:52:545 [pid=33414 tid=33414] DEBUG 0:170 Hash Lookup nvinfo 0x158fad0 key 0x7f1dc4600000
28-08-2024 17:54:52:545 [pid=33414 tid=33414] DEBUG cufio_core:753 io in progress = 0 nvinfo: 0x158fad0
28-08-2024 17:54:52:545 [pid=33414 tid=33414] DEBUG cufio_core:2698 gds path taken with ODIRECT fd: 73
28-08-2024 17:54:52:545 [pid=33414 tid=33414] DEBUG cufio_async:1054 sinfo: 0x15a2d10 created posix fd: 76
28-08-2024 17:54:52:545 [pid=33414 tid=33414] DEBUG cufio_async:560 async IO sinfo 0x15a2d10 op WRITE file args: fd,inum,gen( 73 21 867027462 ) maj:min( 259 2
28-08-2024 17:54:52:545 [pid=33414 tid=33414] DEBUG cufio_async:882 sinfo 0x15a2d10 nr_ios := 1 index: 0 last_seq: 2 current_seq 1
28-08-2024 17:54:52:545 [pid=33414 tid=33426] DEBUG 0:396 async IO submit write 0x7f1e182f9000 0 at offset: 0 size: 4096 end-fence-value: 2 remaining_size: 0
28-08-2024 17:54:52:546 [pid=33414 tid=33426] DEBUG 0:411 async IO submit sinfo 0x15a2d10 stream 0x151a000 write on fd 73 done
28-08-2024 17:54:52:547 [pid=33414 tid=33426] DEBUG 0:182 async IO op 0x15a2d10 end_fence: 2 verifying seq: 2 / 2
28-08-2024 17:54:52:547 [pid=33414 tid=33426] DEBUG 0:226 async IO op 0x15a2d10 seq 2 / 2 completed 4096
28-08-2024 17:54:52:547 [pid=33414 tid=33426] DEBUG 0:238 async IO op 0x15a2d10 size 4096 completed result 4096
28-08-2024 17:54:52:547 [pid=33414 tid=33426] DEBUG 0:104 async IO sinfo 0x15a2d10 done: 4096
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG cufio:356 cuFileHandleDeregister invoked
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG cufio:383 cuFileHandleDeregister done
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG cufio:95 cuFileBufDeregister invoked
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG cufio:110 Deregistering devptr: 0x7f1dc4600000
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG 0:170 Hash Lookup nvinfo 0x158fad0 key 0x7f1dc4600000
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG cufio_core:753 io in progress = 0 nvinfo: 0x158fad0
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG 0:232 nvHandlehash remove nvinfo key: 0x158fad0
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG 0:209 cufio-internal - free buf 0x158fbc0
28-08-2024 17:54:52:547 [pid=33414 tid=33414] DEBUG 0:240 unmap GPU 0
28-08-2024 17:54:52:566 [pid=33414 tid=33414] DEBUG cufio:164 cuFileBufDeregister done
28-08-2024 17:54:52:566 [pid=33414 tid=33414] DEBUG cufio:356 cuFileHandleDeregister invoked
28-08-2024 17:54:52:566 [pid=33414 tid=33414] DEBUG cufio:383 cuFileHandleDeregister done
28-08-2024 17:54:52:566 [pid=33414 tid=33414] DEBUG 0:971 Freeing nvbuf: 0x15a44d0 io_in_progress 1
28-08-2024 17:54:52:567 [pid=33414 tid=33414] DEBUG cufio-px:63 freed pointer nvbuf: 0x15a8b20
28-08-2024 17:54:52:650 [pid=33414 tid=33414] INFO cufio_core:118 cuFile STATS VERSION : 8
GLOBAL STATS:
Read: ok = 2 err = 0
Write: ok = 2 err = 0
HandleRegister: ok = 2 err = 0
HandleDeregister: ok = 2 err = 0
BufRegister: ok = 1 err = 0
BufDeregister: ok = 1 err = 0
BatchSubmit: ok = 0 err = 0
BatchComplete: ok = 0 err = 0
BatchSetup: ok = 0 err = 0
BatchCancel: ok = 0 err = 0
BatchDestroy: ok = 0 err = 0
BatchEnqueued: ok = 0 err = 0
PosixBatchEnqueued: ok = 0 err = 0
BatchProcessed: ok = 0 err = 0
PosixBatchProcessed: ok = 0 err = 0
Total Read Size (MiB): 0
Read BandWidth (GiB/s): 0
Avg Read Latency (us): 0
Total Write Size (MiB): 0
Write BandWidth (GiB/s): 0
Avg Write Latency (us): 0
Total Batch Read Size (MiB): 0
Total Batch Write Size (MiB): 0
Batch Read BandWidth (GiB/s): 0
Batch Write BandWidth (GiB/s): 0
Avg Batch Submit Latency (us): 0
Avg Batch Completion Latency (us): 0
READ-WRITE SIZE HISTOGRAM :
0-4(KiB): 0 0
4-8(KiB): 2 2
8-16(KiB): 0 0
16-32(KiB): 0 0
32-64(KiB): 0 0
64-128(KiB): 0 0
128-256(KiB): 0 0
256-512(KiB): 0 0
512-1024(KiB): 0 0
1024-2048(KiB): 0 0
2048-4096(KiB): 0 0
4096-8192(KiB): 0 0
8192-16384(KiB): 0 0
16384-32768(KiB): 0 0
32768-65536(KiB): 0 0
65536-...(KiB): 0 0
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG cufio-px-pool:473 Releasing POSIX pool size: 4096 for GPU: 0
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:145 Tearing down pci-info with 1 GPUs
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG cufio-px-pool:473 Releasing POSIX pool size: 1048576 for GPU: 0
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:145 Tearing down pci-info with 1 GPUs
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG cufio-px-pool:473 Releasing POSIX pool size: 16777216 for GPU: 0
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:145 Tearing down pci-info with 1 GPUs
28-08-2024 17:54:52:650 [pid=33414 tid=33414] INFO cufio-px-pool:484 POSIX pool buffer release complete
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:179 Tearing down bounce buffers
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:145 Tearing down pci-info with 1 GPUs
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:103 Tearing down buffers from GPU 0
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:111 free buffers 128
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:209 cufio-internal - free buf 0x15a44d0
28-08-2024 17:54:52:650 [pid=33414 tid=33414] DEBUG 0:240 unmap GPU 0
28-08-2024 17:54:54:688 [pid=33414 tid=33414] INFO 0:136 nvidia_fs driver closed
And I see the following in the dmesg log:
[69036.183347] nvidia-fs:nvfs_pci: no hash entry for pdevinfo:0000:5c:00.0
[69036.189129] nvidia-fs:nvfs_pci: no hash entry for pdevinfo:0000:5c:00.0
Can someone help explain what the ERROR messages mean and what the dmesg log indicates? Also, how do I confirm that the gdsio benchmarks are using P2P DMA?
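
As a rough aid to the second question, here is a minimal Python sketch that tallies two kinds of lines from the cuFile log above: the "gds path taken with ODIRECT" lines and the per-handle "Compatibility Mode" lines. The function name `summarize_cufile_log` is hypothetical, and the regular expressions assume the exact wording of the debug lines in this log (GDS 1.11); other releases may word them differently. The counts only reflect what libcufile itself reports and are not, on their own, proof that P2P DMA took place.

```python
import re

# Patterns copied from the wording of the debug lines in the log above.
# Assumption: other cuFile releases may phrase these lines differently.
GDS_PATH = re.compile(r"gds path taken with ODIRECT fd: (\d+)")
COMPAT = re.compile(
    r"Compatibility Mode: (\d+) Compat Read Mode: (\d+) Compat Write Mode: (\d+)"
)


def summarize_cufile_log(lines):
    """Count GDS-path I/Os and registered handles that report compat mode."""
    summary = {"gds_path_taken": 0, "handles_seen": 0, "handles_in_compat_mode": 0}
    for line in lines:
        if GDS_PATH.search(line):
            summary["gds_path_taken"] += 1
        match = COMPAT.search(line)
        if match:
            summary["handles_seen"] += 1
            # Any non-zero flag means the handle reports some compat mode.
            if any(int(value) for value in match.groups()):
                summary["handles_in_compat_mode"] += 1
    return summary


if __name__ == "__main__":
    # Three lines taken from the log above (timestamps shortened).
    sample = [
        "DEBUG cufio-obj:410 Compatibility Mode: 0 Compat Read Mode: 0 "
        "Compat Write Mode: 0 Needs RDMA: 0 Needs Unaligned Access: 0",
        "DEBUG cufio_core:2698 gds path taken with ODIRECT fd: 3",
        "DEBUG cufio_core:2698 gds path taken with ODIRECT fd: 73",
    ]
    print(summarize_cufile_log(sample))
```

To run it against a saved log, pass an open file object, for example `summarize_cufile_log(open("cufile.log"))`, where `cufile.log` is whatever path the logging section of the cufile.json in use points to.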