I'd like to measure GPU-to-GPU ib_read_bw / ib_write_bw bandwidth for the various PCIe-topology combinations within a single node.
However, the --use_cuda option does not work properly.
Could you help me with the next step needed to get this bandwidth measurement working?
- I would also like to know whether my test command is correct (the general form I am using is sketched right after this list).
- With the -d and --use_cuda options, can I test every GPU - HCA (server) - HCA - GPU (client) combination for data communication?
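For concreteness, this is the general form I am using (the port and device names are just examples, and I am assuming the --use_cuda index follows the GPU ordering reported by nvidia-smi):

# server: open HCA <hca> and place the test buffer on GPU <cuda_idx>
$ ./ib_read_bw -d <hca> --use_cuda=<cuda_idx> -p <port>

# client: connect to the server and use its own HCA / GPU pair
$ ./ib_read_bw <server_ip> -d <hca> --use_cuda=<cuda_idx> -p <port>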
Software Info
- OS : Ubuntu 20.04
- linux-rdma/perftest (tag: v4.5-0.2)
- NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2
- MLNX_OFED_LINUX-5.3-1.0.0.1 (OFED-5.3-1.0.0)
ib_read_bw
Server
$ ./ib_read_bw -d mlx5_1 --use_cuda=0 -p 50001
* Waiting for client to connect... *
initializing CUDA
Listing all CUDA devices in system:
RDMA_Read BW Test
Dual-port : OFF Device : mlx5_1
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0x04 QPN 0x1854 PSN 0xd2b3ee OUT 0x10 RKey 0x00277e VAddr 0x007faec3210000
remote address: LID 0x03 QPN 0x1000 PSN 0xdfe56e OUT 0x10 RKey 0x002467 VAddr 0x007f5643210000
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
ethernet_read_keys: Couldn't read remote address
Unable to read to socket/rdma_cm
Failed to exchange data between server and clients
Client
$ ./ib_read_bw 127.0.0.1 -d mlx5_2 --use_cuda=7 -p 50001
initializing CUDA
Listing all CUDA devices in system:
RDMA_Read BW Test
Dual-port : OFF Device : mlx5_2
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0x03 QPN 0x1000 PSN 0xdfe56e OUT 0x10 RKey 0x002467 VAddr 0x007f5643210000
remote address: LID 0x04 QPN 0x1854 PSN 0xd2b3ee OUT 0x10 RKey 0x00277e VAddr 0x007faec3210000
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
mlx5: ai004: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00008914 10001000 0000b0d2
Completion with error at client
Failed status 11: wr_id 0 syndrom 0x89
scnt=128, ccnt=0
Failed to complete run_iter_bw function successfully
$ nvidia-smi topo -m
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  mlx5_0  mlx5_1  mlx5_2  mlx5_3  mlx5_4  mlx5_5  CPU Affinity     NUMA Affinity
GPU0    X     NV12  NV12  NV12  NV12  NV12  NV12  NV12  SYS     PXB     SYS     SYS     SYS     SYS     48-63,176-191    3
GPU1    NV12  X     NV12  NV12  NV12  NV12  NV12  NV12  SYS     PXB     SYS     SYS     SYS     SYS     48-63,176-191    3
GPU2    NV12  NV12  X     NV12  NV12  NV12  NV12  NV12  PXB     SYS     SYS     SYS     SYS     SYS     16-31,144-159    1
GPU3    NV12  NV12  NV12  X     NV12  NV12  NV12  NV12  PXB     SYS     SYS     SYS     SYS     SYS     16-31,144-159    1
GPU4    NV12  NV12  NV12  NV12  X     NV12  NV12  NV12  SYS     SYS     SYS     SYS     SYS     PXB     112-127,240-255  7
GPU5    NV12  NV12  NV12  NV12  NV12  X     NV12  NV12  SYS     SYS     SYS     SYS     SYS     PXB     112-127,240-255  7
GPU6    NV12  NV12  NV12  NV12  NV12  NV12  X     NV12  SYS     SYS     PXB     SYS     SYS     SYS     80-95,208-223    5
GPU7    NV12  NV12  NV12  NV12  NV12  NV12  NV12  X     SYS     SYS     PXB     SYS     SYS     SYS     80-95,208-223    5
mlx5_0  SYS   SYS   PXB   PXB   SYS   SYS   SYS   SYS   X       SYS     SYS     SYS     SYS     SYS
mlx5_1  PXB   PXB   SYS   SYS   SYS   SYS   SYS   SYS   SYS     X       SYS     SYS     SYS     SYS
mlx5_2  SYS   SYS   SYS   SYS   SYS   SYS   PXB   PXB   SYS     SYS     X       SYS     SYS     SYS
mlx5_3  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS     SYS     SYS     X       PIX     SYS
mlx5_4  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS     SYS     SYS     PIX     X       SYS
mlx5_5  SYS   SYS   SYS   SYS   PXB   PXB   SYS   SYS   SYS     SYS     SYS     SYS     SYS     X
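If the single-pair command form above is correct, this is roughly the sweep I would like to script over the PXB-attached GPU/HCA pairs from the matrix above (loopback on one node; the GPU-to-HCA pairings, the port number, and the crude background/sleep handling are my own assumptions, not anything prescribed by perftest):

# GPU -> locally attached HCA (PXB in the matrix above):
#   GPU0/GPU1 -> mlx5_1, GPU2/GPU3 -> mlx5_0, GPU4/GPU5 -> mlx5_5, GPU6/GPU7 -> mlx5_2
pairs="0:mlx5_1 2:mlx5_0 4:mlx5_5 6:mlx5_2"
for src in $pairs; do
  for dst in $pairs; do
    [ "$src" = "$dst" ] && continue
    sgpu=${src%%:*}; shca=${src##*:}    # server-side GPU index and HCA
    dgpu=${dst%%:*}; dhca=${dst##*:}    # client-side GPU index and HCA
    ./ib_read_bw -d "$shca" --use_cuda="$sgpu" -p 50001 &            # server
    sleep 2                                                          # give the server time to listen
    ./ib_read_bw 127.0.0.1 -d "$dhca" --use_cuda="$dgpu" -p 50001    # client
    wait                                                             # reap the server before the next pair
  done
done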