linux-rdma perftest: ib_read_bw failure with the --use_cuda option

I’d like to measure GPU-to-GPU ib_read/ib_write bandwidth for various combinations of GPUs and HCAs in our PCIe topology within a single node.

However, the --use_cuda option doesn’t work properly.

Could you help me take the next step toward measuring this bandwidth?

  • I’d also like to know whether my test commands below are correct.
  • With the -d and --use_cuda options, can I test every GPU - HCA (server) - HCA - GPU (client) data path? (A rough sketch of the sweep I have in mind follows this list.)
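
To make the second question concrete, this is roughly how I intend to sweep the pairs once a single combination works. The loop is only a sketch of my plan, not an official perftest recipe; the HCA names, GPU indices, and port number are the same placeholders used in the commands below.

# Sketch: iterate over server/client HCA pairs, pinning the server buffer to GPU 0
# and the client buffer to GPU 7 (indices are examples from my topology).
for srv_hca in mlx5_1 mlx5_2; do
  for cli_hca in mlx5_1 mlx5_2; do
    ./ib_read_bw -d "$srv_hca" --use_cuda=0 -p 50001 &            # server side
    sleep 1                                                       # give the server time to listen
    ./ib_read_bw 127.0.0.1 -d "$cli_hca" --use_cuda=7 -p 50001    # client side (loopback)
    wait
  done
done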

Software Info

  • OS: Ubuntu 20.04
  • linux-rdma/perftest (tag: v4.5-0.2)
  • NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2
  • MLNX_OFED_LINUX-5.3-1.0.0.1 (OFED-5.3-1.0.0)
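
For completeness, here is how I checked that the GPUDirect RDMA peer-memory module is loaded (my understanding is that --use_cuda depends on it with this driver/OFED combination; please correct me if that assumption is wrong):

$ lsmod | grep -E 'nv_peer_mem|nvidia_peermem'   # should print the peer-memory module
$ sudo modprobe nv_peer_mem                      # load it if the line above is empty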

ib_read_bw

Server

$ ./ib_read_bw -d mlx5_1 --use_cuda=0 -p 50001


* Waiting for client to connect... *
initializing CUDA
Listing all CUDA devices in system:

RDMA_Read BW Test
Dual-port : OFF Device : mlx5_1
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet

local address: LID 0x04 QPN 0x1854 PSN 0xd2b3ee OUT 0x10 RKey 0x00277e VAddr 0x007faec3210000
remote address: LID 0x03 QPN 0x1000 PSN 0xdfe56e OUT 0x10 RKey 0x002467 VAddr 0x007f5643210000

#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
ethernet_read_keys: Couldn't read remote address
Unable to read to socket/rdma_cm
Failed to exchange data between server and clients

Client

$ ./ib_read_bw 127.0.0.1 -d mlx5_2 --use_cuda=7 -p 50001

initializing CUDA
Listing all CUDA devices in system:

RDMA_Read BW Test
Dual-port : OFF Device : mlx5_2
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet

local address: LID 0x03 QPN 0x1000 PSN 0xdfe56e OUT 0x10 RKey 0x002467 VAddr 0x007f5643210000
remote address: LID 0x04 QPN 0x1854 PSN 0xd2b3ee OUT 0x10 RKey 0x00277e VAddr 0x007faec3210000

#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
mlx5: ai004: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00008914 10001000 0000b0d2
Completion with error at client
Failed status 11: wr_id 0 syndrom 0x89
scnt=128, ccnt=0
Failed to complete run_iter_bw function successfully

$ nvidia-smi topo -m

GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 mlx5_0 mlx5_1 mlx5_2 mlx5_3 mlx5_4 mlx5_5 CPU Affinity NUMA Affinity
GPU0 X NV12 NV12 NV12 NV12 NV12 NV12 NV12 SYS PXB SYS SYS SYS SYS 48-63,176-191 3
GPU1 NV12 X NV12 NV12 NV12 NV12 NV12 NV12 SYS PXB SYS SYS SYS SYS 48-63,176-191 3
GPU2 NV12 NV12 X NV12 NV12 NV12 NV12 NV12 PXB SYS SYS SYS SYS SYS 16-31,144-159 1
GPU3 NV12 NV12 NV12 X NV12 NV12 NV12 NV12 PXB SYS SYS SYS SYS SYS 16-31,144-159 1
GPU4 NV12 NV12 NV12 NV12 X NV12 NV12 NV12 SYS SYS SYS SYS SYS PXB 112-127,240-255 7
GPU5 NV12 NV12 NV12 NV12 NV12 X NV12 NV12 SYS SYS SYS SYS SYS PXB 112-127,240-255 7
GPU6 NV12 NV12 NV12 NV12 NV12 NV12 X NV12 SYS SYS PXB SYS SYS SYS 80-95,208-223 5
GPU7 NV12 NV12 NV12 NV12 NV12 NV12 NV12 X SYS SYS PXB SYS SYS SYS 80-95,208-223 5
mlx5_0 SYS SYS PXB PXB SYS SYS SYS SYS X SYS SYS SYS SYS SYS
mlx5_1 PXB PXB SYS SYS SYS SYS SYS SYS SYS X SYS SYS SYS SYS
mlx5_2 SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS X SYS SYS SYS
mlx5_3 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS X PIX SYS
mlx5_4 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS PIX X SYS
mlx5_5 SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS SYS X
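
One thing I am not certain about (so treat this as my assumption): whether the index passed to --use_cuda follows the same GPU numbering that nvidia-smi shows above. To cross-check, I map GPU indices and HCAs to PCIe addresses like this:

$ nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv   # GPU index -> PCI bus ID
$ readlink -f /sys/class/infiniband/mlx5_1/device             # HCA (e.g. mlx5_1) -> PCI device path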