Can't ibping by LID or GUID, but can ping by IP

We are using an SB7790 unmanaged switch connected to:

  1. VMware (6.5) server with OpenSM on a guest CentOS VM (7.5) - Mellanox ConnectX-4
  2. Server with Ubuntu (16.04.5 LTS) - Mellanox ConnectX-4
  3. Everything has been updated

Successful items:

  • OpenSM is running (active) on the CentOS VM
  • ibstat shows all interfaces as Active with LinkUp
  • ibnetdiscover finds all connected interfaces
  • We can ping by IP to and from each server

Unsuccessful item:

  • We are not able to ibping across the switch

We’re not sure what we might be missing.

We can’t find many resources for further troubleshooting. Any help would be greatly appreciated!

Thanks

Brian

Hi Brian,

When using virtualization, the GRH (global routing header) must be present in the packet. For ibping, the --dgid parameter needs to be used (see man ibping).

To get the GIDs, run ‘show_gids’ on the server and use the output on the client side.

Server

#show_gids

DEV PORT INDEX GID IPv4 VER DEV


mlx5_1 1 0 fe80:0000:0000:0000:248a:0703:009c:01a7 v1

Client

#ibping --dgid fe80:0000:0000:0000:248a:0703:009c:01a7 18
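
The two steps above (read the GID on the server, pass it to ibping on the client) can be sketched as a small shell helper. This is only a minimal sketch: it assumes show_gids prints rows in the layout shown above, with the GID in the fourth column. The function name gids_to_ibping and the sample input are hypothetical, and the helper prints the commands instead of running them.

```shell
#!/bin/sh
# gids_to_ibping LID: read show_gids-style rows on stdin and print one
# "ibping --dgid <GID> <LID>" command per row (hypothetical helper).
gids_to_ibping() {
  lid=$1
  # Keep only rows whose 4th field looks like a link-local GID (fe80:...),
  # which skips the header and the n_gids_found summary line.
  awk -v lid="$lid" '$4 ~ /^fe80:/ { printf "ibping --dgid %s %s\n", $4, lid }'
}

# Example with the row from the server output above; on a real server,
# pipe the output of show_gids into the function instead of this sample text.
printf '%s\n' \
  'DEV PORT INDEX GID IPv4 VER DEV' \
  'mlx5_1 1 0 fe80:0000:0000:0000:248a:0703:009c:01a7 v1' |
  gids_to_ibping 18
```

The LID argument has to be the LID of the port that owns the GID; the helper does not check that pairing.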

If you would like to check RDMA connectivity between VMs, use the utilities from the perftest package (ib_read_bw, ib_write_bw, etc.) with the -R parameter.
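
As a rough illustration of the perftest suggestion, the sketch below prints a matching server/client command pair. It assumes perftest's -d (device) and -R (connect through rdma_cm) options. The helper name perftest_pair, the device names, and the address 192.0.2.10 are placeholders, not values from this setup.

```shell
#!/bin/sh
# perftest_pair TOOL SERVER_DEV CLIENT_DEV SERVER_IP (hypothetical helper):
# print the command to run on each side; nothing is executed.
perftest_pair() {
  tool=$1
  server_dev=$2
  client_dev=$3
  server_ip=$4
  # Server side waits for a connection; -R sets it up via rdma_cm.
  printf 'server: %s -d %s -R\n' "$tool" "$server_dev"
  # Client side adds the server's IP address as the last argument.
  printf 'client: %s -d %s -R %s\n' "$tool" "$client_dev" "$server_ip"
}

# Placeholder values; substitute the real device names and server address.
perftest_pair ib_write_bw mlx5_0 mlx5_0 192.0.2.10
```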

ibtracert works

We do have a connection, but we can only ibping the GUID that OpenSM is bound to; we can’t ibping the other GUIDs.

Are there any error messages? Does ibtracert work (#ibtracert)?

Hi Brian,

Did you start ibping on the server side using a different device? That is the ‘-C’ option.
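
To illustrate the ‘-C’ suggestion: the sketch below prints one ibping server command per device, so that each device gets its own responder. It is a minimal sketch that assumes port 1 on every device. ibping_server_cmds is a hypothetical helper, mlx5_0 and mlx5_1 are placeholder device names, and the commands are printed rather than started.

```shell
#!/bin/sh
# ibping_server_cmds DEV...: print one "ibping -S" command per device,
# using -C to select the device and -P to select the port (port 1 assumed).
ibping_server_cmds() {
  for dev in "$@"; do
    printf 'ibping -S -C %s -P 1\n' "$dev"
  done
}

# Placeholder device names; use the names reported by ibstat or show_gids.
ibping_server_cmds mlx5_0 mlx5_1
```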

Thank you for responding quickly.

I am able to ibping the GID on the first device but not the one on the second:

SERVER:


show_gids

DEV PORT INDEX GID IPv4 VER DEV


mlx5_0 1 0 fe80:0000:0000:0000:248a:0703:0014:f9ac v1

mlx5_1 1 0 fe80:0000:0000:0000:248a:0703:0014:f850 v1

n_gids_found=2

CLIENT:

name@server:/etc/infiniband$ ibping --dgid fe80:0000:0000:0000:248a:0703:0014:f9ac 8

Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.109 ms

Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.095 ms

Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.139 ms

Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.174 ms

Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.159 ms

Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.190 ms

Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.169 ms

Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.163 ms

^Z[6] Killed ibping 8

[7] Killed ibping -S

[8]+ Stopped ibping --dgid fe80:0000:0000:0000:248a:0703:0014:f9ac 8

name@server:/etc/infiniband$ ibping --dgid fe80:0000:0000:0000:248a:0703:0014:f850 8

ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)

ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)

ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)

ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)

ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)

ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)

ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)

^Z

[9]+ Stopped ibping --dgid fe80:0000:0000:0000:248a:0703:0014:f850 8

How can I ibping the other GIDs?

Thanks

Brian
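
One thing that may be worth checking here, offered as an assumption rather than a confirmed diagnosis: the subnet manager assigns a LID to each port, so the second device may not answer at LID 8. The sketch below lists the base LID of each device from ibstat-style output. It assumes the usual ibstat layout (an unindented CA '<name>' line per device and an indented "Base lid:" line per port). The helper name ca_lids and the sample values, including LID 9, are made up for illustration.

```shell
#!/bin/sh
# ca_lids: read ibstat-style text on stdin and print "<device> <base lid>"
# for every port (hypothetical helper; assumes the usual ibstat layout).
ca_lids() {
  awk '
    /^CA /      { gsub("\047", "", $2); ca = $2 }  # device line: strip the quotes
    /Base lid:/ { print ca, $NF }                  # last field is the LID
  '
}

# Made-up sample input; on a real host, pipe the output of ibstat instead.
printf '%s\n' \
  "CA 'mlx5_0'" \
  "    Port 1:" \
  "        Base lid: 8" \
  "CA 'mlx5_1'" \
  "    Port 1:" \
  "        Base lid: 9" |
  ca_lids
```

If the second device reports a different LID, that LID would be the one to pass to ibping together with its GID.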