MCX354A-QCBT: both ports are connected to a different host each. iblinkinfo returns with a timeout and lists only the connections for the first port.

Operating system: CentOS 7

opensm 3.3.21

$ iblinkinfo

src/query_smp.c:195; umad (DR path slid 0; dlid 0; 0,1 Attr 0xff90:1) bad status 110; Connection timed out

CA: rcc HCA-1:

0xe41d2d030047c3a1 2 1 ==( 4X 10.0 Gbps Active/ LinkUp)==> 1 1 “xct-eds mlx4_0” ( )

CA: xct-eds mlx4_0:

0x506b4b03004e4c01 1 1 ==( 4X 10.0 Gbps Active/ LinkUp)==> 2 1 “rcc HCA-1” ( )

Hello Maik,

Thank you for posting your inquiry on the NVIDIA Networking Community.

Based on the information provided, that is normal behavior for the command ‘iblinkinfo’

To show the second port, use the following syntax → # iblinkinfo -C mlx4_0 -P 2
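A minimal sketch of how the per-port query above could be repeated for every port of the card. The function name `per_port_linkinfo_cmds` is hypothetical, and the CA name and port count are assumptions that should be taken from `ibstat` on the actual machine. The function only prints the commands, so it can be checked before anything is run.

```shell
#!/usr/bin/env bash
# Print one iblinkinfo command per local HCA port.
# ca_name and num_ports are assumptions; read the real values from `ibstat`.
per_port_linkinfo_cmds() {
    local ca_name="$1" num_ports="$2" port
    for ((port = 1; port <= num_ports; port++)); do
        # -C selects the channel adapter, -P the local port to query from
        echo "iblinkinfo -C ${ca_name} -P ${port}"
    done
}

# Dry run for the dual-port card discussed in this thread.
per_port_linkinfo_cmds mlx4_0 2
```

Piping the output to `sh` would execute the printed commands one after the other.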

Thank you and regards,

~NVIDIA Networking Technical Support

Hello Martijn

Thanks a lot for the quick answer.

Our scenario is this:

We produce systems with three computers in it.

COMPUTER1 - - COMPUTER2 = = COMPUTER3

- - is the first IB connection

= = is the second IB connection

We use IPoIB

My problem is that I don’t understand:

How do both subnet managers on my COMPUTER2 know that they are in different Subnets?

Like in an IP network, I would expect a configuration or a routing table

In our case COMPUTER1 and COMPUTER3 do not communicate. But could they? Could COMPUTER2 act as a router?

I am investigating this because sometimes the connection between COMPUTER2 and COMPUTER3 doesn’t come up.

In the log we see ib0 comes up

Aug 31 12:17:13 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

whereas ib1 stays not ready

Aug 31 12:17:13 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): ib1: link is not ready

While examining IB I used IB tools like ibnetdiscover, ibhosts, and so on.

I saw that on COMPUTER2 they dumped only information about the first connection and ended with a timeout.

( For example [root@COMPUTER2]$ ibhosts

src/query_smp.c:195; umad (DR path slid 0; dlid 0; 0,1 Attr 0xff90:1) bad status 110; Connection timed out

Ca : 0xe41d2d030047c3a0 ports 2 " COMPUTER1"

Ca : 0x506b4b03004e4c00 ports 2 " COMPUTER2" )

I tried to use the GUIDs of each port:

For example:

With the GUID of the first port everything is fine.

[root@xct-eds hitrax]$ ibaddr -G 0x506b4b03004e4c01

GID fe80::506b:4b03:4e:4c01 LID start 0x1 end 0x1

With the GUID of the second port we fail.

[root@xct-eds hitrax]$ ibaddr -G 0x506b4b03004e4c02

ibwarn: [107057] ib_path_query_via: sa call path_query failed

ibaddr: iberror: failed: can’t resolve destination port 0x506b4b03004e4c02

In the opensm.log I do see similar timeouts:

Sep 08 05:23:50 069680 [9D0EA700] 0x01 → mcmr_rcv_join_mgrp: ERR 1B11: Port 0xe41d2d030047c3a1 (rcc HCA-1) failed to join non-existing multicast group with MGID ff12:601b:ffff::2, insufficient components specified for implicit create (comp_mask 0x10083)

Sep 08 05:23:52 579650 [9D0EA700] 0x01 → mcmr_rcv_join_mgrp: ERR 1B11: Port 0xe41d2d030047c3a1 (rcc HCA-1) failed to join non-existing multicast group with MGID ff12:601b:ffff::16, insufficient components specified for implicit create (comp_mask 0x10083)

Sep 08 05:23:54 069575 [9D0EA700] 0x01 → mcmr_rcv_join_mgrp: ERR 1B11: Port 0xe41d2d030047c3a1 (rcc HCA-1) failed to join non-existing multicast group with MGID ff12:601b:ffff::2, insufficient components specified for implicit create (comp_mask 0x10083)

Sep 08 05:23:56 805946 [9B0E6700] 0x01 → log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) – dropping

Method 0x1, Attr 0xFF90, TID 0x1256

Sep 08 05:23:56 805990 [9B0E6700] 0x01 → Received SMP on a 1 hop path: Initial path = 0,1, Return path = 0,0

Sep 08 05:23:56 806006 [9B0E6700] 0x01 → sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(MLNXExtendedPortInfo), attr_mod 0x1, TID 0x1256

Sep 08 05:23:56 806018 [9B0E6700] 0x01 → sm_mad_ctrl_send_err_cb: ERR 3120: Timeout while getting attribute 0xFF90 (MLNXExtendedPortInfo); Possible mis-set mkey?

Sep 08 05:23:56 806732 [9C0E8700] 0x02 → SUBNET UP

Sep 08 05:23:58 069294 [9D0EA700] 0x01 → mcmr_rcv_join_mgrp: ERR 1B11: Port 0xe41d2d030047c3a1 (rcc HCA-1) failed to join non-existing multicast group with MGID ff12:601b:ffff::2, insufficient components specified for implicit create (comp_mask 0x10083)

Sep 08 05:24:06 805934 [9B0E6700] 0x01 → log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) – dropping

Method 0x1, Attr 0xFF90, TID 0x125c

Sep 08 05:24:06 805983 [9B0E6700] 0x01 → Received SMP on a 1 hop path: Initial path = 0,1, Return path = 0,0

Sep 08 05:24:06 806001 [9B0E6700] 0x01 → sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(MLNXExtendedPortInfo), attr_mod 0x1, TID 0x125c

Sep 08 05:24:06 806012 [9B0E6700] 0x01 → sm_mad_ctrl_send_err_cb: ERR 3120: Timeout while getting attribute 0xFF90 (MLNXExtendedPortInfo); Possible mis-set mkey?

Sep 08 05:24:06 806968 [9C0E8700] 0x02 → SUBNET UP

These timeouts do not seem to cause problems in most cases, but it looks to me as if our computers are badly configured.

Roughly every fiftieth machine we produce has the problem that the second port on COMPUTER2 (ib1) does not come up. Normally we simply exchange the computers.

So my first idea was that we could have a kind of routing problem that affects MADs (Management Datagrams) and confuses the subnet manager.

The really important question for me is: how do the two subnet managers on COMPUTER2 know that they are in different subnets?

Many Thanks

Maik

Hello Maik,

You need to create two opensm.conf files and change the guid to respective port1 and port2.

Then you need to start two opensm processes with the option -F <path to opensm.conf for port1|2>

In practice you will run two SMs, each pointing to a different configuration file that contains the GUID of port 1 or port 2, to create two IB fabrics:

  • Node1 <-> Node2
  • Node2 <-> Node3

Example snippet of opensm.conf for port 1:

# DEVICE ATTRIBUTES OPTIONS

# The port GUID on which the OpenSM is running.

guid 0x98039b0300921fce ← Port GUID of port1

Example snippet of opensm.conf for port 2:

# DEVICE ATTRIBUTES OPTIONS

# The port GUID on which the OpenSM is running.

guid 0x98039b0300921fcf ← Port GUID of port2

To start the two SM’s in the background as daemons:

/usr/sbin/opensm --config -q local

/usr/sbin/opensm --config -q local
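As a rough sketch of the two-file setup described above, the following script writes one small config file per port and prints the matching start command. The function name `write_sm_conf`, the file names, and the log paths are hypothetical, and the GUIDs are the example values from this post. It assumes opensm falls back to its defaults for options that are not listed; if that does not hold for your version, start from a full config created with `opensm -c <file>` and change only `guid` and `log_file`.

```shell
#!/usr/bin/env bash
# Write a minimal per-port OpenSM config and print the matching start command.
# File names and log paths are hypothetical; GUIDs are the example values above.
write_sm_conf() {
    local conf_dir="$1" port="$2" guid="$3"
    local conf="${conf_dir}/opensm-port${port}.conf"
    {
        echo "# Port GUID on which this OpenSM instance runs"
        echo "guid ${guid}"
        echo "# Separate log per instance, so the two fabrics are easy to tell apart"
        echo "log_file /var/log/opensm-port${port}.log"
    } > "$conf"
    # -F selects the config file, -B runs opensm as a daemon
    echo "/usr/sbin/opensm -F ${conf} -B"
}

# Dry run: create both files in a temporary directory and show the commands.
conf_dir="$(mktemp -d)"
write_sm_conf "$conf_dir" 1 0x98039b0300921fce
write_sm_conf "$conf_dir" 2 0x98039b0300921fcf
```

The script only prints the opensm commands; they still have to be run as root, one per port, for the two subnet managers to start.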

Note: This will not give you the ability to route between the two IB networks. They are isolated. Routing between the two fabrics is not possible with this solution.

The easiest solution is to get an unmanaged IB switch so you can connect all nodes to the switch, run SM on one node and you will have full connectivity between all nodes as part of one IB fabric.

Thank you and regards,

~NVIDIA Networking Technical Support

Hello Martijn,

thanks for your explanation. Knowing that the two ports have isolated fabrics helps my understanding.

In effect, I do not need routing.

I wrote my question with the intention of understanding how the two subnet manager processes (on the same machine) cooperate. As I understand it now, they don’t cooperate at all. They are isolated.

We already have two subnet managers running.

We have only one opensm.conf and defined two GUIDs in /etc/sysconfig/opensm. I think that should do the same as the solution you suggested?

As I said, our solution normally works fine.

I just saw the ‘bad status 110; Connection timed out’ messages and the ‘Sep 08 05:23:56 805946 [9B0E6700] 0x01 → log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) – dropping’ entries in opensm.log which irritated me.

But sometimes IPoIB does not come up. On the lower level everything seems to be okay: ibstat shows State: Active and Physical state: LinkUp.

In the error case I found this in /var/log/messages:

kernel: ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
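To read these multicast GIDs more easily, here is a small sketch that decodes them. It assumes the IPoIB group layout from RFC 4391, where the second 16-bit field is the protocol signature (401b for IPv4, 601b for IPv6) and the third is the P_Key; the function name `decode_ipoib_mgid` is hypothetical. By that reading, the kernel message above concerns the IPv4 broadcast group, while the ERR 1B11 lines in opensm.log concern IPv6 groups. Status -22 corresponds to EINVAL on Linux.

```shell
#!/usr/bin/env bash
# Decode protocol signature and P_Key from an IPoIB multicast GID.
# Assumed layout (RFC 4391): ff1<scope>:<signature>:<P_Key>:<group id ...>
decode_ipoib_mgid() {
    local mgid="$1" prefix sig pkey proto
    # Split on ':'; everything after the third field is discarded into _
    IFS=: read -r prefix sig pkey _ <<< "$mgid"
    case "$sig" in
        401b) proto="IPv4" ;;
        601b) proto="IPv6" ;;
        *)    proto="unknown" ;;
    esac
    echo "proto=${proto} pkey=0x${pkey}"
}

decode_ipoib_mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff   # from /var/log/messages
decode_ipoib_mgid ff12:601b:ffff::2                         # from opensm.log
```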

Thank you very much

Maik Skronn

Hello Maik,

It really needs to be two separate opensm.conf files and processes. Running two processes on the same file does not work.

As I suggested, use the recommendation provided. It is a solid, proven solution. We are using it in our labs all the time.

Thank you,

~Martijn

Hi Martijn,

thanks.

The solution we used came from the documentation and default config files of opensm 3.3.21.

If you are interested, have a look at /etc/sysconfig/opensm, which comes with opensm-3.3.21-2.el7.x86_64.rpm.

Anyway, we did change it to two config files, because it’s better to have two log files.

The result is the same. Usually all works fine.

But sometimes the IPoIB connection doesn’t come up.

The symptom is:

- normal ping doesn’t work

- ibping does work fine.

If we unplug and reconnect the cable, all is fine again, although we didn’t do anything else, such as a reboot.

Perhaps there is a kind of reboot or keep alive option for IPoIB we could use?

Or can we do a reconnect by software if we detect an error?

Best regards

Maik

Hi Maik,

As the OpenSM you are using is provided by the OS distribution, support for this component will be provided by the OS-vendor. With MLNX_OFED we also provide an OpenSM. This OpenSM component is maintained and supported by NVIDIA Networking.

The IPoIB issue of not being able to ping can be related to using the same IP subnet on the two different IB fabrics.

When the interfaces on the node belong to the same subnet, it can lead to unexpected network behavior, because a Linux host in that case might respond to incoming packets via a different interface.

To avoid this, you can use advanced routing and rules to guarantee that the Linux kernel does not accept traffic on an interface unless the destination IP is defined on that incoming interface, and that outgoing traffic leaves via the interface that holds the IP address specified as the source IP in the packet.

The following RHEL KB will provide a guide to resolution for this issue → https://access.redhat.com/solutions/30564
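A minimal sketch of the kind of advanced routing and rules described above, not a reproduction of the KB article. The interface names match this thread, but the IP addresses, the prefix, the table numbers, and the function name `policy_route_cmds` are hypothetical. The function only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Print policy-routing commands that tie each source IP to its own interface.
# Addresses, prefix and routing table numbers are hypothetical examples.
policy_route_cmds() {
    local dev="$1" addr="$2" subnet="$3" table="$4"
    # Per-interface routing table: reach the subnet via this interface only
    echo "ip route add ${subnet} dev ${dev} src ${addr} table ${table}"
    # Packets with this source address are routed using that table
    echo "ip rule add from ${addr} table ${table}"
    # Answer ARP only if the target address is configured on the incoming interface
    echo "sysctl -w net.ipv4.conf.${dev}.arp_ignore=1"
    # Use the best local source address for the target in ARP requests
    echo "sysctl -w net.ipv4.conf.${dev}.arp_announce=2"
}

policy_route_cmds ib0 192.168.10.2 192.168.10.0/24 100
policy_route_cmds ib1 192.168.10.3 192.168.10.0/24 101
```

Putting ib0 and ib1 into different IP subnets avoids the need for these rules altogether, which may be simpler when the two fabrics are isolated anyway.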

Thank you,

~Martijn