Newbie InfiniBand question: IS5022Q shuts off during initialization

Hello,

I just acquired the following [used] switch:

P/N MIS5022Q-1BFR Rev:A5

Within a few seconds of being plugged in and showing green on both the system and fan LEDs, the switch shuts itself off, leaving only the fans running. The same thing happens if I unplug it, let it rest for 5 minutes, and plug it back in, with or without cables in the ports. Since I can’t get a stable in-band connection to dump any information, I would like to know whether it could be in some bad state that can be fixed by a reset (if that is possible), or whether the switch is no longer usable.

More details below.

Without any port cables:

  • Plug in

  • The LEDs on port #2 and port #6 briefly show green. The other port LEDs stay off

  • System LED and Fan LED turn green

  • Within 1-2 seconds (varies), all LEDs turn off

  • All fans keep running until the switch is unplugged

A similar pattern is seen when cables are connected from ConnectX-2 cards (MHGH19B_XTR and MHQH29B_XTR). Although it is not always repeatable, at one point discovery briefly recognized the switch before it shut itself off, as seen in the [edited] log below:

ibnetdisc.c:724; from DR path slid 0; dlid 0; 0

ibnetdisc.c:585; Query Node Info; DR path slid 0; dlid 0; 0

ibnetdisc.c:533; Found new node GUID 0x2c90xxxxx9b08 (DR path slid 0; dlid 0; 0)

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0 (0x2c90xxxxx9b08):1

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0 portnum 1: base lid 5 state 2 physstate 5 4X 5.0 Gbps 5.0 Gbps

ibnetdisc.c:585; Query Node Info; DR path slid 0; dlid 0; 0,1

ibnetdisc.c:533; Found new node GUID 0x2c90xxxxx90b8 (DR path slid 0; dlid 0; 0,1)

ibnetdisc.c:481; linking: 0x2c90xxxxx90b8 0000xxxxxxxADC50->0000xxxxxxxAA660:1 and 0x2c90xxxxx9b08 0000xxxxxxxAD9F0->0000xxxxxxxADB90:1

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):0

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 0: base lid 0 state 4 physstate 5 4X 10.0 Gbps 10.0 Gbps

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):1

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):2

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):3

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):4

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):5

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):6

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):7

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1 (0x2c90xxxxx90b8):8

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 1: base lid 0 state 2 physstate 5 4X 5.0 Gbps 5.0 Gbps

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 2: base lid 0 state 1 physstate 2 4X undefined (7) undefined (7)

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 3: base lid 0 state 1 physstate 2 4X 10.0 Gbps 10.0 Gbps

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 4: base lid 0 state 1 physstate 2 4X undefined (7) undefined (7)

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 5: base lid 0 state 1 physstate 2 4X undefined (7) undefined (7)

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 6: base lid 0 state 1 physstate 2 4X undefined (7) undefined (7)

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 7: base lid 0 state 2 physstate 5 4X 5.0 Gbps 5.0 Gbps

ibnetdisc.c:585; Query Node Info; DR path slid 0; dlid 0; 0,1,7

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1 portnum 8: base lid 0 state 1 physstate 2 4X undefined (7) undefined (7)

ibnetdisc.c:533; Found new node GUID 0x2c90xxxxxde42 (DR path slid 0; dlid 0; 0,1,7)

ibnetdisc.c:481; linking: 0x2c90xxxxxde42 0000xxxxxxxAACB0->0000xxxxxxxADFE0:1 and 0x2c90xxxxx90b8 0000xxxxxxxADC50->0000xxxxxxxAAAE0:7

ibnetdisc.c:434; Query Port Info; DR path slid 0; dlid 0; 0,1,7 (0x2c90xxxxxde42):1

ibnetdisc.c:199; portid DR path slid 0; dlid 0; 0,1,7 portnum 1: base lid 4 state 2 physstate 5 4X 5.0 Gbps 5.0 Gbps

chassis.c:631; fill_mellanox_chassis_record: node_desc:Infiniscale-IV Mellanox Technologies

chassis.c:645; fill_mellanox_chassis_record: Unsupported node description format:Infiniscale-IV Mellanox Technologies

Switch : 0x0002c90xxxxx90b8 ports 8 devid 0xbd36 vendid 0x2c9 "Infiniscale-IV Mellanox Technologies"
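For anyone reading the log above, the numeric `state` and `physstate` fields can be turned into names. The following is a minimal Python sketch, assuming the standard InfiniBand PortInfo encodings (logical state 1-4, physical state 1-7) and the exact line layout shown in this log; the function name `decode_port_line` is made up for illustration.

```python
import re

# PortInfo logical port states (InfiniBand encoding).
LOGICAL_STATE = {1: "Down", 2: "Initialize", 3: "Armed", 4: "Active"}

# PortInfo physical port states (InfiniBand encoding).
PHYSICAL_STATE = {
    1: "Sleep",
    2: "Polling",
    3: "Disabled",
    4: "PortConfigurationTraining",
    5: "LinkUp",
    6: "LinkErrorRecovery",
    7: "Phy Test",
}

# Matches the "portid DR path ... portnum N: base lid L state S physstate P" lines.
PORT_LINE = re.compile(
    r"DR path slid \d+; dlid \d+; (?P<path>[\d,]+) "
    r"portnum (?P<port>\d+): base lid (?P<lid>\d+) "
    r"state (?P<state>\d+) physstate (?P<phys>\d+)"
)


def decode_port_line(line):
    """Return (path, port, lid, logical, physical), or None if the line does not match."""
    m = PORT_LINE.search(line)
    if m is None:
        return None
    state = int(m.group("state"))
    phys = int(m.group("phys"))
    return (
        m.group("path"),
        int(m.group("port")),
        int(m.group("lid")),
        LOGICAL_STATE.get(state, f"unknown ({state})"),
        PHYSICAL_STATE.get(phys, f"unknown ({phys})"),
    )
```

Read this way, switch ports 1 and 7 (the cabled ones) report Initialize/LinkUp, port 0 reports Active/LinkUp, and the remaining ports report Down/Polling.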

Room temperature is a normal 68-72F, and I don’t feel or see anything that suggests the switch is overheating (although I cannot judge its internals). I tried a couple of different power sockets and that did not make any difference. The manual suggests getting the reason for the shutdown from the management software; with the link down and the switch externally managed, how can I go about that?

I would appreciate any advice on where I might go from here.

Thank you very much!

Hi Nang,

While you still have connectivity to the switch, please run the command below and provide the output:

flint -d lid-<switch LID> q

Also, can you provide the human-readable output of ibnetdiscover without any debug flags?
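Because the switch only stays up for a few seconds, it may help to have a script poll for it instead of typing the commands by hand. The following is a rough Python sketch, not a tested procedure: the function name `capture`, the retry counts, and the timeout are all assumptions, and it relies on the `ibnetdiscover` and `flint` tools already being installed on the host.

```python
import subprocess
import time


def capture(commands, attempts=50, delay=0.2, timeout=5, run=subprocess.run):
    """Retry each command until its output is accepted once.

    commands maps a label to (argv, accept), where accept(stdout) -> bool.
    Returns {label: stdout} for every command whose output was accepted.
    """
    results = {}
    for _ in range(attempts):
        for label, (argv, accept) in commands.items():
            if label in results:
                continue  # already captured on an earlier pass
            try:
                proc = run(argv, capture_output=True, text=True, timeout=timeout)
            except (OSError, subprocess.TimeoutExpired):
                continue  # tool missing or hung; try again on the next pass
            if proc.returncode == 0 and accept(proc.stdout):
                results[label] = proc.stdout
        if len(results) == len(commands):
            break  # everything captured, stop polling
        time.sleep(delay)
    return results
```

One possible call, started just before the switch is plugged in, is `capture({"disc": (["ibnetdiscover"], lambda s: "Switch" in s), "fw": (["flint", "-d", "lid-" + SWITCH_LID, "q"], lambda s: bool(s.strip()))})`, where `SWITCH_LID` is a placeholder for the LID of the switch. The `"Switch" in s` check is there so that a run that only sees the local adapter is not mistaken for a successful capture.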

Hi Nang,

It looks like a HW fault. I suggest you try to replace the switch; this is one of the disadvantages of purchasing a used switch.

Hello Eddie,

Thank you very much for offering assistance. As soon as I can capture the output of the commands you asked for, I’ll post it. Unfortunately, the uptime gets shorter every time, to the point that I can no longer get any communication from the switch over in-band connections (I now have only 2-4 seconds before the switch shuts off).

I think the ports may be damaged or stuck. I have access to a working switch of the same model, and on it no port LEDs light up when the switch is powered on without any data cables. On the bad one, without any data cables, the LEDs for ports #2 and #6 light up (very briefly) before everything shuts down.

Thanks again.

-Nang