I’m testing my InfiniBand network, which consists of an SB7800 EDR switch and a couple of nodes with ConnectX-6 HCAs.
In particular, I want to see whether congestion control via FECN and BECN works as expected.
By creating an incast situation to a target node, I could see FECN marking by the switch works correctly.
“perfquery --rcvcc” at the target node shows me that PortPktRcvFECN has increased after the test.
Also, packet sniffing using ibdump indicates some of the packets have a FECN mark.
But I couldn’t see any packets with BECN, which is supposed to be generated by the target node.
“perfquery --rcvcc” at the source nodes also indicates that no packets with BECN have been received.
I tried several different settings (turning off AR, changing the link speed, etc.), but none of them solved the problem.
Could you tell me what I did wrong?
Thanks in advance.
What firmware versions are you using on the switch and the HCAs? What is the OFED version?
Check that you have “mlnx_congestion_control 2” set in the opensm.conf configuration.
Which test are you running? ib_write_bw or ib_read_bw? Try both.
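If it helps to verify the opensm.conf setting mentioned above, here is a minimal Python sketch that looks up an option in opensm.conf-style text. It assumes the usual "key value" layout with "#" comments, and that the last occurrence of a key is the one that counts (that part is an assumption on my side). The helper name read_opensm_option and the sample text are hypothetical, not taken from a real installation.

```python
def read_opensm_option(conf_text, key):
    """Return the value of `key` in opensm.conf-style text, or None if absent.

    Assumes "key value" lines and "#" comments; the last occurrence wins
    (an assumption, not confirmed opensm behaviour).
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] == key:
            value = parts[1].strip() if len(parts) > 1 else ""
    return value


# Hypothetical excerpt of an opensm.conf file
sample = """
# Congestion control
mlnx_congestion_control 2
"""

print(read_opensm_option(sample, "mlnx_congestion_control"))
```

In practice you would pass in the contents of your own opensm.conf instead of the sample string.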
It works on my setup:
PortRcvConCtrl counters: Lid 3 port 1
PortRcvConCtrl counters: Lid 2 port 1
PortRcvConCtrl counters: Lid 5 port 1
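To compare the counters before and after a test run, the “perfquery --rcvcc” output can be parsed and diffed. The sketch below assumes the usual perfquery layout of "Name:......value" lines. The counter name PortPktRcvBECN is assumed by analogy with PortPktRcvFECN, and the sample values are made up for illustration, not taken from a real run.

```python
import re

# Matches lines such as "PortPktRcvFECN:..................1523"
COUNTER_LINE = re.compile(r"^(\w+):\.*(\d+)\s*$")


def parse_counters(text):
    """Parse "Name:....value" lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_delta(before, after):
    """Return how much each counter in `after` grew relative to `before`."""
    return {name: after[name] - before.get(name, 0) for name in after}


# Hypothetical outputs captured before and after an incast test
before_text = """# PortRcvConCtrl counters: Lid 3 port 1
PortPktRcvFECN:..................0
PortPktRcvBECN:..................0
"""
after_text = """# PortRcvConCtrl counters: Lid 3 port 1
PortPktRcvFECN:..................1523
PortPktRcvBECN:..................0
"""

delta = counter_delta(parse_counters(before_text), parse_counters(after_text))
print(delta)
```

A non-zero FECN delta at the target together with a zero BECN delta at the sources is the symptom described in the question.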
Not all FECN will lead to BECN, for example:
It can also be that the target node is sending a CN packet back. Check whether the BECN bit is set in the BTH.
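For checking the bits in a captured packet, here is a minimal sketch that reads FECN and BECN from a 12-byte Base Transport Header. It assumes FECN and BECN are the two most significant bits of the fifth BTH byte (the byte just before the destination QP). Locating the BTH inside the capture, after the LRH and any GRH, is left to the caller. The function name and the sample bytes are hypothetical.

```python
def bth_cn_bits(bth):
    """Return (fecn, becn) from the first 12 bytes of a Base Transport Header.

    Assumes FECN is bit 7 and BECN is bit 6 of BTH byte 4 (zero-based).
    """
    if len(bth) < 12:
        raise ValueError("a BTH is 12 bytes long")
    flags = bth[4]
    fecn = bool(flags & 0x80)
    becn = bool(flags & 0x40)
    return fecn, becn


# Hypothetical BTH with only the BECN bit set
sample_bth = bytes.fromhex("0a40ffff4000001200000001")
print(bth_cn_bits(sample_bth))
```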
For additional details, please check IB specification, Vol 1.4. “A10.2.2 CA BEHAVIOR”.
If you still see behaviour different from that described in the IB specification, please open a support case and provide all the details of the test, the topology, and the log files, as your organization has a valid support contract.
It works!! I didn’t know the CC setting had to be configured in the opensm configuration. Thank you very much!!