Greetings all,
I REALLY need your help!
I have inherited a couple of Mellanox QM8790 HDR externally managed InfiniBand switches which I need to re-deploy.
a. It will initially require a factory reset of the switches - I guess the reddish button at the front right needs to be pressed for more than 15 seconds.
b. Then it needs to have the “split profile” enabled for each switch - how can it be done?
c. Then I will need to configure ports 1 to 4 and ports 37 to 40 to be “un-split”, and the other ports to be set up as “split” - how can this be done?
Please, provide any instructions on how to accomplish this.
I have looked at QM87xx documentation and it DOES NOT refer to the externally managed switches - QM8790 - but mainly to the managed ones - QM8700.
I have also requested a trial license for UFM, but it does not seem to be helpful at all.
Thank you for reading this message.
a. It will initially require a factory reset of the switches - I guess the reddish button at the front right needs to be pressed for more than 15 seconds.
Ans. There is no configuration done on this switch. Therefore, there is no need to do a factory reset.
b. Then it needs to have the “split profile” enabled for each switch - how can it be done?
Ans. See the link below.
https://docs.nvidia.com/networking/display/mftv4221406lts/using+mlxconfig#src-2477565152_Usingmlxconfig-splitcable
c. Then I will need to configure ports 1 to 4 and ports 37 to 40 to be “un-split”, and the other ports to be set up as “split” - how can this be done?
Ans. See b.
@marlon1 Thank you so much for your answers!
Now I have a starting path to follow.
One extra question: is UFM capable of performing the same configuration changes as “mlxconfig”? If it is, do you have some pointers on how to perform the “split-profile” task or the “split-port” task?
Thank you for your quick attention to this matter!
@marlon1
Sorry to be a pest:
I could not figure out how to query the IB network to identify the right unmanaged IB switch (QM8790).
I have run ibnetdiscover and found the switch:
vendid=0x2c9
devid=0xd2f0
sysimgguid=0x1c34da030049717c
switchguid=0x1c34da030049717c(1c34da030049717c)
Switch 81 "S-1c34da030049717c" # "MF0;infiniband0:MQM8700/U1" enhanced port 0 lid 1 lmc 0
vendid=0x2c9
devid=0xd2f0
sysimgguid=0xb8cef60300d2ac5e
switchguid=0xb8cef60300d2ac5e(b8cef60300d2ac5e)
**Switch 81 "S-b8cef60300d2ac5e" # "Quantum Mellanox Technologies" base port 0 lid 88 lmc 0**
Running ibswitches also finds the QM8790 unmanaged switch:
[root@ib-sys ~]# ibswitches
Switch : 0x1c34da030049717c ports 81 "MF0;infiniband0:MQM8700/U1" enhanced port 0 lid 1 lmc 0
**Switch : 0xb8cef60300d2ac5e ports 81 "Quantum Mellanox Technologies" base port 0 lid 88 lmc 0**
[root@ib-sys ~]#
However, running **mlxconfig q** shows only the locally installed IB card:
[root@ib-sys ~]# mlxconfig q
Device #1:
----------
Device type: ConnectX6
Name: MCX653106A-ECA_Ax
Description: ConnectX-6 VPI adapter card; H100Gb/s (HDR100; EDR IB and 100GbE); dual-port QSFP56; PCIe3.0 x16; tall bracket; ROHS R6
Device: /dev/mst/mt4123_pciconf0
To change the configuration of an unmanaged QM8790 IB switch, it seems that I need to provide the **<device>** option, but **/dev/mst** shows the local IB card as the only available device:
[root@ib-sys ~]# ls -al /dev/mst/
total 0
drwxr-xr-x 2 root root 80 Jun 20 10:55 .
drwxr-xr-x 23 root root 4540 Jun 20 10:55 ..
crw------- 1 root root 510, 0 Jun 20 10:55 mt4123_pciconf0
crw------- 1 root root 510, 1 Jun 20 10:55 mt4123_pciconf0.1
[root@ib-sys ~]#
What am I missing???
Also, how can one reboot an unmanaged QM8790 IB switch remotely, and will the configuration (i.e. split ports) be reset after the reboot?
Thank you for your time in addressing these questions.
The unmanaged switch by default has the name “Quantum Mellanox Technologies”. In this case it is lid 88.
For the mlxconfig command you need to add the switch lid as the device.
mlxconfig -d lid-88 q
See below for more details:
https://docs.nvidia.com/networking/display/mftv4221406lts/using+mlxconfig#src-2477565152_Usingmlxconfig-splitcable
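The lookup described above (find the switch by its default node description, then use its LID as the device) can be scripted. This is only a minimal sketch: it assumes the ibswitches line format shown earlier in this thread, and the function name switch_lids is made up for illustration.

```shell
# Print an mlxconfig/flint style device name ("lid-<N>") for every switch
# in `ibswitches` output whose quoted node description equals the given string.
# Reads ibswitches output on stdin. The description defaults to the factory
# default name of an externally managed Quantum switch.
switch_lids() {
    desc="${1:-Quantum Mellanox Technologies}"
    awk -v desc="$desc" '
        index($0, "\"" desc "\"") {
            # the LID is the field that follows the literal word "lid"
            for (i = 1; i < NF; i++)
                if ($i == "lid") { print "lid-" $(i + 1); break }
        }'
}

# Usage on a live fabric (not executed here):
#   ibswitches | switch_lids
```

With the two switches listed in this thread, only the QM8790 line matches the default description, so the function prints lid-88. If several externally managed switches still carry the default name, it prints one line per switch and the GUIDs are needed to tell them apart.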
To find the mst device, run the commands below on the connected IB server.
mst start
mst ib add
mst status
You will see all of the devices. However, it is OK to just use the switch lid as the device.
To reboot the switch:
flint -d <device> swreset
For switch lid 88:
flint -d lid-88 swreset
The configuration persists until it is changed.
Marlon
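Putting the pieces from this reply together for the layout asked about (ports 1 to 4 and 37 to 40 un-split, the rest split), here is a hedged sketch that only prints the commands for review and runs nothing. The helper name split_plan is hypothetical. SPLIT_MODE=1 matches the SPLIT_2X(1) value that mlxconfig reports for this switch, but the SPLIT_PORT[<range>]=<value> form and the meaning of 0 and 1 are assumptions that should be checked against the split cable section of the linked mlxconfig page before anything is applied.

```shell
# Print (do NOT run) a command plan for a 2x split layout on an
# externally managed switch.
#   $1       device, e.g. lid-88
#   $2       port range to split
#   $3...    port ranges to leave un-split
# ASSUMPTION: the SPLIT_PORT[a..b]=0|1 syntax is taken from memory of the
# MFT documentation and must be verified there before use.
split_plan() {
    dev="$1"; split_range="$2"; shift 2
    echo "mlxconfig -d $dev set SPLIT_MODE=1"
    echo "mlxconfig -d $dev set SPLIT_PORT[$split_range]=1"
    for r in "$@"; do
        echo "mlxconfig -d $dev set SPLIT_PORT[$r]=0"
    done
    # new values are "Next Boot" values, so a switch reset comes last
    echo "flint -d $dev swreset"
}

split_plan lid-88 5..36 1..4 37..40
```

Printing the plan first keeps the reset step visible: the settings only show up as Next Boot values until the switch is reset, and the reset interrupts traffic through the switch.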
@marlon1
Super helpful your notes!!!
# mst ib add
and
# mst status
allow me to see all nodes on the subnet !!!
However, when trying to query the QM8790 IB switch (lid-88) I am getting an error:
[root@ib-sys ~]# mlxconfig -d lid-88 query
**Segmentation fault** (core dumped)
[root@nvidia-ufm ~]# mlxconfig -d /dev/mst/SW_MT54000_Quantum_lid-0x0058 query
**Segmentation fault** (core dumped)
[root@ib-sys ~]#
Any ideas ?
Thank you.
It may be related to the mst version.
Try to upgrade to the latest version.
@marlon1
The MFT update worked!
I am running:
[root@ibsys ~]# mlxconfig -d /dev/mst/SW_MT54000_Quantum_lid-0x0058 -e q
Device #1:
----------
Device type: Quantum
Name: N/A
Description: N/A
Device: /dev/mst/SW_MT54000_Quantum_lid-0x0058
Configurations:         Default                Current                Next Boot
RO  SPLIT_NUM_OF_PORTS  40                     40                     40
RO  SPLIT_CAP           SPLIT_2X(1)            SPLIT_2X(1)            SPLIT_2X(1)
*   SPLIT_MODE          NO_SPLIT_SUPPORT(0)    NO_SPLIT_SUPPORT(0)    SPLIT_2X(1)
    DISABLE_AUTO_SPLIT  ENABLE_AUTO_SPLIT(0)   ENABLE_AUTO_SPLIT(0)   ENABLE_AUTO_SPLIT(0)
    PERST_GPIO_CONFIG   DISABLE_PERST_GPIO(0)  DISABLE_PERST_GPIO(0)  DISABLE_PERST_GPIO(0)
    SPLIT_PORT          Array[1..64]           Array[1..64]           Array[1..64]
*   GB_VECTOR_LENGTH    0                      N/A                    0
*   GB_UPDATE_MODE      ALL(0)                 N/A                    ALL(0)
*   GB_VECTOR           Array[0..7]            N/A                    Array[0..7]
The '*' shows parameters with next value different from default/current value.
The 'RO' shows parameters which are for read only and cannot be changed
-E- Failed to query some of the TLVs:
Failed to query the current values of the following TLVs:
nv_gb_conf
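Since the '*' marker flags parameters whose Next Boot value differs from the default or current one, the query output can be filtered to show only the pending changes before the switch is reset. A small sketch, assuming the column layout of the "-e q" output above (marker, parameter name, then Default, Current and Next Boot, with Next Boot last); the function name pending_changes is hypothetical.

```shell
# List parameters flagged with "*" in `mlxconfig ... -e q` output,
# together with their Next Boot value (assumed to be the last column).
# Reads the query output on stdin.
pending_changes() {
    awk '$1 == "*" && NF >= 3 { print $2 " -> " $NF }'
}

# Usage on a live system (not executed here):
#   mlxconfig -d lid-88 -e q | pending_changes
```

For the output above this would list SPLIT_MODE with SPLIT_2X(1) plus the three GB_* entries. For a per-port view, mlxconfig's array-parameter query form (something like q SPLIT_PORT[1..40]) may be worth trying; that is an assumption based on how mlxconfig documents array-type parameters, not something confirmed in this thread.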
Is there a way to display the configuration per port? As a list of ports and each with its configuration?
Also, can one rename the IB switch with some more indicative name, i.e. “IB Switch Rack15”?
Thank you again for your help!
@marlon1
Sorry to be a pest… :(
Regarding the mlxconfig utility, is there a detailed manual with all the available options?
(especially for the query (q) option…)
For QM8790 IB Switch, I did find the latest firmware at:
and I plan to have it updated.
Reiterating the need to change the QM8790 switch description, I did find this resource
pointing to a Python script named “Unmanaged_Switches_Set_NodeDescription_3.4.py” which was made available by NVIDIA/Mellanox support.
Is it possible to get that script myself as well?
Is there any other tool/option available to change the IB switch QM8790 description?
Thank you for your time.
There’s a tool called “ibswinfo” available via github: https://github.com/stanford-rc/ibswinfo
This can set the name of unmanaged Quantum switches, and it can also query temperatures and vitals of different components of the switch. It is a useful tool.
You can also set names via the node name map file of the subnet manager. I think the default location is /etc/opensm/ib-node-name-map. This is not seen on any other hosts though, as it is a mapping done locally.
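For the node name map route, each entry is a node GUID followed by the quoted name to display. A minimal sketch that formats such a line for the QM8790 GUID seen earlier in this thread; the helper name name_map_entry is hypothetical, and the file location is whatever your subnet manager or diagnostics are configured to read.

```shell
# Format one node-name-map entry: <GUID> "<display name>"
#   $1  node GUID as printed by ibswitches (0x...)
#   $2  name to show instead of the default node description
name_map_entry() {
    printf '%s "%s"\n' "$1" "$2"
}

name_map_entry 0xb8cef60300d2ac5e "IB Switch Rack15"
```

The resulting line can be appended to the map file. The infiniband-diags tools such as ibswitches and ibnetdiscover also accept a --node-name-map option pointing at such a file, which gives the friendlier name on any host that has a copy of it. The description stored on the switch itself is not changed this way.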