I am using an HPE ProLiant DL380 Gen10 with an Intel Xeon Gold 6248 CPU.
GPU and NIC information is as follows.
yogurt@134servwe:~$ lspci | grep -i nvidia
37:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
86:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
yogurt@134servwe:~$ lspci | grep -i mellanox
12:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
12:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
d8:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
d8:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
d8:00.2 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface (rev 01)
How should I update the BF2 BFB image and NIC firmware?
Can I follow the instructions in “1.4.2.18 Update A100X BFB Image and NIC Firmware” of the “Aerial CUDA-Accelerated RAN 24-2.1” version?
When I followed the instructions in “1.4.2.18 Update A100X BFB Image and NIC Firmware” of “Aerial CUDA-Accelerated RAN 24-2.1”, the following error occurred during the NIC firmware update.
I found the firmware suitable for MT_0000000738 and did the update.
-E- PSID mismatch. The PSID on flash (MT_0000000738) differs from the PSID in the given image (NVD0000000015).
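The error above is flint refusing to burn an image whose PSID differs from the PSID stored on the card. A minimal sketch of comparing the two before burning is shown here. It assumes that `flint -i <image> q` prints a `PSID:` field in the same form as the device query; the image file name `fw-image.bin` and the `<device>` placeholder are hypothetical.

```shell
#!/bin/sh
# Read `flint ... q` output on stdin and print the value of the PSID field.
psid_from_query() {
    awk '$1 == "PSID:" { print $2; exit }'
}

# Compare two PSID strings; print the result and return non-zero on mismatch.
psid_check() {
    if [ "$1" = "$2" ]; then
        echo "PSID match: $1"
    else
        echo "PSID mismatch: device=$1 image=$2"
        return 1
    fi
}

# Hypothetical usage (fw-image.bin and <device> are placeholders):
#   dev_psid=$(sudo flint -d <device> q | psid_from_query)
#   img_psid=$(flint -i fw-image.bin q | psid_from_query)
#   psid_check "$dev_psid" "$img_psid" || echo "do not burn this image"
```

With the values from the error message, `psid_check MT_0000000738 NVD0000000015` reports a mismatch and returns a non-zero status.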
jixu
December 13, 2024, 2:54am
@twoheons
Yes, you can follow those instructions to update the NIC FW.
Did you download the image for PSID MT_0000000738? Please run the following two commands and share the output of sudo flint -d mlx5_0 q.
$ sudo mst start
$ sudo flint -d mlx5_0 q
Thanks
I downloaded the image below from the site below.
The log is here.
yogurt@134servwe:~$ sudo mst start
[sudo] password for yogurt:
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
[warn] mst_pciconf is already loaded, skipping
Create devices
Unloading MST PCI module (unused) - Success
yogurt@134servwe:~$
yogurt@134servwe:~$ sudo flint -d mlx5_0 q
Image type: FS4
FW Version: 22.29.1016
FW Release Date: 31.12.2020
Product Version: 22.29.1016
Rom Info: type=UEFI version=14.22.14 cpu=AMD64,AARCH64
type=PXE version=3.6.204 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 043f720300ecf862 4
Base MAC: 043f72ecf862 4
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000606
Security Attributes: N/A
I added logs.
Could you check if there is an issue with NIC FW on the server?
yogurt@134servwe:~$ sudo mst status -v
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE         MST                          PCI      RDMA    NET            NUMA
BlueField2(rev:1)   /dev/mst/mt41686_pciconf0.1  d8:00.1  mlx5_3  net-aerial01   1
BlueField2(rev:1)   /dev/mst/mt41686_pciconf0    d8:00.0  mlx5_2  net-aerial00   1
ConnectX6DX(rev:0)  /dev/mst/mt4125_pciconf0.1   12:00.1  mlx5_1  net-ens3f1np1  0
ConnectX6DX(rev:0)  /dev/mst/mt4125_pciconf0     12:00.0  mlx5_0  net-ens3f0np0  0
yogurt@134servwe:~$ sudo flint -d mlx5_2 q
Image type: FS4
FW Version: 24.39.3560
FW Release Date: 24.6.2024
Product Version: 24.39.3560
Rom Info: type=UEFI Virtio net version=21.4.13 cpu=AMD64,AARCH64
type=UEFI Virtio blk version=22.4.12 cpu=AMD64,AARCH64
type=UEFI version=14.32.17 cpu=AMD64,AARCH64
type=PXE version=3.7.300 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 1070fd03004a5e8c 16
Base MAC: 1070fd4a5e8c 16
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000738
Security Attributes: secure-fw
yogurt@134servwe:~$ sudo flint -d mlx5_0 q
Image type: FS4
FW Version: 22.29.1016
FW Release Date: 31.12.2020
Product Version: 22.29.1016
Rom Info: type=UEFI version=14.22.14 cpu=AMD64,AARCH64
type=PXE version=3.6.204 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 043f720300ecf862 4
Base MAC: 043f72ecf862 4
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000606
Security Attributes: N/A
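In the logs above, mlx5_0 is the ConnectX6DX (PSID MT_0000000606) and mlx5_2 is the BlueField2 (PSID MT_0000000738), so which device name is passed to flint matters. A rough sketch of listing every RDMA device with its type and PSID follows. It assumes each `mst status -v` row has the six columns shown above, with the MST path in the second column; rows with missing columns would need different handling.

```shell
#!/bin/sh
# Read `mst status -v` output on stdin and print "RDMA_NAME DEVICE_TYPE"
# for every row whose second column is an MST device path.
rdma_to_type() {
    awk '$2 ~ /^\/dev\/mst\// { print $4, $1 }'
}

# Possible usage: print RDMA name, device type and PSID on one line each.
#   sudo mst status -v | rdma_to_type | while read -r rdma type; do
#       printf '%s %s ' "$rdma" "$type"
#       sudo flint -d "$rdma" q < /dev/null | awk '$1 == "PSID:" { print $2 }'
#   done
```

The header and separator lines are skipped because their second field does not start with `/dev/mst/`.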
jixu
January 7, 2025, 8:32am