Even though I didn’t change any configuration on my server, the BF3 interfaces for aerial00 and aerial01 disappeared after I rebooted the server, as shown below.
(It seems the two BF3 interfaces have been wiped out.)
The BF3 interfaces for aerial00 and aerial01 no longer appear in the ip link list.
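For anyone who wants to repeat this check after each reboot, here is a minimal sketch that compares the expected interface names against what the kernel currently exposes. It assumes a Linux host where interfaces are listed under /sys/class/net; the helper names `present_interfaces` and `missing_interfaces` are made up for illustration and are not part of any NVIDIA tool.

```python
import os

# Interface names taken from this thread; adjust for your own setup.
EXPECTED = ("aerial00", "aerial01")


def present_interfaces(sysfs_net="/sys/class/net"):
    """Return the set of interface names currently exposed by the kernel.

    Assumes the Linux sysfs layout; returns an empty set if the
    directory does not exist (e.g. on a non-Linux host).
    """
    if not os.path.isdir(sysfs_net):
        return set()
    return set(os.listdir(sysfs_net))


def missing_interfaces(expected, present):
    """Return the expected names that are absent, keeping their order."""
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    missing = missing_interfaces(EXPECTED, present_interfaces())
    print("missing:", ", ".join(missing) if missing else "none")
```

If both names are reported as missing, that matches the symptom described above and points toward the driver not binding to the device rather than a renaming problem, though this check alone cannot confirm the cause.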
**2. terminal log**
root@SKT-6GTB-ARS:~# sysinfo-snapshot.py
/usr/sbin/sysinfo-snapshot.py:21: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils.version import LooseVersion
Sysinfo-snapshot is still in process...please wait till completed successfully
Gathering the information may take a while, especially in large networks
Your patience is appreciated
+------------------------------------------------------------
Running sysinfo-snapshot has ended successfully!
Temporary destination directory is /tmp/
Out file name is /tmp/sysinfo-snapshot-v3.7.7-SKT-6GTB-ARS-20241119-214015.tgz
/tmp/sysinfo-snapshot-v3.7.7-SKT-6GTB-ARS-20241119-214015.tgz:
SKT-6GTB-ARS-20241119-214015.html
amber_info
cables
commands_txt_output
devlink
dmesg
dmidecode
ecn
err_messages:
dummy_functions - contains all not found commands
dummy_paths - contains all not existing internal files (/paths)
dummy_external_paths - contains all not existing external files (/paths)
etc_udev_rulesd - contains all files under /etc/udev/rules.d
ethtool_S - contains all files which are generated from invoking ethtool -S <interface>
firmware - contains all firmware files (mst dump files and commands outputs)
journal
lib_udev_rulesd - contains all files under /lib/udev/rules.d
lshw
pcie_files
performance_tuning_analyze.html
pkglist
show_irq_affinity_all
sr_iov.html
trace
var_log_dmesg
var_log_syslog
According to the dmesg log, the driver somehow failed to load for 0000:01:00.0 and 0000:01:00.1.
[Tue Nov 19 16:02:14 2024] mlx5_core: probe of 0000:01:00.0 failed with error -16
[Tue Nov 19 16:04:14 2024] mlx5_core: probe of 0000:01:00.1 failed with error -16
I’m still checking the provided logs, but in the meantime could you please try a cold reboot (i.e., a full AC power cycle) or an ipmitool chassis power cycle command (you will need to install ipmitool)? I’ve seen this kind of driver-loading error in the past, and a cold reboot resolved the issue.
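As a side note on reading the dmesg lines quoted above: the negative number in a "probe of ... failed with error" message is a kernel errno value, and 16 corresponds to EBUSY ("Device or resource busy") on Linux. The sketch below pulls such lines out of a dmesg dump and decodes the code. The regular expression and the function name `parse_probe_failures` are assumptions written for this thread, not part of sysinfo-snapshot or the mlx5 driver, and decoding the errno does not by itself identify the root cause.

```python
import errno
import re

# Matches kernel lines such as:
#   mlx5_core: probe of 0000:01:00.0 failed with error -16
PROBE_FAIL = re.compile(
    r"(?P<driver>\S+): probe of (?P<bdf>[0-9a-fA-F:.]+) "
    r"failed with error (?P<code>-?\d+)"
)


def parse_probe_failures(lines):
    """Return (driver, pci_address, errno_name) for each probe failure line."""
    results = []
    for line in lines:
        match = PROBE_FAIL.search(line)
        if match:
            code = abs(int(match.group("code")))
            name = errno.errorcode.get(code, "UNKNOWN")
            results.append((match.group("driver"), match.group("bdf"), name))
    return results


if __name__ == "__main__":
    # The two lines reported in this thread.
    sample = [
        "[Tue Nov 19 16:02:14 2024] mlx5_core: probe of 0000:01:00.0 failed with error -16",
        "[Tue Nov 19 16:04:14 2024] mlx5_core: probe of 0000:01:00.1 failed with error -16",
    ]
    for driver, bdf, name in parse_probe_failures(sample):
        print(f"{driver} {bdf} {name}")
```

To run it against a real capture, the lines could be read from the dmesg file inside the sysinfo-snapshot archive instead of the hard-coded sample.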