@bkecicioglu Thank you for your reply.
I'll paste some outputs below, including those from the command you suggested.
Except for sudo -E python3 ./cuBB_system_checks.py, these commands were run outside the cuBB container.
cat /etc/ptp.conf
[global]
dataset_comparison G.8275.x
G.8275.defaultDS.localPriority 128
maxStepsRemoved 255
logAnnounceInterval -3
logSyncInterval -4
logMinDelayReqInterval -4
G.8275.portDS.localPriority 128
network_transport L2
domainNumber 24
tx_timestamp_timeout 30
slaveOnly 1
clock_servo pi
step_threshold 1.0
egressLatency 28
pi_proportional_const 4.65
pi_integral_const 0.1
[aerial00]
announceReceiptTimeout 3
delay_mechanism E2E
network_transport L2
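For reference, here is how this config can be exercised directly with the linuxptp tools. This is only a sketch: it assumes the file above is saved at /etc/ptp.conf and that the port interface is aerial00, as in the [aerial00] section.

```shell
# Run ptp4l in the foreground with this config; -m prints sync/offset
# statistics to stdout so clock lock can be observed directly.
sudo ptp4l -f /etc/ptp.conf -m

# In a second terminal, discipline the system clock from the NIC's PTP
# hardware clock (-s selects the source clock, -w waits for ptp4l to lock).
sudo phc2sys -s aerial00 -w -m
```

Watching the "master offset" values printed by ptp4l is a quick way to confirm the profile settings above actually result in a locked servo.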
sudo lshw -c network -businfo
Bus info Device Class Description
=========================================================
pci@0000:01:00.0 eno8303 network NetXtreme BCM5720 Gigabit Ethernet PCIe
pci@0000:01:00.1 eno8403 network NetXtreme BCM5720 Gigabit Ethernet PCIe
pci@0000:0f:00.0 aerial00 network MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
pci@0000:0f:00.1 aerial01 network MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
pci@0000:22:00.0 eno12399np0 network BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet
pci@0000:22:00.1 eno12409np1 network BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet
pci@0000:22:00.2 eno12419np2 network BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet
pci@0000:22:00.3 eno12429np3 network BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet
sudo nvidia-smi topo --matrix
GPU0 NIC0 NIC1 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PXB PXB 0,2,4,6,8,10 0 N/A
NIC0 PXB X PIX
NIC1 PXB PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
sudo ibdev2netdev -v
sudo: ibdev2netdev: command not found
# I've seen the following note:
# Aerial has been using the Mellanox inbox driver instead of MOFED since the 23-4 release. MOFED must be removed if it is installed on the system.
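Since ibdev2netdev ships with MOFED and is absent with the inbox driver, the same device-to-interface mapping can be read from sysfs. A hypothetical stand-in script, assuming the standard /sys/class/infiniband layout that mlx5 devices expose:

```shell
#!/usr/bin/env bash
# Hypothetical replacement for the missing ibdev2netdev: walk
# /sys/class/infiniband and print each RDMA device's net interface.
ib_to_netdev() {
    # Allow an alternate sysfs root for testing; default to the real one.
    sysroot="${1:-/sys/class/infiniband}"
    for dev in "$sysroot"/*; do
        # Skip anything without a device/net directory (or an empty glob).
        [ -d "$dev/device/net" ] || continue
        printf '%s ==> %s\n' "$(basename "$dev")" "$(ls "$dev/device/net")"
    done
}
ib_to_netdev
```

On systems with iproute2, `rdma link show` gives similar information without any extra tooling.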
sudo -E python3 ./cuBB_system_checks.py (run inside the cuBB container)
-----Mellanox NICs--------------------------------
-----Mellanox NIC Interfaces----------------------
-----Linux PTP------------------------------------
-----Software Packages----------------------------
cmake /usr/local/bin : 3.25.1
docker : N/A
gcc /usr/bin : 11.4.0
git-lfs /usr/bin : 3.0.2
MOFED : N/A
meson /usr/bin : 0.61.2
ninja /usr/bin : 1.10.2
ptp4l : N/A
-----Loaded Kernel Modules------------------------
GDRCopy : gdrdrv
GPUDirect RDMA : N/A
Nvidia : nvidia
-----Non-persistent settings----------------------
VM swappiness : vm.swappiness = 60
VM zone reclaim mode : vm.zone_reclaim_mode = 0
-----Docker images--------------------------------
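The checker flags vm.swappiness and vm.zone_reclaim_mode as non-persistent. On generic Linux these can be pinned across reboots with a sysctl drop-in; the values below are simply the ones the tool reported, not a recommendation (the Aerial tuning guide is the authority on what they should be):

```shell
# Persist the currently reported values via a sysctl drop-in
# (99-aerial.conf is an arbitrary example filename).
printf 'vm.swappiness = 60\nvm.zone_reclaim_mode = 0\n' | \
  sudo tee /etc/sysctl.d/99-aerial.conf
# Reload all sysctl configuration so the drop-in takes effect now.
sudo sysctl --system
```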
aerial@NEWHOSTNAME:/opt/nvidia/cuBB/cuPHY/util/cuBB_system_checks$ sudo lshw -c network -businfo
sudo: lshw: command not found
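lshw is apparently not installed inside the container. A hypothetical fallback, assuming lspci from pciutils is available, gives a comparable bus-address-to-NIC view (guarded so it degrades gracefully where pciutils is absent):

```shell
#!/usr/bin/env bash
# Fallback for a missing lshw: list network-class PCI devices with lspci.
list_nics() {
    if command -v lspci >/dev/null 2>&1; then
        # -nn keeps vendor:device IDs so the ConnectX-6 Dx ports stay identifiable.
        lspci -nn | grep -iE 'ethernet|network' || true
    else
        # No pciutils on this system; report it and still succeed.
        echo "lspci not found; install pciutils (e.g. apt-get install pciutils)"
    fi
}
list_nics
```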
aerial@NEWHOSTNAME:/opt/nvidia/cuBB/cuPHY/util/cuBB_system_checks$ sudo nvidia-smi topo --matrix
GPU0 NIC0 NIC1 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PXB PXB 0,2,4,6,8,10 0 N/A
NIC0 PXB X PIX
NIC1 PXB PIX X
It's also frustrating that the output of sudo -E python3 ./cuBB_system_checks.py differs from the guide.