Hello Nvidia Support Team,
I am following the Aerial CUDA-Accelerated RAN 24-2.1 tutorial to install cuBB and run cuBB End-to-End. However, I don’t have an AX800 on GH200 or an A100X on Dell R750. Instead, my setup consists of two PCs, each with a GPU and a DPU:
cuBB Node:
- GPU: RTX A6000
- CPU: Intel Core i7-13700K (13th Gen)
- Motherboard: ASUS ROG STRIX B660-A GAMING WIFI D4
- DPU: BlueField-2 MBF2H532C-AECOT
RU Emulator Node:
- GPU: GeForce RTX 2080
- CPU: Intel Core i7-12700K (12th Gen)
- Motherboard: ASUS ROG STRIX B660-A GAMING WIFI D4
- DPU: BlueField-2 MBF2H332A-AEEOT
All requirements appear to be met, including the kernel version, CUDA, Mellanox firmware, GDRCopy, Docker, the NVIDIA container toolkit, NIC firmware, ptp4l, and phc2sys. The one exception is the NIC firmware: I had to use the image that corresponds to the DPUs I actually have (e.g. fw-BlueField-2-rel-24_39_2048-MBF2H532C-AECO_Ax-NVME-20.4.1-UEFI-21.4.13-UEFI-22.4.12-UEFI-14.32.17-FlexBoot-3.7.300.signed.bin) rather than the firmware referenced in the tutorial.
I successfully ran Use Case 1 (testMAC + SCF L2 Adapter Standalone), and the results matched the tutorial examples.
However, in Use Case 2 (testMAC + cuPHYController_SCF + RU Emulator), I observe the following issue:
- On the cuBB side, there is zero uplink throughput but non-zero downlink throughput.
- On the RU Emulator side, there is zero downlink throughput but non-zero uplink throughput.
My guess is that this means the control plane (C-plane) is working but the user plane (U-plane) is not.
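To narrow this down, I was planning to watch the hardware port counters on the fronthaul interface of both nodes while the test runs. This is only a minimal sketch, assuming the kernel netdev stays visible alongside DPDK through the mlx5 bifurcated driver; ens1f0 is a placeholder for the actual fronthaul port, and the *_phy counter names may differ with driver version:

# Run on each node; the physical-port counters should increment even for DPDK traffic.
watch -n 1 'ethtool -S ens1f0 | grep -E "rx_packets_phy|tx_packets_phy"'

My reasoning would be: if downlink U-plane frames leave the cuBB port (tx_packets_phy increasing) but never arrive at the RU emulator port (rx_packets_phy flat), the problem is on the cable/switch path rather than inside cuPHY. Is that a valid way to check?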
Questions:
- Are there specific logs I can check to diagnose the issue?
- Could the issue be related to NUMA settings? I always see NUMA = "-1" when running sudo mst status -v, and I don't know how to fix it (see the sketch after this list).
- My DPUs do not support setting LINK_TYPE_P1. Does this matter?
- Are there any simple applications or sample tests to verify GPU-DPU communication on the same host? Maybe some tools/APIs from DOCA or DPDK? (A guess of mine is sketched after this list.)
- Any other debugging suggestions would be greatly appreciated.
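Regarding the NUMA bullet above, this is the check I intend to run on the host (the BDF 0000:01:00.0 is only a placeholder for my DPU's PCI address):

# List the PCI addresses of the BlueField-2 ports.
lspci | grep -i mellanox

# Read the NUMA node the kernel associates with the device; -1 means the firmware did not report one.
cat /sys/bus/pci/devices/0000:01:00.0/numa_node

# Some kernels allow overriding the value as root, though I am not sure this is advisable here:
# echo 0 | sudo tee /sys/bus/pci/devices/0000:01:00.0/numa_node

Since both machines use single-socket desktop CPUs, my understanding is that NUMA = "-1" only means the firmware does not expose a NUMA topology and everything effectively lives on node 0, but I would appreciate confirmation that this is harmless for cuBB.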
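For the GPU-DPU question, my current guess is to try perftest with GPUDirect RDMA between the two nodes. This is only an assumption on my part: it requires a perftest build with CUDA support and the nvidia-peermem module loaded, and mlx5_0 plus the server IP below are placeholders:

# Server (RU emulator node): RDMA write bandwidth test with buffers in GPU memory.
ib_write_bw -d mlx5_0 --use_cuda=0

# Client (cuBB node):
ib_write_bw -d mlx5_0 --use_cuda=0 <server_ip>

Would this be a meaningful way to confirm that the GPU-DPU data path works before going back to the cuBB end-to-end test, or is there a better DOCA/DPDK sample for this?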
Thanks in advance for your help!