ERR UlPhyDriver03 0 [AERIAL_CUPHY_API_EVENT] [DRV.FUNC_UL]

Hi,
We are integrating cuBB with OAI, and our configuration is as shown in the attached file.
When starting nv-cubb and oai-gnb-aerial using docker-compose.yaml, we encountered the following issue.
Could you please let us know what might be causing this?
Thank you.

Server: MGX server (GH200)
cuBB version: 24-1
NIC: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
O-RU: Foxconn

Server PTP status


O-RU PTP status

cuphycontroller_P5G_FXN.yaml.txt (4.7 KB)
nv-cubb_error_log.txt (44.8 KB)
cuBB_system_checks.txt (3.9 KB)

This is the error message we encountered.

I had this problem and rebooting the RU fixed it.

Hi @gantedomenico

Thank you for your response.
However, even after rebooting the RU, we are still experiencing the same issue.

Hi @jerryb_chen

Would you please share the output of the command “cat /proc/cmdline”?

Also, I highly recommend moving to ARC1.5, which uses a newer L2+ stack and Aerial 24-3 (or you can switch to Aerial 25-1). Please refer to the document here: Aerial RAN CoLab Over-the-Air.

Hi @jixu

Thanks for your response.
The output of cat /proc/cmdline is as follows:

BOOT_IMAGE=/vmlinuz-6.2.0-1012-nvidia-64k root=/dev/mapper/ubuntu--vg-ubuntu--lv ro pci=realloc=off pci=pcie_bus_safe default_hugepagesz=512M hugepagesz=512M hugepages=32 tsc=reliable processor.max_cstate=0 audit=0 idle=poll rcu_nocb_poll nosoftlockup irqaffinity=0 isolcpus=managed_irq,domain,4-47 nohz_full=4-47 rcu_nocbs=4-47 earlycon module_blacklist=nouveau acpi_power_meter.force_cap_on=y numa_balancing=disable init_on_alloc=0 preempt=none

We modified the gnb-vnf.sa.band78.273prb.aerial.conf configuration file by setting nrofDownlinkSymbols to 0, and we were able to successfully start up the gNB. The nv_cubb and oai_gnb_aerial logs are as follows.

gnb-vnf.sa.band78.273prb.aerial.conf.txt (9.7 KB)
nv_cubb.txt (28.6 KB)
oai_gnb_aerial.txt (18.6 KB)

However, when we tried to register using the UE, the QXDM log shows that the UE received the MIB, but no further messages were received afterward. What could be the possible reason for this?

We would like to check the PCAP logs, but we couldn’t find any content in /var/log/aerial/nvipc_pcap inside the nv_cubb container. Where else can we check for the PCAP logs?

Hi @jixu

In addition, we tried testing with Aerial 24-3 as you suggested, but we encountered an error when building cuphycontroller with the following commands. Could you help us understand what the problem might be?

export cuBB_SDK=$(pwd)
mkdir build && cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=cuPHY/cmake/toolchains/native
make -j $(nproc --all)

cubb_24_3_make_error_log.txt (74.2 KB)

@jerryb_chen

  1. The ‘order kernel time out error’ is most likely related to CPU affinity.
  • From the output of ‘cat /proc/cmdline’, the isolated cores are 4-47. In cuphycontroller_P5G_FXN.yaml, workers_ul: [2,3] are non-isolated cores. Please change workers_ul: [2,3] to two other isolated cores, such as [9,10].
  2. When using Grace Hopper (GH200) for the gNB, use the configuration YAML files below for L1:
  • cuphycontroller_P5G_FXN_GH.yaml
  • l2_adapter_config_P5G_GH.yaml
  3. Regarding not seeing the pcap in /var/log/aerial/: has cuphycontroller (L1) been stopped? If you still don’t see nvipc_pcap after L1 has been stopped, please run the commands below and recheck in the container:
    sudo ./build/cuPHY-CP/gt_common_libs/nvIPC/tests/pcap/pcap_collect nvipc /tmp
    sudo mv /tmp/nvipc*.pcap /var/log/aerial/
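The core-isolation check from item 1 can be scripted before restarting L1. Below is a minimal bash sketch, assuming bash 4+ and an isolcpus= value made of flag words, single cores, and plain ranges (the stride form such as 0-47:2/4 is not handled). The function names isolcpus_from_cmdline, expand_cpu_list, and check_cores are hypothetical helpers, not part of cuBB or Aerial; pass them the core numbers you intend to use in workers_ul.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of cuBB/Aerial): check whether worker cores
# such as workers_ul are inside the isolcpus= set of a kernel command line.

# Print the value of the first isolcpus= token in a kernel command line.
isolcpus_from_cmdline() {
  local tok
  for tok in $1; do            # unquoted on purpose: split on whitespace
    case "$tok" in
      isolcpus=*) echo "${tok#isolcpus=}"; return 0 ;;
    esac
  done
  return 1                     # no isolcpus= token found
}

# Expand a CPU list like "managed_irq,domain,4-47" into one core per line.
# Flag words (managed_irq, domain) are skipped; stride syntax is not handled.
expand_cpu_list() {
  local item
  local -a items
  IFS=',' read -ra items <<< "$1"
  for item in "${items[@]}"; do
    if [[ "$item" =~ ^([0-9]+)-([0-9]+)$ ]]; then
      seq "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    elif [[ "$item" =~ ^[0-9]+$ ]]; then
      echo "$item"
    fi
  done
}

# Usage: check_cores "<cmdline>" core [core ...]
# Prints one line per core saying whether it is in the isolcpus= set.
check_cores() {
  local cmdline="$1" isolated core
  shift
  isolated="$(expand_cpu_list "$(isolcpus_from_cmdline "$cmdline")")"
  for core in "$@"; do
    if grep -qx "$core" <<< "$isolated"; then
      echo "core $core: isolated"
    else
      echo "core $core: NOT isolated"
    fi
  done
}
```

For example, on the server above, check_cores "$(cat /proc/cmdline)" 2 3 9 10 should report cores 2 and 3 as NOT isolated and cores 9 and 10 as isolated, matching the suggestion to move workers_ul off [2,3]. Only isolcpus= is checked here; nohz_full= and rcu_nocbs= happen to use the same 4-47 range in this cmdline but are not inspected.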

Thanks

Hi @jixu

Thanks for your response.
After we upgraded the Aerial SDK to version 24-3, we were able to successfully perform the end-to-end (E2E) tests.
Thank you for your suggestion.