Questions about nv-cubb logs before UE attach and eCPRI visibility via tcpdump on Aerial/GPU-NetIO

Hi!

I am currently bringing up an Aerial cuBB + OAI setup on a GH200-based DU server connected to an O-RU over O-RAN fronthaul. I have a few questions regarding the expected behavior before UE attach and packet visibility on the fronthaul interface.

Setup summary

  • Platform: GH200-based DU server

  • Aerial/cuBB container: nv-cubb

  • OAI L2 is running separately

  • Fronthaul Linux interface: aerial00

  • aerial00 PCI address:

$ ethtool -i aerial00
driver: mlx5_core
bus-info: 0002:01:00.0

  • The cuphycontroller YAML is configured to use the same NIC:
src_mac_addr: d8:94:24:57:7f:f2
dst_mac_addr: 8c:1f:64:d1:15:11
nic: '0002:01:00.0'
vlan: 10
pcp: 0

  • The nv-cubb log also confirms that the NIC and VLAN are picked up correctly:
[CTL.YAML]       VLAN ID: 10
[CTL.YAML]       NIC port: 0002:01:00.0
[CTL.SCF] Network interface for PCIe address 0002:01:00.0 : aerial00
EAL: Probe PCI driver: mlx5_pci (15b3:a2dc) device: 0002:01:00.0 (socket 0)

Question 1: Is this nv-cubb log behavior expected before UE attach?

Before any UE is attached, I see the following messages from:

docker logs -f nv-cubb

Example log:

06:40:11.320088 ERR timer_thread 0 [AERIAL_L2ADAPTER_EVENT] [SCF.PHY] send_slot_error_indication: Late slot error encountered for SFN=0 slot=3
06:40:12.320006 CON timer_thread 0 [SCF.PHY] Cell  0 | DL    0.00 Mbps    0 Slots | UL    0.00 Mbps    0 Slots CRC   0 (     0) | Tick 2000
06:40:13.320006 CON timer_thread 0 [SCF.PHY] Cell  0 | DL    0.04 Mbps   50 Slots | UL    0.00 Mbps    0 Slots CRC   0 (     0) | Tick 4000
06:40:14.320007 CON timer_thread 0 [SCF.PHY] Cell  0 | DL    0.04 Mbps   50 Slots | UL    0.00 Mbps    0 Slots CRC   0 (     0) | Tick 6000
06:40:15.320006 CON timer_thread 0 [SCF.PHY] Cell  0 | DL    0.04 Mbps   50 Slots | UL    0.00 Mbps    0 Slots CRC   0 (     0) | Tick 8000
06:40:16.320006 CON timer_thread 0 [SCF.PHY] Cell  0 | DL    0.04 Mbps   50 Slots | UL    0.00 Mbps    0 Slots CRC   0 (     0) | Tick 10000

My questions are:

  1. Is it normal to see a small non-zero DL throughput such as DL 0.04 Mbps before UE attach?

  2. Is it normal for UL throughput to remain 0.00 Mbps before UE attach?

  3. Is the Late slot error encountered for SFN=0 slot=3 message expected during startup, or does it indicate a timing/synchronization problem that needs to be fixed?
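To track these values over a longer run instead of reading the console by eye, here is a minimal sketch that parses the per-second summary lines and the late-slot message. It assumes the exact line layout shown in the example log above; the function names are hypothetical and not part of Aerial/cuBB.

```python
import re

# Matches the per-second "Cell N | DL ... | UL ... | Tick N" summary line,
# assuming the layout seen in the example log (not a documented format).
SUMMARY_RE = re.compile(
    r"Cell\s+(?P<cell>\d+)\s*\|"
    r"\s*DL\s+(?P<dl_mbps>\d+\.\d+)\s+Mbps\s+(?P<dl_slots>\d+)\s+Slots\s*\|"
    r"\s*UL\s+(?P<ul_mbps>\d+\.\d+)\s+Mbps\s+(?P<ul_slots>\d+)\s+Slots"
    r".*?\|\s*Tick\s+(?P<tick>\d+)"
)
LATE_SLOT_RE = re.compile(r"Late slot error encountered for SFN=(\d+) slot=(\d+)")


def parse_summary(line):
    """Return the counters of one summary line as a dict, or None if no match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return {
        "cell": int(m["cell"]),
        "dl_mbps": float(m["dl_mbps"]),
        "dl_slots": int(m["dl_slots"]),
        "ul_mbps": float(m["ul_mbps"]),
        "ul_slots": int(m["ul_slots"]),
        "tick": int(m["tick"]),
    }


def parse_late_slot(line):
    """Return (sfn, slot) for a late-slot error line, or None if no match."""
    m = LATE_SLOT_RE.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

Feeding the output of docker logs nv-cubb through these two functions line by line would show whether the late-slot message appears only once at startup or keeps recurring, and whether UL slots ever become non-zero.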

Question 2: Should eCPRI packets be visible through tcpdump on the Linux interface?

I also checked the fronthaul interface using tcpdump:

sudo tcpdump -i aerial00 -nn -e -q -c 20 'ether proto 0xaefe or vlan'

I can see VLAN 10 eCPRI frames arriving from the O-RU to the GH200 interface:

8c:1f:64:d1:15:11 > d8:94:24:57:7f:f2, 802.1Q, length 376: vlan 10, p 0, Unknown Ethertype (0xaefe)

So the observed direction is:

O-RU MAC     8c:1f:64:d1:15:11
  -> DU MAC  d8:94:24:57:7f:f2
VLAN 10
PCP 0
EtherType 0xAEFE
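
The direction check above can also be done mechanically over many captured lines. A minimal sketch, assuming the tcpdump -e -q line layout shown above and the two MAC addresses from the cuphycontroller YAML; the helper names are hypothetical.

```python
import re

DU_MAC = "d8:94:24:57:7f:f2"  # src_mac_addr in the cuphycontroller YAML
RU_MAC = "8c:1f:64:d1:15:11"  # dst_mac_addr in the cuphycontroller YAML
ECPRI_ETHERTYPE = 0xAEFE

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
# Assumes the "src > dst, 802.1Q, length N: vlan V, p P, ... (0xTYPE)" layout.
FRAME_RE = re.compile(
    rf"(?P<src>{MAC}) > (?P<dst>{MAC}), 802\.1Q, length (?P<length>\d+): "
    rf"vlan (?P<vlan>\d+), p (?P<pcp>\d+), .*\((?P<ethertype>0x[0-9a-f]+)\)",
    re.IGNORECASE,
)


def parse_frame(line):
    """Return the header fields of one tcpdump line as a dict, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "src": m["src"].lower(),
        "dst": m["dst"].lower(),
        "length": int(m["length"]),
        "vlan": int(m["vlan"]),
        "pcp": int(m["pcp"]),
        "ethertype": int(m["ethertype"], 16),
    }


def direction(frame):
    """Classify a parsed frame relative to the configured DU and O-RU MACs."""
    if frame["src"] == RU_MAC and frame["dst"] == DU_MAC:
        return "ru_to_du"
    if frame["src"] == DU_MAC and frame["dst"] == RU_MAC:
        return "du_to_ru"
    return "other"
```

Counting the results per direction would show whether only O-RU to DU frames are visible on aerial00, as in the capture above, or whether DU to O-RU frames show up as well.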

However, I was told that if GPU-NetIO / GPUDirect / DOCA-based packet processing is being used, eCPRI packets should bypass the Linux networking stack and therefore should not be visible through normal tcpdump on the Linux netdev.

Could you clarify the expected behavior here?

Specifically:

  1. Is it expected that eCPRI frames may still be visible with tcpdump -i aerial00 even when Aerial/cuBB is using DPDK/GPU-NetIO on the same mlx5 NIC?

  2. Does seeing eCPRI packets in tcpdump imply that packets are not being steered into the intended DPDK/GPU-NetIO path?

  3. What is the recommended way to verify that the received eCPRI U-plane/C-plane packets are actually being consumed by the Aerial/cuBB fast path and delivered to the intended GPU/device-memory buffers?

  4. Should I use dpdk-dumpcap, mlx5/DPDK counters, Aerial internal counters, or some other diagnostic tool to confirm the actual data path?

Any guidance on how to distinguish between “packet is merely visible to Linux tcpdump” and “packet is actually being processed by the Aerial/GPU-NetIO path” would be very helpful.
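
One possible way to approach this distinction is to compare NIC counters between two snapshots of ethtool -S aerial00. The sketch below only parses and diffs that output. Which counters are meaningful is an assumption here, not a documented Aerial procedure: on mlx5, rx_packets_phy is assumed to count frames at the physical port and rx_packets frames handled by the kernel driver's receive rings.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def counter_delta(before, after):
    """Return after - before for every counter present in both snapshots."""
    return {k: after[k] - before[k] for k in after if k in before}
```

Under that assumption, a large delta in rx_packets_phy with an almost unchanged rx_packets over the same interval would suggest the frames are not being delivered through the kernel receive path, although this alone does not prove that they reach the GPU buffers.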

Thanks.

@jinwoomoon

  1. Is it normal to see a small non-zero DL throughput such as DL 0.04 Mbps before UE attach?

Yes, this is the expected DL broadcast throughput before UE attach.

  2. Is it normal for UL throughput to remain 0.00 Mbps before UE attach?

Yes, that is normal before any UE attempts to transmit or attach, but a small number of UL slots is expected while a UE is attempting to attach. Could you please set “ul_order_timeout_gpu_log_enable: 1” in cuphycontroller_*.yaml and check the console for any timeout messages?

  3. Is the Late slot error encountered for SFN=0 slot=3 message expected during startup, or does it indicate a timing/synchronization problem that needs to be fixed?

This message is OK if it is observed at the moment DL slots start to appear. There is no need to fix it.

Thanks!