Help to increase DL throughput

Hello,

I am testing cuBB with OAI, a DELTA O-RU, and a Pegatron UE dongle. However, I can only reach 350 Mbps for UDP DL with 100 MHz bandwidth. Do you have any suggestions for increasing the E2E throughput, or for checking where the traffic is stuck?

This is my current topology.

BLER is low while transmitting:

2638310.417236 [NR_MAC] I Frame.Slot 0.0
UE RNTI b840 CU-UE-ID 2 in-sync PH 50 dB PCMAX 21 dBm, average RSRP -50 (16 meas)
UE b840: CQI 15, RI 4, PMI (7,1)
UE b840: dlsch_rounds 25896/1485/240/206, dlsch_errors 185, pucch0_DTX 2, BLER 0.08803 MCS (1) 25
UE b840: ulsch_rounds 12284/378/108/0, ulsch_errors 0, ulsch_DTX 188, BLER 0.00000 MCS (1) 27 (Qm 8 dB) NPRB 5 SNR 23.0 dB
UE b840: MAC: TX 1432596933 RX 33770698 bytes
UE b840: LCID 1: TX 3892 RX 26086 bytes
UE b840: LCID 2: TX 0 RX 0 bytes
UE b840: LCID 4: TX 1420603620 RX 31489062 bytes
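For readers less familiar with the OAI MAC statistics, the per-HARQ-round counters in the log above can be split out programmatically. A minimal sketch (the field layout is assumed from the log line shown here; only standard-library code is used):

```python
import re

# One line from the OAI gNB MAC statistics above
line = ("UE b840: dlsch_rounds 25896/1485/240/206, dlsch_errors 185, "
        "pucch0_DTX 2, BLER 0.08803 MCS (1) 25")

m = re.search(r"dlsch_rounds (\d+)/(\d+)/(\d+)/(\d+), dlsch_errors (\d+).*BLER ([\d.]+)", line)
rounds = [int(x) for x in m.group(1, 2, 3, 4)]
errors = int(m.group(5))
bler = float(m.group(6))

# rounds[k] counts transport blocks that reached HARQ round k (round 0 = first Tx),
# so rounds[1]/rounds[0] is the fraction needing at least one retransmission.
retx_ratio = rounds[1] / rounds[0]
print(f"retx ratio {retx_ratio:.4f}, residual errors {errors}, reported BLER {bler}")
```

With the numbers above, roughly 5.7% of first transmissions need a retransmission, consistent with the low reported BLER.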

Thank you very much.

After inspecting the eCPRI packets, I noticed that only the first 77 PRBs of some slots have values.

Do you know what the reason for this could be?

Thank you.

Hello,
Have you configured the max DL MCS to be 27?
We have found that using the -P option with iperf helps increase throughput: Part 5. Validate the Setup - NVIDIA Docs

You can try enabling 4 DL layers which should double your throughput: targets/PROJECTS/GENERIC-NR-5GC/CONF/gnb-vnf.sa.band78.273prb.aerial.conf · ARC1.6_integration · oai / openairinterface5G · GitLab

It can be easier to see which RBs are scheduled in the DL_TTI.Request in the NVIPC messages, but that is done by the L2.
https://docs.nvidia.com/aerial/archive/cuda-accelerated-ran/24-1/aerial_cubb/cubb_quickstart/running_cubb-end-to-end.html#capture-logs

Please let us know if this helps.

Thank you for your comments.

For idea 1: I have changed the DL max MCS to 27, as shown in the following figure.

For idea 2: Yes, I used “-P 10” to run iperf3 with 10 parallel streams.

./src/iperf3 -c 10.0.0.1 --bind-dev enxfa6534c300e5 -P 10 -R -u -b 70M -t 0
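For reference, with iperf3 the -b bitrate applies per stream in a parallel UDP test, so the aggregate offered load of the command above is the per-stream bitrate times the stream count, well above the 350 Mbps observed. A trivial check:

```python
streams = 10          # iperf3 -P 10
per_stream_mbps = 70  # iperf3 -b 70M (applied per stream when -P is used)
offered_mbps = streams * per_stream_mbps
print(offered_mbps)   # total offered load in Mbps
```

Since 700 Mbps is offered but only ~350 Mbps arrives, the bottleneck is downstream of the traffic generator.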

For idea 3: I think I have already enabled 4 DL layers.


As you can see in the first image, I added a new log, nrOfLayers, to show the number of layers requested by the scheduler. The code I changed is as follows:

For idea 4:
I will try this and get back to you.

Thank you very much.

I realized that when I use 2 layers, the throughput is the same as when I use 4 layers.


It seems that the FAPI layer request is not executed at the HIGH PHY.

Can you check that the L2 cores are separated from the L1 cores?
Typically when I see pTbInput=0x0 it is an indication of core clash.
In docker-compose.yaml you can use: ci-scripts/yaml_files/sa_gnb_aerial/docker-compose.yaml · ARC1.6_integration · oai / openairinterface5G · GitLab
And in the l2 config: targets/PROJECTS/GENERIC-NR-5GC/CONF/gnb-vnf.sa.band78.273prb.aerial.conf · ARC1.6_integration · oai / openairinterface5G · GitLab

You can also disable the H2D copy thread in cuphycontroller.yaml, as you’re only using one cell, with:
enable_h2d_copy_thread: 0

I have cores 24-33 dedicated to L1 and L2:

Core 33 for L2

Cores 24-32 for L1, as in the following configuration:
l2_adapter_config.zip (4.3 KB)
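A quick way to sanity-check the pinning described above is to verify that the two core sets do not intersect. This is a sketch assuming the assignments stated in this post (cores 24-32 for L1, core 33 for L2); substitute the sets from your own cuphycontroller and L2 adapter configs:

```python
# Core assignments as described above (adjust to match your configs)
l1_cores = set(range(24, 33))  # cores 24-32 for L1
l2_cores = {33}                # core 33 for L2

clash = l1_cores & l2_cores
print("core clash!" if clash else "no overlap between L1 and L2 cores")
```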

I am not sure about the actual reason, but currently pTbInput=0x0 happens for the first few seconds (~10 s) and occasionally afterwards.
Screen Recording 2024-11-07 223002.mp4

After a while, pTbInput=0x0 no longer happens, but the DL throughput is still limited to ~360 Mbps.

I added enable_h2d_copy_thread: 0, but it made no difference.
I think it could be a problem with the number of layers.

It seems that only the first region is fully utilized.

I found the reason why my channel bandwidth is not fully utilized. It is because I used a USB 2.0 port (480 Mbps limit) to connect the UE to my PC. The UE cannot consume much data; somehow the L2 knows this and does not allocate all PRBs to the UE.

After changing to a USB 3.2 interface, the DL throughput can now reach ~600 Mbps for 100 MHz bandwidth.


However, I do believe I can double the throughput if I can use 4 layers. Could you please help me with this?
I commented out/uncommented this block of configuration, but the throughput does not change.

@vantuan_ngo can you capture the FAPI log on L1 <> L2 interface to confirm if 4 layers are actually scheduled?

In your last image you have CQI 15, RI 2, which says that the Rank Indicator is 2, so you can only do 2 DL layers.
Does your UE have 4 antennas?
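The point about RI can be stated compactly: the scheduler cannot use more spatial layers than the UE's reported rank, so the effective layer count is the minimum of the configured maximum and the RI. This is a simplified model (a real scheduler also weighs antenna ports, CSI, and other constraints):

```python
def effective_dl_layers(max_configured_layers: int, reported_ri: int) -> int:
    """Simplified: DL layers are capped by both the gNB config and the UE's RI."""
    return min(max_configured_layers, reported_ri)

print(effective_dl_layers(4, 2))  # RI 2 caps a 4-layer config at 2 layers
```

This is why raising the configured maximum from 2 to 4 changes nothing while the UE keeps reporting RI 2.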

Thanks @nhedberg, this also answers my question above.

In the last image, I changed the parameter to allow at most 2 layers, in order to show that the maximum DL throughput does not change when the maximum number of MIMO layers changes.

This is the image captured after changing the maximum MIMO layer to 4.

This is the configuration I have changed to get a maximum of 4 MIMO layers.

My UE has been tested with 4 MIMO layers on another system.
I will try to test with another UE.

I will also try to capture FAPI.


This is the nvipc.pcap file I captured while my Samsung S23 Ultra was receiving 600 Mbps UDP DL with 4 MIMO layers.
nvipc_good.pcap

I don't have a Wireshark plugin to parse the FAPI pcap file. Do you have a Wireshark binary (one that can parse FAPI) that you could share with me?

Things look mostly good, with few DL CRC errors reported by the UE.
The L2 isn't scheduling all of the RBs on DL. Sometimes it schedules all 273, but often only 85 by the end of the frame, indicating that it didn't have more data to transmit, so this could be an issue in the CN.
Can you set the different components of the CN to be on specific cores using cpuset in the docker-compose.yaml, and then check CPU usage of the oai-upf using docker stats during traffic?
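As a sketch, pinning a CN container in docker-compose.yaml might look like the following. The service name and core numbers here are placeholders; the point is to pick cores outside the L1/L2 range:

```yaml
services:
  oai-upf:
    cpuset: "34-35"   # placeholder cores, chosen outside the 24-33 L1/L2 range
```

Then running `docker stats oai-upf` while traffic flows shows whether the UPF is CPU-bound on its assigned cores.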

It looks like in your last log you still have pTbInput=0x0 during the 700 Mbps traffic. What server type is this?

Thank you all for your support. I am now able to reach 1 Gbps UDP DL OTA with the DELTA RU and Pegatron UE.

The reason is that my default oai-upf container cannot output more than 650 Mbps of GTP traffic to the OAI nr-softmodem. I checked this on the N3 interface.

My current solution is to use Open5GS.

BTW, for your question, I am using a rack server from QCT which has 2 NVIDIA A40 GPUs and 2 Intel(R) Xeon(R) Gold 5418Y CPUs.

Thank you very much.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.