AERIAL ARC With Foxconn RPQN-7800E for N78

Hello, I have a config question about running the Aerial SDK with a Foxconn RPQN-7800E radio, which does not support the reference frequency from the default OpenAirInterface branch I was referred to (targets/PROJECTS/GENERIC-NR-5GC/CONF/gnb-vnf.sa.band78.273prb.aerial.conf · 2024.w21+ARC1.5 · oai / openairinterface5G · GitLab).

7801 ref freq? (https://fccid.io/2AQ68RPQN7801)
vs
available radio (https://fccid.io/2AQ68RPQN7800)

I see your latest guide calls out support for the 7801E/4800E, but the 7800E is the radio we have available. I tried setting a frequency we have used with this radio before, and I am getting an error in the DU/L1 relating to DL subframe settings. Is there a missing setting, or a way to use a supported frequency with this model of Foxconn radio?

 Entering ITTI signals handler
TYPE <CTRL-C> TO TERMINATE
103863077193 [I] 1300211264: aerial_nr_config_resp_cb: [VNF] Received NFAPI_CONFIG_RESP idx:1 phy_id:0
[NFAPI_VNF]   Received CONFIG.response, gNB is ready!
103863077212 [D] 1300211264: nfapi_vnf_pnf_list_find: config->pnf_list:0x5621c7df7f80
103863077216 [E] 1300211264: nfapi_vnf_pnf_list_find: nfapi_vnf_pnf_list_find : curr->p5_idx:1 p5_idx:1
[NFAPI_VNF]
============================================================================
sfn slot doesn't match unpacked one! L2->L1 0.0  vs L1->L2 0.3
============================================================================
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1

Attached are the logs and configs, along with a FAPI pcap:
nv-fxn-test.zip (485.3 KB)

Hi @eric.a.momper ,

The issue you are observing seems unrelated to the O-RU. There is a timing misalignment between the L2+ and the L1: the L1 is sending the first slot indication with SFN 0.3, but the OAI L2 is expecting SFN 0.0.

Can you please investigate the OAI L2 side to understand why its timing is not aligned with L1?

Thank you.

Hi @eric.a.momper,

This is expected as the L2 starts at 0.0 while the L2 interface in cuPHY-CP/cuphycontroller/config/l2_adapter_config_P5G.yaml configures:

slot_advance: 3

Both will run from that point on without issue, for example:

37222524469 [E] 3707367824: nfapi_vnf_pnf_list_find: nfapi_vnf_pnf_list_find : curr->p5_idx:1 p5_idx:1
[NFAPI_VNF]
============================================================================
sfn slot doesn't match unpacked one! L2->L1 0.0  vs L1->L2 0.3
============================================================================
[NR_MAC]   Frame.Slot 128.0

[NR_MAC]   Frame.Slot 256.0

[NR_MAC]   RACH.indication put_queue successfull
[NR_MAC]   368.19 UE RA-RNTI 010b TC-RNTI 2c1d: Activating RA process index 0
[NR_MAC]   UE 2c1d: 369.6 Generating RA-Msg2 DCI, RA RNTI 0x10b, state 1, CoreSetType 0, RAPID 34
[NR_MAC]   UE 2c1d: Msg3 scheduled at 369.17 (369.6 k2 8 TDA 2)
[NR_MAC]   Adding new UE context with RNTI 0x2c1d
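
As a rough illustration of why the two counters differ at startup, here is a minimal sketch. It assumes the first L1 slot indication is simply `slot_advance` slots ahead of the L2's starting point, and 20 slots per frame (30 kHz SCS, as used in this thread). `advance_slot` is a hypothetical helper, not part of OAI or Aerial.

```python
def advance_slot(sfn: int, slot: int, advance: int, slots_per_frame: int = 20) -> tuple[int, int]:
    """Return the (sfn, slot) reached `advance` slots after (sfn, slot).

    The SFN wraps at 1024 frames; slots_per_frame=20 assumes 30 kHz SCS.
    """
    total = sfn * slots_per_frame + slot + advance
    return (total // slots_per_frame) % 1024, total % slots_per_frame


# The L2 starts counting at 0.0; with slot_advance: 3 the L1 reports 0.3.
print(advance_slot(0, 0, 3))  # (0, 3)
```

With `slot_advance: 0` the same helper returns (0, 0), i.e. the two sides start on the same slot.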

Thanks @nhedberg @bkecicioglu for the explanation. I set slot_advance to 0 and see the value changed accordingly. If you're saying that's just a startup message and not an issue, that's fine with me.

I think I am still seeing an issue with the chosen frequency due to the model of the radio. When I set the radio to the reference center frequency specified in your docs (~3.7 GHz), with the appropriate SCS/PointA frequencies, it errors out:

#DU CONFIG
      #reference  ~3.7 GHZ  FREQ
      # rejected by foxconn radio 7800E
      #dl_absoluteFrequencyPointA   = 646724;
      #absoluteFrequencySSB   = 649920;
#FOXCONN CONFIG:
      #RRH_LO_FREQUENCY_KHZ = 3750000
#I get an error from the radio 
./init_rrh_config_enable_cuplane
--------------------------
Verify XRAN parameters are successful
--------------------------
Model name: 7800
upper=3650000KHz lower=3250000KHz
  max=3799140KHz   min=3700860KHz
Setting of LO and PRB is in-valid
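
The min/max values printed above are consistent with the LO frequency plus or minus half of 273 PRBs at 30 kHz SCS. Assuming that is the check the radio tool performs (its actual logic is not shown in this thread), a minimal sketch of the arithmetic looks like this. The function names are hypothetical.

```python
def occupied_band_khz(lo_khz: int, n_prb: int = 273, scs_khz: int = 30) -> tuple[int, int]:
    """Return the (min, max) edges in kHz of n_prb resource blocks centred on the LO."""
    half_bw = n_prb * 12 * scs_khz // 2  # 12 subcarriers per PRB
    return lo_khz - half_bw, lo_khz + half_bw


def fits(lo_khz: int, lower_khz: int, upper_khz: int, n_prb: int = 273, scs_khz: int = 30) -> bool:
    """True if the occupied band lies inside the radio's [lower, upper] range."""
    lo_edge, hi_edge = occupied_band_khz(lo_khz, n_prb, scs_khz)
    return lower_khz <= lo_edge and hi_edge <= upper_khz


print(occupied_band_khz(3750000))       # (3700860, 3799140)
print(fits(3750000, 3250000, 3650000))  # False
print(fits(3352260, 3250000, 3650000))  # True
```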

When setting the DU/RU to a lower frequency (~3.3 GHz) that we have tested with before, I get an assert error from the DU and it exits.

#DU CONFIG
     #tested ~3.3 GHZ  FREQ
     # Foxconn 7800E comes up in state=1, DU errors out
     #absoluteFrequencySSB   = 620736;
     #dl_absoluteFrequencyPointA  = 620208;
#FOXCONN CONFIG:
      #RRH_LO_FREQUENCY_KHZ = 3352260

I get an error from the OAI DU, which is an assert. I realized it was missing from my original message and logs.

nv-cubb  | 13:36:34.786399 WRN timer_thread 0 [L2A.TICK] PTP Configs: gps_alpha: 0 gps_beta: 0
oai-ran  | [NFAPI_VNF]
oai-ran  | ============================================================================
oai-ran  | sfn slot doesn't match unpacked one! L2->L1 0.0  vs L1->L2 0.0
oai-ran  | ============================================================================
oai-ran  | [NR_MAC]   Frame.Slot 0.0
oai-ran  |
oai-ran  |
oai-ran  | Assertion (type0_PDCCH_CSS_config->cset_start_rb >= 0) failed!
oai-ran  | In get_type0_PDCCH_CSS_config_parameters() /oai-ran/openair2/LAYER2/NR_MAC_COMMON/nr_mac_common.c:4251
oai-ran  | Invalid CSET0 start PRB -4 SSB offset point A 12 RB offset 16
nv-cubb  | 13:36:36.920046 WRN timer_thread 0 [SCF.PHY] Cell  0 | DL    0.00 Mbps    0 Slots | UL    0.00 Mbps    0 Slots CRC   0 (     0) | Tick 0
oai-ran  | GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
oai-ran  | Copyright (C) 2022 Free Software Foundation, Inc.
oai-ran  | License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
oai-ran  | This is free software: you are free to change and redistribute it.
oai-ran  | There is NO WARRANTY, to the extent permitted by law.
oai-ran  | Type "show copying" and "show warranty" for details.
oai-ran  | This GDB was configured as "x86_64-linux-gnu".
oai-ran  | Type "show configuration" for configuration details.
oai-ran  | For bug reporting instructions, please see:
oai-ran  | <https://www.gnu.org/software/gdb/bugs/>.
oai-ran  | Find the GDB manual and other documentation resources online at:
oai-ran  |     <http://www.gnu.org/software/gdb/documentation/>.
oai-ran  |
oai-ran  | For help, type "help".
oai-ran  | Type "apropos word" to search for commands related to "word".
oai-ran  | Attaching to process 8
...
oai-ran  | Thread 12 (Thread 0x7f0bd9ffb640 (LWP 148) "nr-softmodem"):
oai-ran  | #0  0x00007f0be412242f in __GI___wait4 (pid=149, stat_loc=stat_loc@entry=0x7f0bd9ff1af8, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
oai-ran  | #1  0x00007f0be41223ab in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc@entry=0x7f0bd9ff1af8, options=options@entry=0) at ./posix/waitpid.c:38
oai-ran  | #2  0x00007f0be4088bdb in do_system (line=line@entry=0x7f0bd9ff1f00 "gdb -ex='set confirm off' -ex 'thread apply all bt' -ex q -p 9 < /dev/null") at ../sysdeps/posix/system.c:171
oai-ran  | #3  0x00007f0be4088d7e in __libc_system (line=line@entry=0x7f0bd9ff1f00 "gdb -ex='set confirm off' -ex 'thread apply all bt' -ex q -p 9 < /dev/null") at ../sysdeps/posix/system.c:207
oai-ran  | #4  0x000055ecbd299d07 in get_type0_PDCCH_CSS_config_parameters (type0_PDCCH_CSS_config=type0_PDCCH_CSS_config@entry=0x55ecc11f77b8, frameP=frameP@entry=0, mib=mib@entry=0x55ecc11f8b60, num_slot_per_frame=num_slot_per_frame@entry=20 '\024', ssb_subcarrier_offset=ssb_subcarrier_offset@entry=0 '\000', ssb_start_symbol=ssb_start_symbol@entry=2, scs_ssb=1, frequency_range=FR1, nr_band=78, ssb_index=0, ssb_period=2, ssb_offset_point_a=12) at /oai-ran/openair2/LAYER2/NR_MAC_COMMON/nr_mac_common.c:4251
oai-ran  | #5  0x000055ecbd1f2a7f in schedule_nr_mib (module_idP=module_idP@entry=0, frameP=frameP@entry=0, slotP=slotP@entry=0, DL_req=DL_req@entry=0x55ecbe3a9320 <g_sched_resp+32>) at /oai-ran/openair2/LAYER2/NR_MAC_gNB/gNB_scheduler_bch.c:215
oai-ran  | #6  0x000055ecbd1f095b in gNB_dlsch_ulsch_scheduler (module_idP=module_idP@entry=0, frame=0, slot=0, sched_info=sched_info@entry=0x55ecbe3a9300 <g_sched_resp>) at /oai-ran/openair2/LAYER2/NR_MAC_gNB/gNB_scheduler.c:250
oai-ran  | #7  0x000055ecbcf71b10 in trigger_scheduler (slot_ind=slot_ind@entry=0x7f0bd9ffab20) at /oai-ran/nfapi/oai_integration/aerial/fapi_vnf_p7.c:731
oai-ran  | #8  0x000055ecbcf71c5d in aerial_phy_nr_slot_indication (ind=0x7f0bd9ffab20) at /oai-ran/nfapi/oai_integration/aerial/fapi_vnf_p7.c:769
oai-ran  | #9  0x000055ecbcf50fa9 in ipc_handle_rx_msg (msg=0x7f0bd9ffab40, ipc=0x55ecc1231240) at /oai-ran/nfapi/oai_integration/aerial/fapi_nvIPC.c:206
oai-ran  | #10 aerial_recv_msg (recv_msg=0x7f0bd9ffab40, ipc=0x55ecc1231240) at /oai-ran/nfapi/oai_integration/aerial/fapi_nvIPC.c:581
oai-ran  | #11 epoll_recv_task (arg=<optimized out>) at /oai-ran/nfapi/oai_integration/aerial/fapi_nvIPC.c:644
oai-ran  | #12 0x00007f0be40ccac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
oai-ran  | #13 0x00007f0be415e850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
oai-ran  |

If I have the DU set to ~3.7 GHz and the RU set to ~3.3 GHz, the DU comes up and tries to send to the RU once it is up, but the frames are rejected because the frequencies are mismatched, which is expected.

xRN: total=3800 c_early=0 c_on=0 c_late=0 err_tci=0 err_ecpri=0 err_port=0 err_sct=0 err_total=3800
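
To compare these counters across runs, a small parser is enough. This is only a sketch: the meaning of each counter is inferred from its name, not from Foxconn documentation, and `parse_counters` is a hypothetical helper.

```python
import re


def parse_counters(line: str) -> dict[str, int]:
    """Parse 'name=value' integer pairs from one RU status line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(-?\d+)", line)}


line = ("xRN: total=3800 c_early=0 c_on=0 c_late=0 err_tci=0 "
        "err_ecpri=0 err_port=0 err_sct=0 err_total=3800")
counters = parse_counters(line)
# Here every received packet is counted as an error.
print(counters["err_total"] == counters["total"])  # True
```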

I can attach logs/configs for any of these test cases if needed.

Hi Eric,
We haven’t integrated with the 7800E yet, so we don’t have configuration files for it.
The stated frequency range is 3.3-3.6GHz, so it makes sense that the 3.75GHz center frequency is rejected.

For the 3.3GHz configuration, you likely have:

    pdcch_ConfigSIB1 = (
      { controlResourceSetZero = 12; searchSpaceZero = 0; }
    );

TS 38.213 Table 13-4 row 12 gives that as 16 RBs offset, as in the assertion.

Using:

absoluteFrequencySSB        = 620736 => 3311.040 MHz
dl_absoluteFrequencyPointA  = 620208 => 3303.120 MHz
rbOffset  = 16 RBs => 5.76 MHz
SSB width = 20 RBs => 7.20 MHz
You have:
Upper edge coreset: 3303.12 + 5.76   = 3308.88 MHz
Lower SSB edge:     3311.04 - 7.20/2 = 3307.44 MHz

The two overlap: the SSB starts only 12 RBs above Point A, so CORESET 0 would start at PRB 12 - 16 = -4, as in the assertion.

Some alternative absoluteFrequencySSB values are:
620832 => 3312.480 MHz - The lower edge of the SSB is exactly at 3308.88 MHz, and this gives high throughput on an iPhone
623520 => 3352.800 MHz - Center of band, as in the 3.75 GHz and CBRS configurations
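
The arithmetic above can be reproduced with a short sketch. It uses the NR-ARFCN formula for the 3000-24250 MHz range from TS 38.104 (F = 3000 MHz + 15 kHz x (N - 600000)). It assumes a 20-RB SSB centred on absoluteFrequencySSB and the same 30 kHz SCS for SSB and PDCCH. The function names are hypothetical, and the non-zero remainder case (k_SSB) is only reported, not handled.

```python
def nrarfcn_to_khz(arfcn: int) -> int:
    """NR-ARFCN to frequency in kHz for the 3000-24250 MHz range (15 kHz raster)."""
    if not 600000 <= arfcn <= 2016666:
        raise ValueError("formula only valid for NR-ARFCN 600000..2016666")
    return 3_000_000 + 15 * (arfcn - 600000)


def coreset0_start_prb(ssb_arfcn: int, point_a_arfcn: int, rb_offset: int,
                       scs_khz: int = 30) -> tuple[int, int]:
    """Return (CORESET 0 start PRB relative to Point A, leftover kHz).

    Assumes the SSB spans 20 RBs centred on absoluteFrequencySSB.
    """
    rb_khz = 12 * scs_khz
    ssb_low_khz = nrarfcn_to_khz(ssb_arfcn) - 10 * rb_khz
    ssb_start_rb, leftover_khz = divmod(ssb_low_khz - nrarfcn_to_khz(point_a_arfcn), rb_khz)
    return ssb_start_rb - rb_offset, leftover_khz


print(coreset0_start_prb(620736, 620208, 16))  # (-4, 0): matches the assertion
print(coreset0_start_prb(620832, 620208, 16))  # (0, 0): CORESET 0 starts at Point A
```

A negative first value means CORESET 0 would begin below Point A, which is the condition the OAI assertion rejects.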

@nhedberg
Thanks for the info. I tried some other frequencies based on your guidance and still seem to be seeing an issue.

      absoluteFrequencySSB                                          =  624384;
      dl_absoluteFrequencyPointA                                    = 620244;
#centerfreq for GSCN 7753 
RRH_LO_FREQUENCY_KHZ = 3406890

but I am still seeing err_total incrementing.

Attached is a port-mirroring pcap of the traffic from the DU to the radio; I am not seeing a response.
fh-fxn.zip (14.8 KB)

Is this error because L1 is not seeing uplink packets back from the radio?

16:04:00.490902 WRN msg_processing 0 [CUPHY.MEMFOOT] cuphyMemoryFootprint - GPU allocation: 0.093 MiB for cuPHY CSIRS channel object (0x5db5433ae80).
16:04:00.491063 WRN msg_processing 0 [CUPHY.MEMFOOT] cuphyMemoryFootprint - GPU allocation: 0.093 MiB for cuPHY CSIRS channel object (0x5db5433b180).
16:04:00.491075 WRN msg_processing 0 [DRV.API] Update cell: mplane_id=1 dl_grid_sz=273
16:04:00.491077 WRN msg_processing 0 [DRV.API] Update cell: mplane_id=1 ul_grid_sz=273
16:04:00.502228 WRN timer_thread 0 [L2A.TICK] Thread slot_indication_thread_sleep_method initialized fmtlog
16:04:00.502234 WRN timer_thread 0 [L2A.TICK] PTP Configs: gps_alpha: 0 gps_beta: 0
16:04:04.280030 WRN timer_thread 0 [SCF.PHY] Cell  0 | DL    0.00 Mbps    0 Slots | UL    0.00 Mbps    0 Slots CRC   0 (     0) | Tick 0
16:04:04.299521 INF msg_processing 0 [SCF.SLOTCMD] update_cell_command PRACH occaPrmStatIdx=0 occaPrmDynIdx=0, numRa =0
16:04:04.299922 WRN UlPhyDriver03 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC. Too late to abort UL tasks start_tx_time 1725897844299095000 current_time 1725897844299921455 start_ch_task_time 1725897844299000000
16:04:04.299924 ERR UlPhyDriver03 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] UL C-plane send timing error for cell index 0 error type 2 Map 0,Error detection too late. No UL Task Abort!
16:04:04.299928 ERR UlPhyDriver03 0 [AERIAL_L2ADAPTER_EVENT] [SCF.PHY] Send SLOT error indication from L1 SFN=1 slot=19 for msg_id=0x07
16:04:04.300039 INF msg_processing 0 [SCF.SLOTCMD] DL symbols = 0
16:04:04.300040 DBG msg_processing 0 [SCF.SLOTCMD] update_new_coreset:672 PDCCH testModel=0
16:04:04.300048 DBG msg_processing 0 [SCF.SLOTCMD] update_cell_command: SFN 2.0 cell_id=0 DL_TTI.req: PDU 0-1-0 tb_size=101 pdu_offset=0
16:04:04.300049 DBG msg_processing 0 [SCF.SLOTCMD] update_cell_command:1171 PDSCH testMode=0
16:04:04.300509 WRN msg_processing 0 [L2A.MODULE] Current SFN 2.1, Previous slot received=false process_phy_commands: cell_id=0 channels=0 - invalid PDSCH pTbInput=0x0 data_buf=0x0
16:04:04.300509 WRN msg_processing 0 [L2A.MODULE] Dropping the slot command for slot 0
16:04:04.303811 ERR UlPhyDriver03 0 [AERIAL_CUPHY_API_EVENT] [DRV.FUNC_UL] Slot Map 0, SFN 1 Slot 19 Order kernel timeout error or Exit error for cell index 0 Dyn index -1!
16:04:04.309518 INF msg_processing 0 [SCF.SLOTCMD] update_cell_command PRACH occaPrmStatIdx=0 occaPrmDynIdx=0, numRa =0
16:04:04.309903 WRN UlPhyDriver03 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC. Too late to abort UL tasks start_tx_time 1725897844309095000 current_time 1725897844309903414 start_ch_task_time 1725897844309000000
16:04:04.309903 ERR UlPhyDriver03 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] UL C-plane send timing error for cell index 0 error type 2 Map 1,Error detection too late. No UL Task Abort!
16:04:04.309904 ERR UlPhyDriver03 0 [AERIAL_L2ADAPTER_EVENT] [SCF.PHY] Send SLOT error indication from L1 SFN=2 slot=19 for msg_id=0x07
16:04:04.313629 ERR UlPhyDriver02 0 [AERIAL_CUPHY_API_EVENT] [DRV.FUNC_UL] Slot Map 1, SFN 2 Slot 19 Order kernel timeout error or Exit error for cell index 0 Dyn index -1!
16:04:04.319519 INF msg_processing 0 [SCF.SLOTCMD] update_cell_command PRACH occaPrmStatIdx=0 occaPrmDynIdx=0, numRa =0
16:04:04.319902 WRN UlPhyDriver03 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC. Too late to abort UL tasks start_tx_time 1725897844319095000 current_time 1725897844319902428 start_ch_task_time 1725897844319000000
16:04:04.319902 ERR UlPhyDriver03 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] UL C-plane send timing error for cell index 0 error type 2 Map 2,Error detection too late. No UL Task Abort!
16:04:04.319903 ERR UlPhyDriver03 0 [AERIAL_L2ADAPTER_EVENT] [SCF.PHY] Send SLOT error indication from L1 SFN=3 slot=19 for msg_id=0x07
16:04:04.320026 INF msg_processing 0 [SCF.SLOTCMD] DL symbols = 0
16:04:04.320027 DBG msg_processing 0 [SCF.SLOTCMD] update_new_coreset:672 PDCCH testModel=0
16:04:04.320030 DBG msg_processing 0 [SCF.SLOTCMD] update_cell_command: SFN 4.0 cell_id=0 DL_TTI.req: PDU 0-1-0 tb_size=101 pdu_offset=0
16:04:04.320030 DBG msg_processing 0 [SCF.SLOTCMD] update_cell_command:1171 PDSCH testMode=0
16:04:04.320315 WRN DlPhyDriver04 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for DLC start_tx_time 1725897844319595000 current_time 1725897844320315522
16:04:04.320316 ERR DlPhyDriver04 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_DL] DL C-plane send error for task num 0 error type 1 Map 0

Attached are the full OAI log and phy.log:

oai-gnb.log (61.9 KB)
phy.log (2.3 MB)

Hi @eric.a.momper,

> Is this error because L1 is not seeing uplink packets back from the radio?

The following error, seen from the start, indicates an issue with sending C-plane messages from the DU: the C-plane messages were sent too late.

16:27:27.179521 INF msg_processing 0 [SCF.SLOTCMD] update_cell_command PRACH occaPrmStatIdx=0 occaPrmDynIdx=0, numRa =0
16:27:27.179932 WRN UlPhyDriver02 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC. Too late to abort UL tasks start_tx_time 1725899247179095000 current_time 1725899247179931622 start_ch_task_time 1725899247179000000
16:27:27.179935 ERR UlPhyDriver02 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] UL C-plane send timing error for cell index 0 error type 2 Map 0,Error detection too late. No UL Task Abort!
16:27:27.179939 ERR UlPhyDriver02 0 [AERIAL_L2ADAPTER_EVENT] [SCF.PHY] Send SLOT error indication from L1 SFN=1 slot=19 for msg_id=0x07
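
To see how late the C-plane send was, the two timestamps in the warning line can be subtracted. This sketch assumes both values are nanosecond timestamps from the same clock, which the magnitudes suggest but the log does not state. `cplane_late_ns` is a hypothetical helper.

```python
import re


def cplane_late_ns(line: str) -> int:
    """Return current_time - start_tx_time (ns) from a sendCPlane_timingCheck line."""
    start = int(re.search(r"start_tx_time (\d+)", line).group(1))
    now = int(re.search(r"current_time (\d+)", line).group(1))
    return now - start


line = ("16:27:27.179932 WRN UlPhyDriver02 0 [DRV.FH] sendCPlane_timingCheck : "
        "sendCPlane Timing error for ULC. Too late to abort UL tasks "
        "start_tx_time 1725899247179095000 current_time 1725899247179931622 "
        "start_ch_task_time 1725899247179000000")
print(cplane_late_ns(line))  # 836622
```

Under that assumption, the check ran roughly 0.84 ms after the planned transmit start, which is longer than one 0.5 ms slot at 30 kHz SCS.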

The Aerial L2 adapter is also dropping messages, because a FAPI message with the previous SFN was received.

16:27:27.180508 WRN msg_processing 0 [L2A.MODULE] Current SFN 2.1, Previous slot received=false process_phy_commands: cell_id=0 channels=0 - invalid PDSCH pTbInput=0x0 data_buf=0x0
16:27:27.180509 WRN msg_processing 0 [L2A.MODULE] Dropping the slot command for slot 0

Can you please check your ptp4l and phc2sys settings?

Thank you.

Can you run the following commands on the host?

$ pip3 install psutil
$ cd $cuBB_SDK/cuPHY/util/cuBB_system_checks
$ sudo -E python3 ./cuBB_system_checks.py

Thank you

Thanks for the info.

I went to change and simplify the fronthaul wiring this morning, as we realized PTP was not working after we changed some things yesterday. The current wiring for PTP shown in the report goes directly through the Fibrolan switch (https://www.fibrolan.com/Falcon-MX-G).
We were previously using a Cumulus N3700V (Supermicro SSE-SN3700-VS2RC 100G/200G Ethernet switch) but wanted to avoid it to simplify the wiring. We had been trying different topologies to figure out link/transceiver compatibility between the CX7 NIC and the Foxconn 10G SFP.

The simplified current configuration is the DU/CX7 FH (4x10G fanout) and the Foxconn SFP FH wired directly to the Fibrolan switch SFP ports, with the Foxconn using the 1G RJ45 to the Fibrolan for PTP, which seems to latch and stay synced OK.

In this state we actually got a different behavior, where the Foxconn incremented the err_ecpri/err_tci counters:

Tx Att = 14400 15600 14700 15600
TX1 dpdErrorCode 0x340c
TX2 dpdErrorCode 0x340c
TX3 dpdErrorCode 0x340c
TX4 dpdErrorCode 0x340c
Temperature of ad9025 is 67 degree Celsius.
trace_log_idx_g: 0
trace_log_g 0x10
ptp: state=3 rms=3                max=10               freq=-221     delay=526
10R: sec=464 hps=464 64b=315 65to128=6 total=343 uni=1 uni>1158=1 multi=340 crc_err=0
10T: sec=464 hps=464 64b=7 65to128=25 total=79 uni=0 uni>1158=0 multi=28 crc_err=0 state=1 start=0 atick=0 iatick=928001 adj=-70 rstcnt=0
xRN: total=1 c_early=0 c_on=0 c_late=0 err_tci=1 err_ecpri=1 err_port=0 err_sct=0 err_total=1
Latch later 1pps time=5c11c0b4 swi4010=5c11c0b4 xran_sec=5c11c0b2 acc_diff[3]=-70 hps_sec=464 cur_sec=464 tx_through=0 rx_through=0
trace_log_idx_g: 0
trace_log_g 0x10

Port mirror pcap of single packet we see when starting the cubb.

cubb sys report attached
cubb-sys.log (13.0 KB)

affinity output while app is running:

2024-09-10 12:27:31,996 WARNING:Found name: phy_main cores: {5, 6, 7, 8, 9, 10, 11, 12, 17, 20}
2024-09-10 12:27:31,996 WARNING:Found name: bash cores: {13}
2024-09-10 12:27:31,996 WARNING:Found name: nr-softmodem cores: {13, 14, 15, 16, 17, 18, 19}
2024-09-10 12:27:31,996 WARNING:Found name: tee cores: {13}
2024-09-10 12:27:31,996 WARNING:Found name: nr-softmodem cores: {16}
2024-09-10 12:27:31,996 WARNING:Found name: runc:[2:INIT] cores: {19}
2024-09-10 12:27:32,223 WARNING:Found name: rcu_par_gp cores: {15}
2024-09-10 12:27:32,224 WARNING:Found name: inet_frag_wq cores: {3}
2024-09-10 12:27:32,224 WARNING:Found name: oom_reaper cores: {2}
2024-09-10 12:27:32,224 WARNING:Found name: writeback cores: {3}
2024-09-10 12:27:32,224 WARNING:Found name: kcompactd0 cores: {3}
2024-09-10 12:27:32,224 WARNING:Found name: ksmd cores: {2}
2024-09-10 12:27:32,224 WARNING:Found name: khugepaged cores: {4}
2024-09-10 12:27:32,224 WARNING:Found name: kintegrityd cores: {3}
2024-09-10 12:27:32,224 WARNING:Found name: blkcg_punt_bio cores: {4}
2024-09-10 12:27:32,225 WARNING:Found name: xprtiod cores: {3}
2024-09-10 12:27:32,225 WARNING:Found name: ptp4l cores: {21}

One thing we wanted to check is the expected MTU in your setup. I saw that the Foxconn kernel interface MTU is 4000 bytes, but I assume the FPGA/ru_app can receive packets directly at the full MTU. The DU MTU is set to 8192.

Hi @eric.a.momper ,

Can you please elaborate on what you are seeing after correcting the PTP service? What is the new issue you are observing? Can you share the L1 log (phy.log)? Do you see any issues on the L1 side, or are the IQ samples sent correctly from the L1 but not transmitted by the O-RU?

It is correct that the MTU size should be 8192 for the Foxconn radio.

Thank you.

I am still seeing the C-plane send timing check error. The pcap still shows only the downlink U-plane packets to the Foxconn, and the Foxconn is still incrementing the error counters:

ptp: state=3 rms=1                max=1                freq=-222     delay=527
10R: sec=21444 hps=21444 64b=10831 65to128=35 total=16386 uni=4794 uni>1158=4794 multi=11586 crc_err=0
10T: sec=21444 hps=21444 64b=7 65to128=34 total=1500 uni=1 uni>1158=0 multi=37 crc_err=0 state=1 start=20492 atick=0 iatick=42888001 adj=-139 rstcnt=0
xRN: total=4104 c_early=0 c_on=0 c_late=0 err_tci=0 err_ecpri=0 err_port=0 err_sct=0 err_total=4104
Latch later 1pps time=9472f605 swi4010=9472f605 xran_sec=9472f603 acc_diff[8]=-3142 hps_sec=21444 cur_sec=21444 tx_through=0 rx_through=0
16:15:25.259923 WRN UlPhyDriver06 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC. Too late to abort UL tasks start_tx_time 1725984925259095000 current_time 1725984925259922980 start_ch_task_time 1725984925259000000
16:15:25.259925 ERR UlPhyDriver06 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] UL C-plane send timing error for cell index 0 error type 2 Map 0,Error detection too late. No UL Task Abort!
16:15:25.259929 ERR UlPhyDriver06 0 [AERIAL_L2ADAPTER_EVENT] [SCF.PHY] Send SLOT error indication from L1 SFN=1 slot=19 for msg_id=0x07
16:15:25.260028 INF msg_processing 0 [SCF.SLOTCMD] DL symbols = 0
16:15:25.260029 DBG msg_processing 0 [SCF.SLOTCMD] update_new_coreset:672 PDCCH testModel=0
16:15:25.260039 DBG msg_processing 0 [SCF.SLOTCMD] update_cell_command: SFN 2.0 cell_id=0 DL_TTI.req: PDU 0-1-0 tb_size=101 pdu_offset=0
16:15:25.260040 DBG msg_processing 0 [SCF.SLOTCMD] update_cell_command:1171 PDSCH testMode=0
16:15:25.260505 WRN msg_processing 0 [L2A.MODULE] Current SFN 2.1, Previous slot received=false process_phy_commands: cell_id=0 channels=0 - invalid PDSCH pTbInput=0x0 data_buf=0x0
16:15:25.260506 WRN msg_processing 0 [L2A.MODULE] Dropping the slot command for slot 0
16:15:25.263812 ERR UlPhyDriver06 0 [AERIAL_CUPHY_API_EVENT] [DRV.FUNC_UL] Slot Map 0, SFN 1 Slot 19 Order kernel timeout error or Exit error for cell index 0 Dyn index -1!
16:15:25.269520 INF msg_processing 0 [SCF.SLOTCMD] update_cell_command PRACH occaPrmStatIdx=0 occaPrmDynIdx=0, numRa =0
16:15:25.269907 WRN UlPhyDriver05 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC. Too late to abort UL tasks start_tx_tim

phy.log (3.7 MB)

It appears that the MTU cannot be set that large from the kernel shell. Do you see this on your Foxconn as well?
I know we have used this radio in other tests at the larger MTU, unless there is a different config or expected FW?

The firmware version we are running is
v3.1.5q.524

As a sanity check, I re-tested the ru-sim mode of the cuBB with the current direct wiring through the Fibrolan switch, using both the default MTU for that test (1514) and 8192. The test seemed to run fine (pcap below). The L1/test MAC configs are pretty close to the defaults; I will compare them to check.

Hi @eric.a.momper ,

The issues you are seeing are not related to the O-RU.

16:15:25.259923 WRN UlPhyDriver06 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC. Too late to abort UL tasks start_tx_time 1725984925259095000 current_time 1725984925259922980 start_ch_task_time 1725984925259000000
16:15:25.259925 ERR UlPhyDriver06 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] UL C-plane send timing error for cell index 0 error type 2 Map 0,Error detection too late. No UL Task Abort!
16:15:25.259929 ERR UlPhyDriver06 0 [AERIAL_L2ADAPTER_EVENT] [SCF.PHY] Send SLOT error indication from L1 SFN=1 slot=19 for msg_id=0x07
16:15:25.260028 INF msg_processing 0 [SCF.SLOTCMD] DL symbols = 0
16:15:25.260029 DBG msg_processing 0 [SCF.SLOTCMD] update_new_coreset:672 PDCCH testModel=0
16:15:25.260039 DBG msg_processing 0 [SCF.SLOTCMD] update_cell_command: SFN 2.0 cell_id=0 DL_TTI.req: PDU 0-1-0 tb_size=101 pdu_offset=0
16:15:25.260040 DBG msg_processing 0 [SCF.SLOTCMD] update_cell_command:1171 PDSCH testMode=0
16:15:25.260505 WRN msg_processing 0 [L2A.MODULE] Current SFN 2.1, Previous slot received=false process_phy_commands: cell_id=0 channels=0 - invalid PDSCH pTbInput=0x0 data_buf=0x0
16:15:25.260506 WRN msg_processing 0 [L2A.MODULE] Dropping the slot command for slot 0
16:15:25.263812 ERR UlPhyDriver06 0 [AERIAL_CUPHY_API_EVENT] [DRV.FUNC_UL] Slot Map 0, SFN 1 Slot 19 Order kernel timeout error or Exit error for cell index 0 Dyn index -1!
16:15:25.269520 INF msg_processing 0 [SCF.SLOTCMD] update_cell_command PRACH occaPrmStatIdx=0 occaPrmDynIdx=0, numRa =0
16:15:25.269907 WRN UlPhyDriver05 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC. Too late to abort UL tasks start_tx_tim

Message processing and/or timer threads are being interrupted by other processes. Please make sure the cores are isolated. It seems hyperthreading is enabled on your system. In that case, the sibling (paired) cores of the cores assigned to Aerial should not be used by other processes either.

16:27:01.061461 WRN phy_init 0 [CTL.YAML] DPDK core: 8
16:27:01.061461 WRN phy_init 0 [CTL.YAML] Prometheus core: -1
16:27:01.061461 WRN phy_init 0 [CTL.YAML] UL cores: 
16:27:01.061461 WRN phy_init 0 [CTL.YAML] 	- 2
16:27:01.061462 WRN phy_init 0 [CTL.YAML] 	- 3
16:27:01.061462 WRN phy_init 0 [CTL.YAML] DL cores: 
16:27:01.061462 WRN phy_init 0 [CTL.YAML] 	- 4
16:27:01.061462 WRN phy_init 0 [CTL.YAML] 	- 5
16:27:01.061462 WRN phy_init 0 [CTL.YAML] 	- 6
-----Kernel Command Line--------------------------
Audit subsystem                    : audit=0
Clock source                       : clocksource=tsc
HugePage count                     : hugepages=16
HugePage size                      : hugepagesz=1G
CPU idle time management           : idle=poll
Max Intel C-state                  : intel_idle.max_cstate=0
Intel IOMMU                        : intel_iommu=on
IOMMU                              : iommu=pt
Isolated CPUs                      : isolcpus=5-107
Corrected errors                   : mce=ignore_ce
Adaptive-tick CPUs                 : nohz_full=5-107
Soft-lockup detector disable       : nosoftlockup
Max processor C-state              : processor.max_cstate=0
RCU callback polling               : rcu_nocb_poll
No-RCU-callback CPUs               : rcu_nocbs=5-107
TSC stability checks               : tsc=reliable

Hi Eric,
We have an example configuration with Fibrolan here: Part 2. Configure the Network Hardware - NVIDIA Docs

In particular:

In this setup, the Qulsar GrandMaster is connected to port 4, the Aerial cuBB to port 17, and the Foxconn O-RU to port 16 (C/U plane) and port 15 (S/M plane). You can ignore all other ports in the figures.

Your image shows VLAN PRI: 7, ID: 2 which suggests that your cuphycontroller yaml has:

      vlan: 2
      pcp: 7 

That 7 should be a zero to match the RU configuration; as it stands, the RU is dropping the packets.
https://docs.nvidia.com/aerial/aerial-ran-colab-ota/current/text/installation_guide/configure_hardware.html#update-o-ru-configuration

<!-- RRH_C_PLANE_VLAN_TAG: C-plane V-LAN tag express by hex number -->
RRH_C_PLANE_VLAN_TAG = 0x0002
<!-- RRH_U_PLANE_VLAN_TAG: U-plane V-LAN tag express by hex number -->
RRH_U_PLANE_VLAN_TAG = 0x0002
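
To make the mismatch concrete, here is a sketch of how the 16-bit 802.1Q tag control field is built from PCP, DEI and VLAN ID. It assumes the RU compares the whole 16-bit value against RRH_C_PLANE_VLAN_TAG / RRH_U_PLANE_VLAN_TAG, which is suggested by the err_tci counter but not confirmed in this thread. `pack_tci` is a hypothetical helper.

```python
def pack_tci(pcp: int, vid: int, dei: int = 0) -> int:
    """Build the 16-bit 802.1Q TCI: PCP (3 bits) | DEI (1 bit) | VID (12 bits)."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0xFFF):
        raise ValueError("pcp, dei or vid out of range")
    return (pcp << 13) | (dei << 12) | vid


print(hex(pack_tci(pcp=7, vid=2)))  # 0xe002, what the DU sends with pcp: 7
print(hex(pack_tci(pcp=0, vid=2)))  # 0x2, equal to the RU's 0x0002
```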

The RU is built on an Intel Arria 10 SoC, and in the boot logs you should see the fixed PS MTU being set:

[    8.122848] intel_fpga_qse_ll ff224000.mac_10g_0 qse-eth: max rx packet size 1518
..
[    8.154222] intel_fpga_qse_ll ff224000.mac_10g_0 qse-eth: tx max frame: 0x000005ee

The PL side of the chip handles eCPRI packets, so you don’t get them on the processor side.

The MTU in the DU is set in cuphycontroller yaml, e.g:

  nics:
    - nic: '0000:b5:00.0'
      mtu: 8192

Hi @bkecicioglu, thanks for the info. I'll try to check and simplify my CPU pinning configuration (turning off hyperthreading). From your system check script and from checking the pinning manually, I don’t think I see any conflicts. I will attach the cuBB and oai-du configs below.

root@5g-test-gpu:~/openairinterface5g/ci-scripts/yaml_files/sa_gnb_aerial# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.15.0-1042-nvidia-lowlatency root=/dev/mapper/ubuntu--vg-ubuntu--lv ro default_hugepagesz=1G 
hugepagesz=1G hugepages=16 tsc=reliable clocksource=tsc intel_idle.max_cstate=0 
mce=ignore_ce processor.max_cstate=0 intel_pstate=disable audit=0 idle=poll 
isolcpus=5-107 nohz_full=5-107 rcu_nocbs=5-107 rcu_nocb_poll nosoftlockup iommu=pt intel_iommu=on irqaffinity=0-4,108-112
2024-09-10 12:27:31,996 WARNING:Found name: phy_main cores: {5, 6, 7, 8, 9, 10, 11, 12, 17, 20}
2024-09-10 12:27:31,996 WARNING:Found name: bash cores: {13}
2024-09-10 12:27:31,996 WARNING:Found name: nr-softmodem cores: {13, 14, 15, 16, 17, 18, 19}
2024-09-10 12:27:31,996 WARNING:Found name: tee cores: {13}
2024-09-10 12:27:31,996 WARNING:Found name: nr-softmodem cores: {16}
2024-09-10 12:27:31,996 WARNING:Found name: runc:[2:INIT] cores: {19}
2024-09-10 12:27:32,223 WARNING:Found name: rcu_par_gp cores: {15}
2024-09-10 12:27:32,224 WARNING:Found name: inet_frag_wq cores: {3}
2024-09-10 12:27:32,224 WARNING:Found name: oom_reaper cores: {2}
2024-09-10 12:27:32,224 WARNING:Found name: writeback cores: {3}
2024-09-10 12:27:32,224 WARNING:Found name: kcompactd0 cores: {3}
2024-09-10 12:27:32,224 WARNING:Found name: ksmd cores: {2}
2024-09-10 12:27:32,224 WARNING:Found name: khugepaged cores: {4}
2024-09-10 12:27:32,224 WARNING:Found name: kintegrityd cores: {3}
2024-09-10 12:27:32,224 WARNING:Found name: blkcg_punt_bio cores: {4}
2024-09-10 12:27:32,225 WARNING:Found name: xprtiod cores: {3}
2024-09-10 12:27:32,225 WARNING:Found name: ptp4l cores: {21}

core-pin.tar.gz (6.0 KB)


From checking top on the system, I did notice that the loadavg keeps climbing while cuPHY and oai-du are running, so maybe a generic OS task is mispinned or hyperthreading is interfering. Hopefully reducing/changing the config will resolve that.

Hi, @bkecicioglu

Is this L1 core pinning config a reference one you have run with the OAI/Foxconn test?

I checked the docs for the cmdline/BIOS settings and simplified the setup with hyperthreading disabled. The system loadavg no longer climbs (much better now); however, I am still seeing the same behavior (C-plane message timing late).

I wonder if it is a CPU spec issue with the server I'm running on: a 2.0 GHz Xeon Ice Lake dual-socket Supermicro 2U server, which seems pretty similar spec-wise to the R750 (see info below).

Thank you @nhedberg, that's what I suspected. Other tests in which we used that radio in the past also used jumbo frames, so Ethernet-wise I think that makes sense.

trace_log_idx_g: 0
trace_log_g 0x10
ptp: state=3 rms=2                max=9                freq=-218     delay=527
10R: sec=18600 hps=18600 64b=9523 65to128=146 total=19907 uni=9600 uni>1158=8760 multi=10279 crc_err=0
10T: sec=18600 hps=18600 64b=7 65to128=31 total=1321 uni=1 uni>1158=0 multi=36 crc_err=0 state=1 start=13698 atick=0 iatick=37200001 adj=-714 rstcnt=0
xRN: total=984 c_early=0 c_on=0 c_late=0 err_tci=0 err_ecpri=0 err_port=0 err_sct=0 err_total=984
Latch later 1pps time=382fa213 swi4010=382fa213 xran_sec=382fa211 acc_diff[1]=-2716 hps_sec=18600 cur_sec=18600 tx_through=0 rx_through=0
trace_log_idx_g: 0
trace_log_g 0x10

I also tried setting the VLAN PCP priority back to zero, but I am still seeing similar behavior.

This is also the setting I have in the radio:

<!-- RRH_C_PLANE_VLAN_TAG: C-plane V-LAN tag express by hex number -->
RRH_C_PLANE_VLAN_TAG = 0x0002
<!-- RRH_U_PLANE_VLAN_TAG: U-plane V-LAN tag express by hex number -->
RRH_U_PLANE_VLAN_TAG = 0x0002

Zip of some analysis I did of some of the logs: F08 RUSIM (works) vs FXN (C-plane error)

F08-VS-FXN.zip (2.1 MB)

@eric.a.momper

I had copied the following from the phy.log you shared from your test.

16:27:01.061461 WRN phy_init 0 [CTL.YAML] DPDK core: 8
16:27:01.061461 WRN phy_init 0 [CTL.YAML] Prometheus core: -1
16:27:01.061461 WRN phy_init 0 [CTL.YAML] UL cores: 
16:27:01.061461 WRN phy_init 0 [CTL.YAML] 	- 2
16:27:01.061462 WRN phy_init 0 [CTL.YAML] 	- 3
16:27:01.061462 WRN phy_init 0 [CTL.YAML] DL cores: 
16:27:01.061462 WRN phy_init 0 [CTL.YAML] 	- 4
16:27:01.061462 WRN phy_init 0 [CTL.YAML] 	- 5
16:27:01.061462 WRN phy_init 0 [CTL.YAML] 	- 6

If this is the setting you are using, the PHY worker threads are running on non-isolated cores (I understand cores 2-4 are also used by the OS).
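
The check described above can be sketched as follows: expand the isolcpus list from the kernel command line and report any configured PHY core that falls outside it. The core lists are copied from the [CTL.YAML] log lines quoted above; `parse_cpu_list` is a hypothetical helper, and the sketch does not look at hyperthread siblings.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a kernel CPU list such as '5-107' or '0-4,108-112' into a set."""
    cpus: set[int] = set()
    for part in spec.split(","):
        low, _, high = part.partition("-")
        cpus.update(range(int(low), int(high or low) + 1))
    return cpus


isolated = parse_cpu_list("5-107")  # isolcpus= from the kernel command line
phy_cores = {"DPDK": [8], "UL": [2, 3], "DL": [4, 5, 6]}  # from the [CTL.YAML] lines
for name, cores in phy_cores.items():
    # Cores listed here are configured for the PHY but not isolated.
    print(name, sorted(set(cores) - isolated))
```

For the values in this thread it reports UL cores 2 and 3 and DL core 4 as not isolated.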