Hi all,
I’m running the Aerial CUDA RAN cuBB E2E tests and consistently hit the chain of errors shown under "Issue" below, which I haven’t been able to resolve. Logs and config files are attached; any pointers would be appreciated.
System Configuration
- Two GH200 servers
- Server 1: cuPHY Controller + TestMAC
- Server 2: RU Emulator
Issue
09:17:51.552708 ERR UlPhyDriver06 0 [AERIAL_CUPHY_API_EVENT] [DRV.FUNC_UL] SFN 474.15 Slot Map 363 PUSCH Pre Early Harq Wait kernel timeout!
09:17:51.552708 INF UlPhyDriver06 0 [DRV.FUNC_UL] Triggering Early UCI Indication Callback to L2A for Slot Map 363
09:17:51.552743 ERR UlPhyDriver07 0 [AERIAL_CUPHY_API_EVENT] [DRV.FUNC_UL] SFN 474.15 Slot Map 363 PUSCH Post Early Harq Wait kernel timeout!
09:17:51.552743 INF UlPhyDriver06 0 [DRV.FUNC_UL] {TI} <UL Task AGGR3 Early UCI IND,474,15,363,6> <3,0.17,19.21,0.17,0.24,489.98,1.93> Start Task:1777540671549720555,ULC Tasks Complete Wait:1777540671549720907,Check Task Abort:1777540671549721099,PUCCH Wait:1777540671549721227,UCI Det Completion WAIT:1777540671549722763,Early UCI Callback:1777540671552708338,Run PUSCH_RUN_FULL_SLOT_COPY:1777540671552713522,Signal Completion:1777540671552739346,End Task:1777540671552740082,
09:17:51.552748 WRN UlPhyDriver06 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC start_tx_time 1777540671551664000 current_time 1777540671552748114
09:17:51.552748 WRN UlPhyDriver06 0 [DRV.FH] sendCPlane_timingCheck: Error at oframe_id_ 219 osfid_ 2 oslotid_ 0
09:17:51.552748 ERR UlPhyDriver06 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] UL C-plane send error for cell index 0,error type 2 Map 364 Abort UL Tasks!
09:17:51.552748 ERR UlPhyDriver06 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] Calling ul_tx_error_fn 1491
09:17:51.552749 ERR UlPhyDriver06 0 [AERIAL_NVIPC_API_EVENT] [SCF.PHY] Send Err.ind for SFN 475.4 cell_id=0 msg_id=0x81 err_code=0x37
09:17:51.552752 ERR UlPhyDriver06 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] Task ul_aggr_1_cplane 0 aborted the tasklist for an error
09:17:51.552755 INF UlPhyDriver06 0 [DRV.FUNC_UL] {TI} <UL Task CPlane 1,475,4,364,6> <3,82.82,36.07,44.12,49.85,318.27,0.53> Start Task:1777540671552746706,Wait ULBFW Completion:1777540671552747122,CPlane Prepare:1777540671552747378,Cplane error check:1777540671552748658,Signal completion:1777540671552752146,End Task:1777540671552752210,
09:17:51.552759 INF UlPhyDriver06 0 [DRV.FUNC_UL] {TI} <UL Task CPlane 2,475,4,364,6> <3,27.87,16.47,2.53,6.97,282.46,1.26> Start Task:1777540671552756818,Wait ULBFW Completion:1777540671552757138,CPlane Prepare:1777540671552757170,Cplane error check:1777540671552757202,Signal completion:1777540671552757234,End Task:1777540671552757298,
09:17:51.552761 WRN UlPhyDriver06 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC start_tx_time 1777540671552164000 current_time 1777540671552761586
09:17:51.552761 WRN UlPhyDriver06 0 [DRV.FH] sendCPlane_timingCheck: Error at oframe_id_ 219 osfid_ 2 oslotid_ 1
09:17:51.552761 ERR UlPhyDriver06 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] UL C-plane send error for cell index 0,error type 2 Map 365 Abort UL Tasks!
09:17:51.552761 ERR UlPhyDriver06 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] Calling ul_tx_error_fn 1491
09:17:51.552762 ERR UlPhyDriver06 0 [AERIAL_NVIPC_API_EVENT] [SCF.PHY] Send Err.ind for SFN 475.5 cell_id=0 msg_id=0x81 err_code=0x37
09:17:51.552764 ERR UlPhyDriver06 0 [AERIAL_CUPHYDRV_API_EVENT] [DRV.FUNC_UL] Task ul_aggr_1_cplane 0 aborted the tasklist for an error
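To put a number on the sendCPlane timing warnings above, here is a minimal Python sketch that extracts `start_tx_time` and `current_time` from those lines and reports how far past the scheduled transmit time each C-plane send was attempted. It assumes both values are nanosecond timestamps on the same clock (they line up with the log's wall-clock column, but that is my assumption, not something I have confirmed in the docs). The function name `cplane_lateness_ns` and the regex are my own helpers, not part of Aerial.

```python
import re

# Matches the "sendCPlane Timing error" warning lines as they appear in the log excerpt.
TIMING_RE = re.compile(
    r"sendCPlane Timing error for ULC start_tx_time (\d+) current_time (\d+)"
)

def cplane_lateness_ns(log_text):
    """Return a list of (start_tx_time, current_time, lateness_ns) tuples.

    Assumption: both fields are nanoseconds on the same clock, so
    lateness is simply current_time - start_tx_time.
    """
    results = []
    for line in log_text.splitlines():
        match = TIMING_RE.search(line)
        if match:
            start_tx = int(match.group(1))
            now = int(match.group(2))
            results.append((start_tx, now, now - start_tx))
    return results

# The two timing warnings copied from the excerpt above.
SAMPLE = """\
09:17:51.552748 WRN UlPhyDriver06 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC start_tx_time 1777540671551664000 current_time 1777540671552748114
09:17:51.552761 WRN UlPhyDriver06 0 [DRV.FH] sendCPlane_timingCheck : sendCPlane Timing error for ULC start_tx_time 1777540671552164000 current_time 1777540671552761586
"""

if __name__ == "__main__":
    for start_tx, now, late in cplane_lateness_ns(SAMPLE):
        # Print lateness in microseconds for readability.
        print(f"start_tx_time={start_tx} late_by={late / 1000:.1f} us")
```

On the two warnings quoted here, this gives roughly 1084 us and 598 us past `start_tx_time`, under the nanosecond assumption. The same function can be run over the full phy log in the attachment to see whether the lateness is constant or grows over time.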
Full logs and config files attached. Any guidance would be greatly appreciated.
Thanks!
configs_and_logs.zip (34.2 MB)