Nvgpu: 17000000.gpu nvgpu_channel_recover_from_wdt:112 [ERR] Job on channel 508 timed out

Hi NV,
We are using JetPack 6.0 with the AGX Orin 32GB/64GB modules and encountered a GPU issue during operation: the GPU driver printed error logs, followed by CPU error log output, and then the system froze. More detailed logs are attached. Please help us clarify this issue.
Thank you.

LOG_gpu_err_log_and_cpu_err_log_and_reboot.txt (1.9 MB)

[2025-06-11 23:40:04] [21214.614165] nvgpu: 17000000.gpu nvgpu_set_err_notifier_locked:143 [ERR] error notifier set to 8 for ch 506 owned by gnome-shell
[2025-06-11 23:40:04] [21214.719982] nvgpu: 17000000.gpu nvgpu_channel_recover_from_wdt:112 [ERR] Job on channel 508 timed out
[2025-06-11 23:40:04] [21214.720834] ga10b NV_PGRAPH_STATUS: 0x0
[2025-06-11 23:40:04] [21214.720839] ga10b NV_PGRAPH_STATUS1: 0x0
[2025-06-11 23:40:04] [21214.720841] ga10b NV_PGRAPH_ENGINE_STATUS: 0x0
[2025-06-11 23:40:04] [21214.720842] ga10b NV_PGRAPH_GRFIFO_STATUS : 0x1
[2025-06-11 23:40:04] [21214.720844] ga10b NV_PGRAPH_GRFIFO_CONTROL : 0x10001
[2025-06-11 23:40:04] [21214.720846] ga10b NV_PGRAPH_PRI_FECS_HOST_INT_STATUS : 0x0
[2025-06-11 23:40:04] [21214.720848] ga10b NV_PGRAPH_EXCEPTION : 0x0
[2025-06-11 23:40:04] [21214.720849] ga10b NV_PGRAPH_FECS_INTR : 0x0
[2025-06-11 23:40:04] [21214.720854] ga10b NV_PFIFO_ENGINE_STATUS(GR) : 0x10001
[2025-06-11 23:40:04] [21214.720855] ga10b NV_PGRAPH_ACTIVITY0: 0x0
[2025-06-11 23:40:04] [21214.720857] ga10b NV_PGRAPH_ACTIVITY1: 0x0
[2025-06-11 23:40:04] [21214.720859] ga10b NV_PGRAPH_ACTIVITY4: 0x0
[2025-06-11 23:40:04] [21214.720861] ga10b NV_PGRAPH_PRI_SKED_ACTIVITY: 0x0
[2025-06-11 23:40:04] [21214.720863] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY0: 0x0
[2025-06-11 23:40:04] [21214.720865] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY1: 0x0
[2025-06-11 23:40:04] [21214.720867] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY2: 0x0
[2025-06-11 23:40:04] [21214.720868] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY3: 0x0
[2025-06-11 23:40:04] [21214.720870] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY4: 0x0
[2025-06-11 23:40:04] [21214.720875] ga10b NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_ACTIVITY0: 0x0
[2025-06-11 23:40:04] [21214.720877] ga10b NV_PGRAPH_PRI_GPC0_TPC1_TPCCS_TPC_ACTIVITY0: 0x0
[2025-06-11 23:40:04] [21214.720880] ga10b NV_PGRAPH_PRI_GPC0_TPC2_TPCCS_TPC_ACTIVITY0: 0x0
[2025-06-11 23:40:04] [21214.720881] ga10b NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY0: 0x0
[2025-06-11 23:40:04] [21214.720883] ga10b NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY1: 0x0
[2025-06-11 23:40:04] [21214.720885] ga10b NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY2: 0x0
[2025-06-11 23:40:04] [21214.720887] ga10b NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY3: 0x0
[2025-06-11 23:40:04] [21214.720889] ga10b NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY4: 0x0
[2025-06-11 23:40:04] [21214.720891] ga10b NV_PGRAPH_PRI_GPCS_TPCS_TPCCS_TPC_ACTIVITY0: 0x0
[2025-06-11 23:40:04] [21214.720894] ga10b NV_PGRAPH_PRI_DS_MPIPE_STATUS: 0x0
[2025-06-11 23:40:04] [21214.720895] ga10b NV_PGRAPH_PRI_FE_GO_IDLE_TIMEOUT : 0x7fffffff
[2025-06-11 23:40:04] [21214.720897] ga10b NV_PGRAPH_PRI_FE_GO_IDLE_INFO : 0x1000700
[2025-06-11 23:40:04] [21214.720899] ga10b NV_PGRAPH_PRI_GPC0_TPC0_TEX_M_TEX_SUBUNITS_STATUS: 0x0
[2025-06-11 23:40:04] [21214.720901] ga10b NV_PGRAPH_PRI_CWD_FS: 0x702
[2025-06-11 23:40:04] [21214.720903] ga10b NV_PGRAPH_PRI_FE_TPC_FS(0): 0xf7
[2025-06-11 23:40:04] [21214.720905] ga10b NV_PGRAPH_PRI_CWD_GPC_TPC_ID: 0x1120213
[2025-06-11 23:40:04] [21214.720907] ga10b NV_PGRAPH_PRI_CWD_SM_ID(0): 0x10306
[2025-06-11 23:40:04] [21214.720909] ga10b NV_PGRAPH_PRI_FECS_CTXSW_STATUS_FE_0: 0x0
[2025-06-11 23:40:04] [21214.720911] ga10b NV_PGRAPH_PRI_FECS_CTXSW_STATUS_1: 0x190
[2025-06-11 23:40:04] [21214.720913] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_STATUS_GPC_0: 0x0
[2025-06-11 23:40:04] [21214.720915] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_STATUS_1: 0x390
[2025-06-11 23:40:04] [21214.720917] ga10b NV_PGRAPH_PRI_FECS_CTXSW_IDLESTATE : 0xf
[2025-06-11 23:40:04] [21214.720919] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_IDLESTATE : 0xf
[2025-06-11 23:40:04] [21214.720922] ga10b NV_PGRAPH_PRI_FECS_CURRENT_CTX : 0x301c726c
[2025-06-11 23:40:04] [21214.720924] ga10b NV_PGRAPH_PRI_FECS_NEW_CTX : 0x301c726c
[2025-06-11 23:40:04] [21214.720925] ga10b NV_PGRAPH_PRI_FECS_HOST_INT_ENABLE : 0x7f0003
[2025-06-11 23:40:04] [21214.720927] ga10b NV_PGRAPH_PRI_FECS_HOST_INT_STATUS : 0x0
[2025-06-11 23:40:04] [21214.720929] ga10b NV_PGRAPH_PRI_GPCS_ROP0_CROP_STATUS1 : 0x700000
[2025-06-11 23:40:04] [21214.720931] ga10b NV_PGRAPH_PRI_GPCS_ROPS_CROP_STATUS1 : 0x700000
[2025-06-11 23:40:04] [21214.720933] ga10b NV_PGRAPH_PRI_GPCS_ROP0_ZROP_STATUS : 0x0
[2025-06-11 23:40:04] [21214.720935] ga10b NV_PGRAPH_PRI_GPCS_ROP0_ZROP_STATUS2 : 0x0
[2025-06-11 23:40:04] [21214.720937] ga10b NV_PGRAPH_PRI_GPCS_ROP1_ZROP_STATUS: 0x0
[2025-06-11 23:40:04] [21214.720939] ga10b NV_PGRAPH_PRI_GPCS_ROP1_ZROP_STATUS2: 0x0
[2025-06-11 23:40:04] [21214.720941] ga10b NV_PGRAPH_PRI_GPCS_ROPS_ZROP_STATUS : 0x0
[2025-06-11 23:40:04] [21214.720943] ga10b NV_PGRAPH_PRI_GPCS_ROPS_ZROP_STATUS2 : 0x0
[2025-06-11 23:40:04] [21214.720945] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_GPC_EXCEPTION: 0x0
[2025-06-11 23:40:04] [21214.720947] ga10b NV_PGRAPH_PRI_GPC0_GPCCS_GPC_EXCEPTION_EN: 0xfc0f6004
[2025-06-11 23:40:04] [21214.720949] ga10b NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_EXCEPTION: 0x0
[2025-06-11 23:40:04] [21214.720951] ga10b NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_EXCEPTION_EN: 0x16
[2025-06-11 23:40:04] [21214.720953] ga10b NV_PGRAPH_PRI_GPCS_TPCS_SMS_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.720955] ga10b NV_PGRAPH_PRI_GPCS_TPCS_SMS_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.720957] ga10b NV_PGRAPH_PRI_GPCS_TPCS_SMS_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.720959] ga10b NV_PGRAPH_PRI_GPCS_TPCS_SMS_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.720961] ga10b NV_PGRAPH_PRI_GPCS_TPCS_SMS_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.720963] ga10b NV_PGRAPH_PRI_GPCS_TPCS_SMS_DBGR_BPT_PAUSE_MASK_0: 0x0
[2025-06-11 23:40:04] [21214.720965] ga10b NV_PGRAPH_PRI_GPCS_TPCS_SMS_DBGR_BPT_PAUSE_MASK_1: 0x0
[2025-06-11 23:40:04] [21214.720969] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM0_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.720972] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM0_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.720974] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM0_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.720976] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM0_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.720978] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM0_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.720980] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM0_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.720983] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM1_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.720985] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM1_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.720988] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM1_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.720990] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM1_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.720992] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM1_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.720994] ga10b NV_PGRAPH_PRI_GPC0_TPC0_SM1_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.720996] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM0_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.720999] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM0_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721001] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM0_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721003] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM0_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721005] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM0_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721008] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM0_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721010] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM1_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721012] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM1_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721014] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM1_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721016] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM1_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721019] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM1_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721021] ga10b NV_PGRAPH_PRI_GPC0_TPC1_SM1_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721023] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM0_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721025] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM0_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721027] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM0_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721029] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM0_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721032] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM0_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721034] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM0_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721036] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM1_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721038] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM1_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721040] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM1_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721042] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM1_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721044] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM1_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721047] ga10b NV_PGRAPH_PRI_GPC0_TPC2_SM1_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721049] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM0_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721051] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM0_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721053] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM0_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721055] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM0_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721058] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM0_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721060] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM0_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721062] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM1_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721064] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM1_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721066] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM1_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721068] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM1_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721070] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM1_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721072] ga10b NV_PGRAPH_PRI_GPC1_TPC0_SM1_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721075] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM0_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721077] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM0_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721079] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM0_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721081] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM0_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721083] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM0_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721085] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM0_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721088] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM1_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721090] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM1_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721092] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM1_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721094] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM1_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721096] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM1_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721098] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM1_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721101] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM0_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721103] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM0_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721105] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM0_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721107] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM0_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721109] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM0_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721111] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM0_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721113] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM1_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721116] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM1_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721118] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM1_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721120] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM1_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721122] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM1_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721124] ga10b NV_PGRAPH_PRI_GPC1_TPC2_SM1_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721126] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM0_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721128] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM0_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:04] [21214.721131] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM0_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:04] [21214.721133] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM0_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:04] [21214.721135] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM0_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:04] [21214.721137] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM0_DBGR_STATUS0: 0x0
[2025-06-11 23:40:04] [21214.721139] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM1_HWW_WARP_ESR: 0x0
[2025-06-11 23:40:04] [21214.721141] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM1_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 23:40:05] [21214.721143] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM1_HWW_GLOBAL_ESR: 0x0
[2025-06-11 23:40:05] [21214.721145] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM1_HWW_GLOBAL_ESR_REPORT_MASK: 0x1174
[2025-06-11 23:40:05] [21214.721148] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM1_DBGR_CONTROL0: 0x1c00
[2025-06-11 23:40:05] [21214.721150] ga10b NV_PGRAPH_PRI_GPC1_TPC3_SM1_DBGR_STATUS0: 0x0
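
In a dump like this, almost every register reads 0x0, so the few non-zero values are easy to miss. The helper below is a minimal, hypothetical sketch (not an NVIDIA tool) that filters them out, assuming each register line has the form `ga10b NAME: 0xVALUE`, with optional spaces before the colon, as in the excerpt above.

```python
import re
import sys

# Matches register lines such as
#   "... ga10b NV_PGRAPH_GRFIFO_STATUS : 0x1"
#   "... ga10b NV_PGRAPH_PRI_FE_TPC_FS(0): 0xf7"
REG_LINE = re.compile(r"ga10b\s+(?P<name>NV_\S+?)\s*:\s*(?P<value>0x[0-9a-fA-F]+)")


def nonzero_registers(lines):
    """Return (name, value) pairs for register dump lines whose value is not 0x0."""
    result = []
    for line in lines:
        m = REG_LINE.search(line)
        if m:
            value = int(m.group("value"), 16)
            if value != 0:
                result.append((m.group("name"), value))
    return result


if __name__ == "__main__":
    # Usage: python3 nonzero_regs.py LOG_gpu_err_log_and_cpu_err_log_and_reboot.txt
    with open(sys.argv[1], errors="replace") as f:
        for name, value in nonzero_registers(f):
            print(f"{name}: {value:#x}")
```

The regex only looks for the `ga10b` register lines, so the surrounding `nvgpu:` error lines and the CPU fabric errors are ignored.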

CPU error:

[2025-06-11 23:40:09] [21219.631742] **************************************
[2025-06-11 23:40:09] [21219.631743] CPU:0, Error:cbb-fabric, Errmon:64
[2025-06-11 23:40:09] [21219.631747] Error Code: TIMEOUT_ERR
[2025-06-11 23:40:09] [21219.631748] Overflow: Multiple TIMEOUT_ERR
[2025-06-11 23:40:09] [21219.631753]
[2025-06-11 23:40:09] [21219.631754] Error Code: TIMEOUT_ERR
[2025-06-11 23:40:09] [21219.631754] MASTER_ID: TSECA_HEAVYSECURE
[2025-06-11 23:40:09] [21219.631755] Address: 0x1380c460
[2025-06-11 23:40:09] [21219.631755] Cache: 0x3 – Bufferable Modifiable
[2025-06-11 23:40:09] [21219.631756] Protection: 0x2 – Unprivileged, Non-Secure, Data Access
[2025-06-11 23:40:09] [21219.631757] Access_Type: Read
[2025-06-11 23:40:09] [21219.631758] Access_ID: 0x0
[2025-06-11 23:40:09] [21219.631758] Fabric: cbb-fabric
[2025-06-11 23:40:09] [21219.631759] Slave_Id: 0x37
[2025-06-11 23:40:09] [21219.631759] Burst_length: 0x0
[2025-06-11 23:40:09] [21219.631760] Burst_type: 0x1
[2025-06-11 23:40:09] [21219.631760] Beat_size: 0x2
[2025-06-11 23:40:09] [21219.631761] VQC: 0x0
[2025-06-11 23:40:09] [21219.631761] GRPSEC: 0x7b
[2025-06-11 23:40:09] [21219.631762] FALCONSEC: 0x2
[2025-06-11 23:40:09] [21219.631763] AXI2APB_5_BLOCK_TMO_STATUS : 0x1
[2025-06-11 23:40:09] [21219.631764] AXI2APB_5_BLOCK0_TMO : 0x8
[2025-06-11 23:40:09] [21219.631766] **************************************
[2025-06-11 23:40:09] [21219.699201] CPU:0, Error: cbb-fabric@0x13a00000, irq=192
[2025-06-11 23:40:09] [21219.699214] **************************************
[2025-06-11 23:40:09] [21219.699217] CPU:0, Error:cbb-fabric, Errmon:64
[2025-06-11 23:40:09] [21219.699224] Error Code: TIMEOUT_ERR
[2025-06-11 23:40:09] [21219.699227] Overflow: Multiple TIMEOUT_ERR
[2025-06-11 23:40:09] [21219.699235]
[2025-06-11 23:40:09] [21219.699237] Error Code: TIMEOUT_ERR
[2025-06-11 23:40:09] [21219.699240] MASTER_ID: TSECA_NONSECURE
[2025-06-11 23:40:09] [21219.699243] Address: 0x155c0134
[2025-06-11 23:40:09] [21219.699246] Cache: 0x3 – Bufferable Modifiable
[2025-06-11 23:40:09] [21219.699250] Protection: 0x2 – Unprivileged, Non-Secure, Data Access
[2025-06-11 23:40:09] [21219.699255] Access_Type: Read
[2025-06-11 23:40:09] [21219.699257] Access_ID: 0x0
[2025-06-11 23:40:09] [21219.699260] Fabric: cbb-fabric
[2025-06-11 23:40:09] [21219.699262] Slave_Id: 0x3
[2025-06-11 23:40:09] [21219.699264] Burst_length: 0x0
[2025-06-11 23:40:09] [21219.699266] Burst_type: 0x1
[2025-06-11 23:40:09] [21219.699269] Beat_size: 0x2
[2025-06-11 23:40:09] [21219.699271] VQC: 0x0
[2025-06-11 23:40:09] [21219.699273] GRPSEC: 0x7f
[2025-06-11 23:40:09] [21219.699275] FALCONSEC: 0x0
[2025-06-11 23:40:09] [21219.699280] HOST1X_SLV_TIMEOUT_STATUS : 0x1
[2025-06-11 23:40:09] [21219.699284] **************************************
[2025-06-11 23:40:09] [21220.121331] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121348] **************************************
[2025-06-11 23:40:09] [21220.121351] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121364] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121372] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121376] **************************************
[2025-06-11 23:40:09] [21220.121377] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121383] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121391] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121394] **************************************
[2025-06-11 23:40:09] [21220.121395] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121401] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121408] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121411] **************************************
[2025-06-11 23:40:09] [21220.121413] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121419] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121425] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121428] **************************************
[2025-06-11 23:40:09] [21220.121429] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121435] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121442] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121444] **************************************
[2025-06-11 23:40:09] [21220.121446] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121452] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121458] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121461] **************************************
[2025-06-11 23:40:09] [21220.121462] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121471] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121477] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121480] **************************************
[2025-06-11 23:40:09] [21220.121481] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121487] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121493] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121496] **************************************
[2025-06-11 23:40:09] [21220.121497] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121503] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121509] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121512] **************************************
[2025-06-11 23:40:09] [21220.121513] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121519] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121525] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121528] **************************************
[2025-06-11 23:40:09] [21220.121529] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:09] [21220.121534] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:09] [21220.121541] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:09] [21220.121543] **************************************
[2025-06-11 23:40:10] [21220.121545] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:10] [21220.121550] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:10] [21220.121556] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:10] [21220.121559] **************************************
[2025-06-11 23:40:10] [21220.121560] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:10] [21220.121566] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:10] [21220.121575] CPU:0, Error: cbb-fabric@0x13a00000, irq=192
[2025-06-11 23:40:10] [21220.121578] **************************************
[2025-06-11 23:40:10] [21220.121579] CPU:0, Error:cbb-fabric, Errmon:2
[2025-06-11 23:40:10] [21220.121586] Error Code: FIREWALL_ERR
[2025-06-11 23:40:10] [21220.121588] Overflow: Multiple FIREWALL_ERR
[2025-06-11 23:40:10] [21220.121597]
[2025-06-11 23:40:10] [21220.121599] Error Code: FIREWALL_ERR
[2025-06-11 23:40:10] [21220.121602] MASTER_ID: CCPLEX
[2025-06-11 23:40:10] [21220.121604] Address: 0xde12208
[2025-06-11 23:40:10] [21220.121606] Cache: 0x1 – Bufferable
[2025-06-11 23:40:10] [21220.121610] Protection: 0x2 – Unprivileged, Non-Secure, Data Access
[2025-06-11 23:40:10] [21220.121614] Access_Type: Read
[2025-06-11 23:40:10] [21220.121616] Access_ID: 0x10
[2025-06-11 23:40:10] [21220.121618] Fabric: cbb-fabric
[2025-06-11 23:40:10] [21220.121620] Slave_Id: 0x0
[2025-06-11 23:40:10] [21220.121622] Burst_length: 0x0
[2025-06-11 23:40:10] [21220.121624] Burst_type: 0x1
[2025-06-11 23:40:10] [21220.121626] Beat_size: 0x2
[2025-06-11 23:40:10] [21220.121627] VQC: 0x0
[2025-06-11 23:40:10] [21220.121629] GRPSEC: 0x7e
[2025-06-11 23:40:10] [21220.121631] FALCONSEC: 0x0
[2025-06-11 23:40:10] [21220.121633] Slave: AON
[2025-06-11 23:40:10] [21220.121635] **************************************
[2025-06-11 23:40:10] [21220.121664] WARNING: CPU: 0 PID: 100 at drivers/soc/tegra/cbb/tegra234-cbb.c:608 tegra234_cbb_isr+0x134/0x180
[2025-06-11 23:40:10] [21220.122109] ---[ end trace 0000000000000002 ]---
[2025-06-11 23:40:10] [21222.011920] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:33] [21222.011930] **************************************
[2025-06-11 23:40:33] [21222.011931] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:33] [21222.011943] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:33] [21222.011951] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:33] [21222.011954] **************************************
[2025-06-11 23:40:33] [21222.011955] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:33] [21222.011961] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:33] [21222.011967] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:33] [21222.011970] **************************************
[2025-06-11 23:40:33] [21222.011971] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:33] [21222.011977] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:33] [21222.011983] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:33] [21222.011985] **************************************
[2025-06-11 23:40:33] [21222.011986] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:33] [21222.011990] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:33] [21222.011995] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:33] [21222.011996] **************************************
[2025-06-11 23:40:33] [21222.011996] CPU:0, Error:sce-fabric, Errmon:2
[2025-06-11 23:40:33] [21222.012000] CBB registers returning all 1’s which is invalid
[2025-06-11 23:40:33] [21222.012005] CPU:0, Error: sce-fabric@0xde00000, irq=188
[2025-06-11 23:40:33] [21222.012006] **************************************
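
The CPU-side log is dominated by repeated blocks, so a per-record count makes it easier to see which masters and addresses are involved. The sketch below is a hypothetical helper, assuming the field names shown in this excerpt (`Error Code`, `MASTER_ID`, `Address`, `Fabric`) and the `[date] [uptime]` line prefix added by the serial logger. It matches the "all 1's" lines by prefix only, because the forum copy uses a typographic apostrophe.

```python
import re
import sys
from collections import Counter

PREFIX = re.compile(r"^(\[[^\]]*\]\s*)+")  # strips "[date] [uptime] " prefixes
FIELD = re.compile(r"^(Error Code|MASTER_ID|Address|Fabric):\s*(\S+)")


def summarize_cbb(lines):
    """Count decoded CBB error records by (fabric, error code, master, address).

    Also returns how many "CBB registers returning all 1's" lines were seen;
    those lines carry no decoded record.
    """
    records = Counter()
    all_ones = 0
    current = {}
    for raw in lines:
        line = PREFIX.sub("", raw.strip())
        if line.startswith("CBB registers returning all 1"):
            all_ones += 1
            continue
        m = FIELD.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "Error Code":
            current = {"Error Code": value}  # a new record (or overflow note) starts
            continue
        current[key] = value
        if key == "Fabric" and "MASTER_ID" in current:
            records[(value, current["Error Code"], current["MASTER_ID"],
                     current.get("Address", "?"))] += 1
            current = {}
    return records, all_ones


if __name__ == "__main__":
    with open(sys.argv[1], errors="replace") as f:
        records, all_ones = summarize_cbb(f)
    for (fabric, code, master, addr), n in records.most_common():
        print(f"{n:4d}  {fabric:12s} {code:14s} {master:20s} {addr}")
    print(f"'CBB registers returning all 1s' lines: {all_ones}")
```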

Can this issue be reproduced on an NVIDIA devkit?

The issue occurs intermittently. We have tried to reproduce it on the devkit, but so far we have not been able to.

We also opened another topic that seems to be related to this one. Are these two issues in the same category?
Nvgpu: 17000000.gpu ga10b_pbdma_handle_intr_0_acquire:646 [ERR] semaphore acquire timeout! - Jetson & Embedded Systems / Jetson AGX Orin - NVIDIA Developer Forums

Any updates on this issue?

We have now reproduced this issue:

With an HDMI monitor connected, running a 4K random-read test on a USB external hard drive from a terminal in the Ubuntu desktop makes GPU error messages appear on the serial console. Initial investigation suggests the issue is consistently associated with this particular drive.

However, when we run the same fio read on the problematic drive from the serial console or an SSH session, there are no errors. The GPU error occurs 100% of the time, but only when the same command is executed in a terminal in the graphical desktop.

We tested this drive four more times and confirmed that running the fio read (or read/write) command on it from the serial console or an SSH terminal produces no errors, even with the monitor connected.

But when we execute the fio read/write command on the SSD from a terminal in the graphical desktop, the GPU error always occurs.
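
The exact fio command was not posted, so anyone trying to reproduce this has to pick their own parameters. The sketch below shows one possible 4 KiB random-read invocation; every value (job name, queue depth, runtime, engine) is an assumption, not the reporter's command, and `/dev/sdX` is a placeholder for the USB drive. `--readonly` is included so fio refuses to write to the raw device.

```python
import shutil
import subprocess
import sys


def fio_randread_cmd(target, runtime_s=300, iodepth=32):
    """Build an fio argv for a 4 KiB random-read test on `target`.

    All parameter values are illustrative assumptions, not the command
    used in the original report.
    """
    return [
        "fio",
        "--name=usb-randread-4k",  # hypothetical job name
        f"--filename={target}",    # e.g. the USB drive's block device
        "--rw=randread",           # random reads only
        "--bs=4k",                 # 4 KiB blocks ("4K random read")
        "--direct=1",              # bypass the page cache
        "--ioengine=libaio",
        f"--iodepth={iodepth}",
        f"--runtime={runtime_s}",
        "--time_based",            # keep running for the full runtime
        "--readonly",              # refuse any writes to the device
    ]


if __name__ == "__main__":
    # Run once from the desktop terminal and once over SSH,
    # then compare what appears on the serial console.
    if shutil.which("fio") is None:
        sys.exit("fio is not installed")
    subprocess.run(fio_randread_cmd(sys.argv[1]), check=True)
```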

Here are the logs captured with a 4K Philips monitor.
[com COM7] (2025-07-29_173438) COM7 (USB-SERIAL CH340 (COM7)).log (861.1 KB)

Here are the logs captured with a 1K Philips monitor.
[com COM13] (2025-07-29_104910) COM13 (USB-SERIAL CH340 (COM13)).log (276.0 KB)

Hi,

Please check whether these two patches help.

diff --git a/nv-soc/tegra234-base-overlay.dtsi b/nv-soc/tegra234-base-overlay.dtsi
index 5827304..82f2171 100644
--- a/nv-soc/tegra234-base-overlay.dtsi
+++ b/nv-soc/tegra234-base-overlay.dtsi
@@ -426,6 +426,10 @@
 			status = "disabled";
 		};
 
+		sce-fabric@b600000 {
+			status = "disabled";
+		};
+
 		hardware-timestamp@c1e0000 {
 			status = "disabled";
 		};
diff --git a/nv-soc/tegra234-base-overlay.dtsi b/nv-soc/tegra234-base-overlay.dtsi
index 82f2171..7756863 100644
--- a/nv-soc/tegra234-base-overlay.dtsi
+++ b/nv-soc/tegra234-base-overlay.dtsi
@@ -434,6 +434,10 @@
 			status = "disabled";
 		};
 
+		dce-fabric@de00000 {
+			compatible = "nvidia,tegra234-dce-fabric";
+		};
+
 		i2c@3160000 {
 			iommus = <&smmu_niso0 TEGRA234_SID_GPCDMA>;
 			dma-coherent;
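
After rebuilding and flashing with these changes, one way to confirm that they reached the running kernel is to inspect the live device tree. The sketch below is a hypothetical helper that lists every `*-fabric@<addr>` node under `/proc/device-tree` with its `status` and `compatible` properties. Where the nodes sit in the tree (for example under `bus@0`) is an assumption, which is why it searches recursively. By device-tree convention, a node without a `status` property is treated as enabled.

```python
import glob
import os
import sys


def read_dt_string(path):
    """Read a device-tree string property (NUL-separated list of strings)."""
    with open(path, "rb") as f:
        return [s.decode() for s in f.read().split(b"\0") if s]


def fabric_nodes(root="/proc/device-tree"):
    """Map each *-fabric@<addr> node under `root` to its status/compatible strings.

    A missing "status" key means the node has no status property,
    which device-tree convention treats as "okay".
    """
    result = {}
    pattern = os.path.join(root, "**", "*-fabric@*")
    for node in glob.glob(pattern, recursive=True):
        if not os.path.isdir(node):
            continue
        props = {}
        for prop in ("status", "compatible"):
            path = os.path.join(node, prop)
            if os.path.exists(path):
                props[prop] = read_dt_string(path)
        result[os.path.relpath(node, root)] = props
    return result


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/proc/device-tree"
    for name, props in sorted(fabric_nodes(root).items()):
        print(name, props)
```

If the patches were applied, one would expect `sce-fabric@b600000` to report `disabled` and `dce-fabric@de00000` to list `nvidia,tegra234-dce-fabric`.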

Apologies, I made a mistake in my previous message. The first log was captured on JetPack 6.0, whereas the second log you received came from JetPack 5.1.2. When I tried to apply the patch on JetPack 5.1.2, I found that the file nv-soc/tegra234-base-overlay.dtsi does not exist there.

Could you please provide a revised patch compatible with JetPack 5.1.2? We’ll validate it on that version.

Could you test this on JetPack 6 first?

On JetPack 6.0, the log message appears only sporadically; we have not yet found any reliable way to reproduce it. We will also verify whether the patch resolves the issue on JetPack 6.0 and, if it reappears, we will provide the corresponding log.

However, on JetPack 5.1.2 the problem is consistently reproducible, and we have confirmed the same stable reproduction on several other boards.

Hi,

We don’t have a patch for this on JetPack 5. In fact, the issues you reported on JetPack 5 and JetPack 6 could be different issues.

We applied the patch on JetPack 6.0, but it had no effect: GPU error messages still occur.

Setup
• Two dev boards linked via their 10 GbE ports
• Board A (client) starts four iperf3 instances:
iperf3 -c 192.168.1.10 -p 5201 -t 3000 &
iperf3 -c 192.168.1.10 -p 5202 -t 3000 &
iperf3 -c 192.168.1.10 -p 5203 -t 3000 &
iperf3 -c 192.168.1.10 -p 5204 -t 3000 &

• Board B (server) listens on four ports:
iperf3 -s -p 5201 &
iperf3 -s -p 5202 &
iperf3 -s -p 5203 &
iperf3 -s -p 5204 &
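
The same four-stream setup can be driven from one script, which keeps the processes together so they can be waited on or stopped as a group. This is a minimal sketch that mirrors the commands above (ports 5201 to 5204, server 192.168.1.10, 3000 s); the script itself is hypothetical and simply calls the `iperf3` binary.

```python
import subprocess
import sys

PORTS = (5201, 5202, 5203, 5204)  # the four ports used in the test above
SERVER_IP = "192.168.1.10"        # Board B's address in the test above
DURATION_S = 3000


def iperf3_commands(role):
    """Return one iperf3 argv per port for 'server' (Board B) or 'client' (Board A)."""
    if role == "server":
        return [["iperf3", "-s", "-p", str(p)] for p in PORTS]
    if role == "client":
        return [["iperf3", "-c", SERVER_IP, "-p", str(p), "-t", str(DURATION_S)]
                for p in PORTS]
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    # Start "server" on Board B first, then "client" on Board A.
    procs = [subprocess.Popen(cmd) for cmd in iperf3_commands(sys.argv[1])]
    for proc in procs:
        proc.wait()
```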

During the test, GPU errors appear in the serial log and the GUI freezes; the mouse becomes completely unresponsive. All of these symptoms occur on Board B (the server side).
Logs are attached.
iperf_gpu_error_bug_297_0806.txt (605.9 KB)

We also reproduced the same issue on an Orin AGX 32/64 GB DevKit; its log is attached as well.
iperf_gpu_error_bug_297_orin_devkit_0806.txt (87.5 KB)

Hi, has there been any progress on this issue?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks
~1008

Sorry for the late response.
Is this still an issue that needs support? Are there any results you can share?