Orin NX: nvgpu "sm err state" errors

When I load the CPUs heavily on my Orin NX 16GB (JetPack 6.2) and the device heats up, errors like the ones below appear in the kernel log. Do these errors cause any issues? If so, how can I resolve them?

[ 8808.282292] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8811.044851] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8811.046986] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6272), sm_id(1), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8811.057341] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8811.084896] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8811.136795] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8813.889951] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8814.215336] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8814.257510] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
[ 8826.857837] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR]  sm err state gpc_id(0), tpc_id(3), offset(6144), sm_id(0), hww_global_esr 16,hww_warp_esr 0, hww_warp_esr_pc 0x0
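
For anyone triaging a longer log, a minimal sketch of how these lines could be grouped per SM is shown below. It assumes the line layout matches the output above exactly; the regular expression and the `summarize` helper are illustrative names, not part of any NVIDIA tool, and the ESR fields are kept as raw strings because their encoding is not stated in this thread.

```python
import re
from collections import Counter

# Pattern written against the log lines quoted in this thread (an assumption:
# other nvgpu versions may format the message differently).
SM_ERR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvgpu:.*?sm err state\s+"
    r"gpc_id\((?P<gpc>\d+)\),\s*tpc_id\((?P<tpc>\d+)\),\s*"
    r"offset\((?P<offset>\d+)\),\s*sm_id\((?P<sm>\d+)\),\s*"
    r"hww_global_esr\s+(?P<global_esr>\w+),\s*hww_warp_esr\s+(?P<warp_esr>\w+)"
)


def summarize(lines):
    """Count 'sm err state' records per (gpc_id, tpc_id, sm_id)."""
    counts = Counter()
    for line in lines:
        m = SM_ERR.search(line)
        if m:  # lines that are not sm err state records are skipped
            counts[(int(m["gpc"]), int(m["tpc"]), int(m["sm"]))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python3 sm_err_summary.py < dmesg.txt
    for key, n in sorted(summarize(sys.stdin).items()):
        print("gpc=%d tpc=%d sm=%d: %d record(s)" % (*key, n))
```

Run against the ten lines above, this reports nine records for gpc 0 / tpc 3 / sm 0 and one for gpc 0 / tpc 3 / sm 1, so every record comes from the same TPC.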

Hi,

When the error occurs, does the GPU task continue to run normally, or does it fail?
Thanks.

Hi,

When the error occurs, GPU tasks continue to run normally in the short term. However, could this situation cause problems in the long run?

Hi,

Please also reproduce the error with CUDA coredump enabled and share the dmesg output with us.

$ export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1

Thanks.
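
If it helps, the reproduction step can be wrapped in a small script. This is only a sketch under assumptions: the workload command is whatever triggers the error on your side, the output file name `dmesg_after_run.txt` is made up for illustration, and reading the kernel log with `dmesg` may require root depending on system settings.

```python
import os
import subprocess
import sys


def coredump_env(base=None):
    """Return a copy of the environment with the CUDA coredump variable set."""
    env = dict(os.environ if base is None else base)
    env["CUDA_ENABLE_COREDUMP_ON_EXCEPTION"] = "1"
    return env


def run_and_capture(cmd, log_path="dmesg_after_run.txt"):
    """Run the workload with coredumps enabled, then save the kernel log."""
    result = subprocess.run(cmd, env=coredump_env())
    # Capture dmesg after the run so the sm err state lines are included.
    with open(log_path, "w") as log_file:
        subprocess.run(["dmesg"], stdout=log_file, check=False)
    return result.returncode


if __name__ == "__main__":
    # Usage (hypothetical workload): python3 repro.py ./my_cuda_app --args
    if len(sys.argv) > 1:
        sys.exit(run_and_capture(sys.argv[1:]))
```

Setting the variable through `env=` keeps it scoped to the workload process instead of the whole shell session.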
