GPU Failure with CODE 0xbadf2021 at startup

After a restart (power disconnected and reconnected) of a customized AGX Orin board, roughly every third boot the GPU does not start correctly, meaning I cannot run the cuda-samples. On the other boots the GPU works as expected. The GPU is under load when the power is disconnected.

See the following output:

root@custom-board~# dmesg | grep gpu | head -n 20
[ 1841.758834] nvgpu: 17000000.ga10b               gp10b_priv_ring_isr:217  [ERR]  ringmaster intr status0: 0x00000100, status1: 0x00000000
[ 1841.759228] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:560  [ERR]  SYS write error: ADR 0x00508914 WRDAT 0x00000104 master 0x00000021
[ 1841.769532] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:563  [ERR]  INFO 0x18408321: (subid 0x00000018 priv_level 0 local_ordering 1)
[ 1841.782759] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:568  [ERR]  CODE 0xbadf2021
[ 1841.791859] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:55   [ERR]  Error reporting is not supported in this platform
[ 1841.803871] nvgpu: 17000000.ga10b ga10b_priv_ring_decode_error_code:542  [ERR]  [Error Type]: orphan(gpc/fbp)
[ 1841.813465] nvgpu: 17000000.ga10b      decode_fecs_pri_orphan_error:363  [ERR]  [Extra Info]: target_ringstation(0x21)
[ 1890.746087] nvgpu: 17000000.ga10b               gp10b_priv_ring_isr:217  [ERR]  ringmaster intr status0: 0x00000100, status1: 0x00000000
[ 1890.746487] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:560  [ERR]  SYS write error: ADR 0x0050b0c0 WRDAT 0x00001000 master 0x00000021
[ 1890.746876] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:563  [ERR]  INFO 0x19408321: (subid 0x00000019 priv_level 0 local_ordering 1)
[ 1890.747252] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:568  [ERR]  CODE 0xbadf2021
[ 1890.747529] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:55   [ERR]  Error reporting is not supported in this platform
[ 1890.747878] nvgpu: 17000000.ga10b ga10b_priv_ring_decode_error_code:542  [ERR]  [Error Type]: orphan(gpc/fbp)
[ 1890.748141] nvgpu: 17000000.ga10b      decode_fecs_pri_orphan_error:363  [ERR]  [Extra Info]: target_ringstation(0x21)
[ 1893.746765] nvgpu: 17000000.ga10b     nvgpu_timeout_expired_msg_cpu:94   [ERR]  Timeout detected @ ga10b_gr_init_wait_idle+0x9c/0x160 [nvgpu] 
[ 1893.747161] nvgpu: 17000000.ga10b           ga10b_gr_init_wait_idle:364  [ERR]  timeout gr busy : 1
[ 1893.747440] nvgpu: 17000000.ga10b nvgpu_gr_obj_ctx_alloc_golden_ctx_image:776  [ERR]  fail
[ 1893.747697] nvgpu: 17000000.ga10b            nvgpu_gr_obj_ctx_alloc:879  [ERR]  fail to init golden ctx image
[ 1893.747988] nvgpu: 17000000.ga10b            nvgpu_gr_obj_ctx_alloc:929  [ERR]  fail
[ 1893.748221] nvgpu: 17000000.ga10b      nvgpu_gr_setup_alloc_obj_ctx:216  [ERR]  failed to allocate gr ctx buffer

When I try to run /usr/bin/cuda-samples/UnifiedMemoryStreams, I get the following errors in dmesg:

[ 2947.069025] nvgpu: 17000000.ga10b               gp10b_priv_ring_isr:217  [ERR]  ringmaster intr status0: 0x00000100, status1: 0x00000000
[ 2947.069411] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:560  [ERR]  SYS write error: ADR 0x0050b0c0 WRDAT 0x00001000 master 0x00000021
[ 2947.069814] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:563  [ERR]  INFO 0x19408321: (subid 0x00000019 priv_level 0 local_ordering 1)
[ 2947.070222] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:568  [ERR]  CODE 0xbadf2021
[ 2947.070502] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:55   [ERR]  Error reporting is not supported in this platform
[ 2947.070830] nvgpu: 17000000.ga10b ga10b_priv_ring_decode_error_code:542  [ERR]  [Error Type]: orphan(gpc/fbp)
[ 2947.071092] nvgpu: 17000000.ga10b      decode_fecs_pri_orphan_error:363  [ERR]  [Extra Info]: target_ringstation(0x21)
[ 2947.254376] audit: type=1334 audit(1715690939.092:222): prog-id=88 op=UNLOAD
[ 2947.254618] audit: type=1334 audit(1715690939.092:223): prog-id=87 op=UNLOAD
[ 2950.068943] nvgpu: 17000000.ga10b     nvgpu_timeout_expired_msg_cpu:94   [ERR]  Timeout detected @ ga10b_gr_init_wait_idle+0x9c/0x160 [nvgpu] 
[ 2950.069360] nvgpu: 17000000.ga10b           ga10b_gr_init_wait_idle:364  [ERR]  timeout gr busy : 1
[ 2950.069630] nvgpu: 17000000.ga10b nvgpu_gr_obj_ctx_alloc_golden_ctx_image:776  [ERR]  fail
[ 2950.069871] nvgpu: 17000000.ga10b            nvgpu_gr_obj_ctx_alloc:879  [ERR]  fail to init golden ctx image
[ 2950.070221] nvgpu: 17000000.ga10b            nvgpu_gr_obj_ctx_alloc:929  [ERR]  fail
[ 2950.070458] nvgpu: 17000000.ga10b      nvgpu_gr_setup_alloc_obj_ctx:216  [ERR]  failed to allocate gr ctx buffer
[ 2950.070835] nvgpu: 17000000.ga10b      nvgpu_gr_setup_alloc_obj_ctx:273  [ERR]  fail
[ 2950.071258] ------------[ cut here ]------------
[ 2950.071581] WARNING: CPU: 2 PID: 23102 at nvidia/nvgpu/drivers/gpu/nvgpu/common/gr/gr_setup.c:253 nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.072674] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables br_netfilter overlay cfg80211 aes_ce_blk crypto_simd snd_soc_tegra186_asrc cryptd snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc snd_soc_tegra210_admaif aes_ce_cipher snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra_pcm snd_soc_tegra210_sfc ghash_ce snd_soc_tegra210_adsp snd_soc_tegra_machine_driver sha2_ce sha256_arm64 snd_soc_tegra_utils sha1_ce snd_soc_simple_card_utils cdc_acm snd_soc_spdif_tx leds_gpio snd_hda_codec_hdmi nvadsp userspace_alert snd_soc_tegra210_ahub tegra210_adma nct1008 snd_hda_tegra tegra_bpmp_thermal snd_hda_codec snd_soc_rt5640 snd_hda_core snd_soc_rl6231 spi_tegra114 pwm_fan nvidia_drm(O) nvidia_modeset(O)
[ 2950.072804]  nvidia(O) nvgpu nvmap ina3221 fuse
[ 2950.153486] CPU: 2 PID: 23102 Comm: UnifiedMemorySt Tainted: G        W  O      5.10.104-l4t-r35.3.ga+g26cfd067b911 #1
[ 2950.164247] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS v35.3.1 01/24/2023
[ 2950.172647] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2950.178755] pc : nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.185053] lr : nvgpu_gr_setup_alloc_obj_ctx+0x20c/0x3f0 [nvgpu]
[ 2950.191284] sp : ffff80001266bc60
[ 2950.194696] x29: ffff80001266bc90 x28: ffff5661cc817000 
[ 2950.200207] x27: ffff800016f67000 x26: ffff8000149e1020 
[ 2950.205719] x25: 0000000000000000 x24: ffff5661c72d7d00 
[ 2950.211232] x23: ffffce0dfa5724c8 x22: ffff5661db8f5780 
[ 2950.216656] x21: ffff8000149e1000 x20: ffff5661c74d0000 
[ 2950.221996] x19: ffff800016f66d20 x18: 0000000000000000 
[ 2950.227506] x17: 0000000000000000 x16: ffffce0e3f59d7ac 
[ 2950.233019] x15: 0000fffff97b12b8 x14: 0000000000000001 
[ 2950.238532] x13: 0000000000000038 x12: ffff56622e1f8880 
[ 2950.244046] x11: 0000000000000000 x10: 00000000175a2a2f 
[ 2950.249558] x9 : 0000000000000000 x8 : ffff5661db8f5580 
[ 2950.254895] x7 : 0000000000000000 x6 : 0000000000000000 
[ 2950.260232] x5 : ffff5661ccb9ab80 x4 : ffffce0dfa5c20d8 
[ 2950.265658] x3 : 00000000000000f8 x2 : 0000000000000000 
[ 2950.270994] x1 : 0000000000000000 x0 : 0000000000000000 
[ 2950.276333] Call trace:
[ 2950.278848]  nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.284450]  gk20a_channel_ioctl+0xce0/0xff0 [nvgpu]
[ 2950.289203]  __arm64_sys_ioctl+0xac/0xf0
[ 2950.292965]  el0_svc_common.constprop.0+0x80/0x1c4
[ 2950.297945]  do_el0_svc+0x74/0x8c
[ 2950.301186]  el0_svc+0x1c/0x2c
[ 2950.304157]  el0_sync_handler+0x9c/0x120
[ 2950.308096]  el0_sync+0x16c/0x180
[ 2950.311332] ---[ end trace 91a2047c276cbe5d ]---
[ 2950.317000] ------------[ cut here ]------------
[ 2950.320512] WARNING: CPU: 2 PID: 23102 at nvidia/nvgpu/drivers/gpu/nvgpu/common/gr/gr_setup.c:253 nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.333474] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables br_netfilter overlay cfg80211 aes_ce_blk crypto_simd snd_soc_tegra186_asrc cryptd snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc snd_soc_tegra210_admaif aes_ce_cipher snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra_pcm snd_soc_tegra210_sfc ghash_ce snd_soc_tegra210_adsp snd_soc_tegra_machine_driver sha2_ce sha256_arm64 snd_soc_tegra_utils sha1_ce snd_soc_simple_card_utils cdc_acm snd_soc_spdif_tx leds_gpio snd_hda_codec_hdmi nvadsp userspace_alert snd_soc_tegra210_ahub tegra210_adma nct1008 snd_hda_tegra tegra_bpmp_thermal snd_hda_codec snd_soc_rt5640 snd_hda_core snd_soc_rl6231 spi_tegra114 pwm_fan nvidia_drm(O) nvidia_modeset(O)
[ 2950.333531]  nvidia(O) nvgpu nvmap ina3221 fuse
[ 2950.425085] CPU: 2 PID: 23102 Comm: UnifiedMemorySt Tainted: G        W  O      5.10.104-l4t-r35.3.ga+g26cfd067b911 #1
[ 2950.435846] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS v35.3.1 01/24/2023
[ 2950.444246] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2950.450355] pc : nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.456647] lr : nvgpu_gr_setup_alloc_obj_ctx+0x20c/0x3f0 [nvgpu]
[ 2950.462882] sp : ffff80001266bc60
[ 2950.466294] x29: ffff80001266bc90 x28: ffff5661cc817000 
[ 2950.471807] x27: ffff800016f66690 x26: ffff8000149e1020 
[ 2950.477319] x25: 0000000000000000 x24: ffff5661c72d7d00 
[ 2950.482834] x23: ffffce0dfa5724c8 x22: ffff5662358ab080 
[ 2950.488258] x21: ffff8000149e1000 x20: ffff5661c74d0000 
[ 2950.493596] x19: ffff800016f663b0 x18: 0000000000000000 
[ 2950.499107] x17: 0000000000000000 x16: ffffce0e3f59d7ac 
[ 2950.504622] x15: 0000fffff97b12b8 x14: 0000000000000001 
[ 2950.510132] x13: 0000000000000038 x12: ffff56622e1f8880 
[ 2950.515646] x11: 0000000000000000 x10: 00000000106fed2f 
[ 2950.521157] x9 : 0000000000000000 x8 : ffff5662358abc00 
[ 2950.526494] x7 : 0000000000000000 x6 : 0000000000000000 
[ 2950.531831] x5 : ffff5661ccb9ab80 x4 : ffffce0dfa5c20d8 
[ 2950.537258] x3 : 00000000000000f8 x2 : 0000000000000000 
[ 2950.542594] x1 : 0000000000000000 x0 : 0000000000000000 
[ 2950.547933] Call trace:
[ 2950.550446]  nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.556050]  gk20a_channel_ioctl+0xce0/0xff0 [nvgpu]
[ 2950.560800]  __arm64_sys_ioctl+0xac/0xf0
[ 2950.564561]  el0_svc_common.constprop.0+0x80/0x1c4
[ 2950.569546]  do_el0_svc+0x74/0x8c
[ 2950.572785]  el0_svc+0x1c/0x2c
[ 2950.575757]  el0_sync_handler+0x9c/0x120
[ 2950.579695]  el0_sync+0x16c/0x180
[ 2950.582932] ---[ end trace 91a2047c276cbe5e ]---
[ 2950.587877] ------------[ cut here ]------------
[ 2950.592108] WARNING: CPU: 2 PID: 23102 at nvidia/nvgpu/drivers/gpu/nvgpu/common/gr/gr_setup.c:253 nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.605071] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables br_netfilter overlay cfg80211 aes_ce_blk crypto_simd snd_soc_tegra186_asrc cryptd snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc snd_soc_tegra210_admaif aes_ce_cipher snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra_pcm snd_soc_tegra210_sfc ghash_ce snd_soc_tegra210_adsp snd_soc_tegra_machine_driver sha2_ce sha256_arm64 snd_soc_tegra_utils sha1_ce snd_soc_simple_card_utils cdc_acm snd_soc_spdif_tx leds_gpio snd_hda_codec_hdmi nvadsp userspace_alert snd_soc_tegra210_ahub tegra210_adma nct1008 snd_hda_tegra tegra_bpmp_thermal snd_hda_codec snd_soc_rt5640 snd_hda_core snd_soc_rl6231 spi_tegra114 pwm_fan nvidia_drm(O) nvidia_modeset(O)
[ 2950.605123]  nvidia(O) nvgpu nvmap ina3221 fuse
[ 2950.696686] CPU: 2 PID: 23102 Comm: UnifiedMemorySt Tainted: G        W  O      5.10.104-l4t-r35.3.ga+g26cfd067b911 #1
[ 2950.707447] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS v35.3.1 01/24/2023
[ 2950.715847] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2950.721948] pc : nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.728245] lr : nvgpu_gr_setup_alloc_obj_ctx+0x20c/0x3f0 [nvgpu]
[ 2950.734484] sp : ffff80001266bc60
[ 2950.737894] x29: ffff80001266bc90 x28: ffff5661cc817000 
[ 2950.743407] x27: ffff800016f67e28 x26: ffff8000149e1020 
[ 2950.748919] x25: 0000000000000000 x24: ffff5661c72d7d00 
[ 2950.754432] x23: ffffce0dfa5724c8 x22: ffff5661eb826600 
[ 2950.759858] x21: ffff8000149e1000 x20: ffff5661c74d0000 
[ 2950.765194] x19: ffff800016f67b48 x18: 0000000000000000 
[ 2950.770706] x17: 0000000000000000 x16: ffffce0e3f59d7ac 
[ 2950.776219] x15: 0000fffff97b12b8 x14: 0000000000000001 
[ 2950.781733] x13: 0000000000000038 x12: ffff56622e1f8880 
[ 2950.787246] x11: 0000000000000000 x10: 000000001087c82f 
[ 2950.792758] x9 : 0000000000000000 x8 : ffff5661c2750c80 
[ 2950.798095] x7 : 0000000000000000 x6 : 0000000000000000 
[ 2950.803433] x5 : ffff5661ccb9ab80 x4 : ffffce0dfa5c20d8 
[ 2950.808857] x3 : 00000000000000f8 x2 : 0000000000000000 
[ 2950.814195] x1 : 0000000000000000 x0 : 0000000000000000 
[ 2950.819533] Call trace:
[ 2950.822044]  nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.827647]  gk20a_channel_ioctl+0xce0/0xff0 [nvgpu]
[ 2950.832400]  __arm64_sys_ioctl+0xac/0xf0
[ 2950.836164]  el0_svc_common.constprop.0+0x80/0x1c4
[ 2950.841146]  do_el0_svc+0x74/0x8c
[ 2950.844385]  el0_svc+0x1c/0x2c
[ 2950.847358]  el0_sync_handler+0x9c/0x120
[ 2950.851296]  el0_sync+0x16c/0x180
[ 2950.854533] ---[ end trace 91a2047c276cbe5f ]---
[ 2950.859640] ------------[ cut here ]------------
[ 2950.863702] WARNING: CPU: 2 PID: 23102 at nvidia/nvgpu/drivers/gpu/nvgpu/common/gr/gr_setup.c:253 nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.876673] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables br_netfilter overlay cfg80211 aes_ce_blk crypto_simd snd_soc_tegra186_asrc cryptd snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc snd_soc_tegra210_admaif aes_ce_cipher snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra_pcm snd_soc_tegra210_sfc ghash_ce snd_soc_tegra210_adsp snd_soc_tegra_machine_driver sha2_ce sha256_arm64 snd_soc_tegra_utils sha1_ce snd_soc_simple_card_utils cdc_acm snd_soc_spdif_tx leds_gpio snd_hda_codec_hdmi nvadsp userspace_alert snd_soc_tegra210_ahub tegra210_adma nct1008 snd_hda_tegra tegra_bpmp_thermal snd_hda_codec snd_soc_rt5640 snd_hda_core snd_soc_rl6231 spi_tegra114 pwm_fan nvidia_drm(O) nvidia_modeset(O)
[ 2950.876723]  nvidia(O) nvgpu nvmap ina3221 fuse
[ 2950.968287] CPU: 2 PID: 23102 Comm: UnifiedMemorySt Tainted: G        W  O      5.10.104-l4t-r35.3.ga+g26cfd067b911 #1
[ 2950.979046] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS v35.3.1 01/24/2023
[ 2950.987446] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2950.993547] pc : nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2950.999845] lr : nvgpu_gr_setup_alloc_obj_ctx+0x20c/0x3f0 [nvgpu]
[ 2951.006082] sp : ffff80001266bc60
[ 2951.009494] x29: ffff80001266bc90 x28: ffff5661cc817000 
[ 2951.015007] x27: ffff800016f67970 x26: ffff8000149e1020 
[ 2951.020519] x25: 0000000000000000 x24: ffff5661c72d7d00 
[ 2951.026032] x23: ffffce0dfa5724c8 x22: ffff56622e142c80 
[ 2951.031457] x21: ffff8000149e1000 x20: ffff5661c74d0000 
[ 2951.036796] x19: ffff800016f67690 x18: 0000000000000000 
[ 2951.042307] x17: 0000000000000000 x16: ffffce0e3f59d7ac 
[ 2951.047820] x15: 0000fffff97b12b8 x14: 0000000000000001 
[ 2951.053333] x13: 0000000000000038 x12: ffff56622e1f8880 
[ 2951.058844] x11: 0000000000000000 x10: 0000000010cbb52f 
[ 2951.064359] x9 : 0000000000000000 x8 : ffff56622e142200 
[ 2951.069694] x7 : 0000000000000000 x6 : 0000000000000000 
[ 2951.075032] x5 : ffff5661ccb9ab80 x4 : ffffce0dfa5c20d8 
[ 2951.080458] x3 : 00000000000000f8 x2 : 0000000000000000 
[ 2951.085794] x1 : 0000000000000000 x0 : 0000000000000000 
[ 2951.091133] Call trace:
[ 2951.093643]  nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.099247]  gk20a_channel_ioctl+0xce0/0xff0 [nvgpu]
[ 2951.103998]  __arm64_sys_ioctl+0xac/0xf0
[ 2951.107761]  el0_svc_common.constprop.0+0x80/0x1c4
[ 2951.112745]  do_el0_svc+0x74/0x8c
[ 2951.115983]  el0_svc+0x1c/0x2c
[ 2951.118959]  el0_sync_handler+0x9c/0x120
[ 2951.122896]  el0_sync+0x16c/0x180
[ 2951.126133] ---[ end trace 91a2047c276cbe60 ]---
[ 2951.131022] ------------[ cut here ]------------
[ 2951.135301] WARNING: CPU: 2 PID: 23102 at nvidia/nvgpu/drivers/gpu/nvgpu/common/gr/gr_setup.c:253 nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.148272] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables br_netfilter overlay cfg80211 aes_ce_blk crypto_simd snd_soc_tegra186_asrc cryptd snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc snd_soc_tegra210_admaif aes_ce_cipher snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra_pcm snd_soc_tegra210_sfc ghash_ce snd_soc_tegra210_adsp snd_soc_tegra_machine_driver sha2_ce sha256_arm64 snd_soc_tegra_utils sha1_ce snd_soc_simple_card_utils cdc_acm snd_soc_spdif_tx leds_gpio snd_hda_codec_hdmi nvadsp userspace_alert snd_soc_tegra210_ahub tegra210_adma nct1008 snd_hda_tegra tegra_bpmp_thermal snd_hda_codec snd_soc_rt5640 snd_hda_core snd_soc_rl6231 spi_tegra114 pwm_fan nvidia_drm(O) nvidia_modeset(O)
[ 2951.148324]  nvidia(O) nvgpu nvmap ina3221 fuse
[ 2951.239884] CPU: 2 PID: 23102 Comm: UnifiedMemorySt Tainted: G        W  O      5.10.104-l4t-r35.3.ga+g26cfd067b911 #1
[ 2951.250645] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS v35.3.1 01/24/2023
[ 2951.259048] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2951.265145] pc : nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.271443] lr : nvgpu_gr_setup_alloc_obj_ctx+0x20c/0x3f0 [nvgpu]
[ 2951.277684] sp : ffff80001266bc60
[ 2951.281095] x29: ffff80001266bc90 x28: ffff5661cc817000 
[ 2951.286608] x27: ffff800016f674b8 x26: ffff8000149e1020 
[ 2951.292119] x25: 0000000000000000 x24: ffff5661c72d7d00 
[ 2951.297633] x23: ffffce0dfa5724c8 x22: ffff5661cab8de00 
[ 2951.303058] x21: ffff8000149e1000 x20: ffff5661c74d0000 
[ 2951.308394] x19: ffff800016f671d8 x18: 0000000000000000 
[ 2951.313907] x17: 0000000000000000 x16: ffffce0e3f59d7ac 
[ 2951.319419] x15: 0000fffff97b12b8 x14: 0000000000000001 
[ 2951.324932] x13: 0000000000000038 x12: ffff56622e1f8880 
[ 2951.330444] x11: 0000000000000000 x10: 000000001748d02f 
[ 2951.335958] x9 : 0000000000000000 x8 : ffff5661cab8df80 
[ 2951.341295] x7 : 0000000000000000 x6 : 0000000000000000 
[ 2951.346632] x5 : ffff5661ccb9ab80 x4 : ffffce0dfa5c20d8 
[ 2951.352056] x3 : 00000000000000f8 x2 : 0000000000000000 
[ 2951.357395] x1 : 0000000000000000 x0 : 0000000000000000 
[ 2951.362733] Call trace:
[ 2951.365242]  nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.370846]  gk20a_channel_ioctl+0xce0/0xff0 [nvgpu]
[ 2951.375599]  __arm64_sys_ioctl+0xac/0xf0
[ 2951.379360]  el0_svc_common.constprop.0+0x80/0x1c4
[ 2951.384347]  do_el0_svc+0x74/0x8c
[ 2951.387583]  el0_svc+0x1c/0x2c
[ 2951.390558]  el0_sync_handler+0x9c/0x120
[ 2951.394494]  el0_sync+0x16c/0x180
[ 2951.397733] ---[ end trace 91a2047c276cbe61 ]---
[ 2951.402759] ------------[ cut here ]------------
[ 2951.406901] WARNING: CPU: 2 PID: 23102 at nvidia/nvgpu/drivers/gpu/nvgpu/common/gr/gr_setup.c:253 nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.419871] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables br_netfilter overlay cfg80211 aes_ce_blk crypto_simd snd_soc_tegra186_asrc cryptd snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc snd_soc_tegra210_admaif aes_ce_cipher snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra_pcm snd_soc_tegra210_sfc ghash_ce snd_soc_tegra210_adsp snd_soc_tegra_machine_driver sha2_ce sha256_arm64 snd_soc_tegra_utils sha1_ce snd_soc_simple_card_utils cdc_acm snd_soc_spdif_tx leds_gpio snd_hda_codec_hdmi nvadsp userspace_alert snd_soc_tegra210_ahub tegra210_adma nct1008 snd_hda_tegra tegra_bpmp_thermal snd_hda_codec snd_soc_rt5640 snd_hda_core snd_soc_rl6231 spi_tegra114 pwm_fan nvidia_drm(O) nvidia_modeset(O)
[ 2951.419924]  nvidia(O) nvgpu nvmap ina3221 fuse
[ 2951.511485] CPU: 2 PID: 23102 Comm: UnifiedMemorySt Tainted: G        W  O      5.10.104-l4t-r35.3.ga+g26cfd067b911 #1
[ 2951.522246] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS v35.3.1 01/24/2023
[ 2951.530648] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2951.536744] pc : nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.543044] lr : nvgpu_gr_setup_alloc_obj_ctx+0x20c/0x3f0 [nvgpu]
[ 2951.549284] sp : ffff80001266bc60
[ 2951.552695] x29: ffff80001266bc90 x28: ffff5661cc817000 
[ 2951.558208] x27: ffff800016f66b48 x26: ffff8000149e1020 
[ 2951.563720] x25: 0000000000000000 x24: ffff5661c72d7d00 
[ 2951.569231] x23: ffffce0dfa5724c8 x22: ffff5661cfc39a00 
[ 2951.574656] x21: ffff8000149e1000 x20: ffff5661c74d0000 
[ 2951.579995] x19: ffff800016f66868 x18: 0000000000000000 
[ 2951.585506] x17: 0000000000000000 x16: ffffce0e3f59d7ac 
[ 2951.591019] x15: 0000fffff97b12b8 x14: 0000000000000001 
[ 2951.596533] x13: 0000000000000038 x12: ffff56622e1f8880 
[ 2951.602044] x11: 0000000000000000 x10: 0000000010fc422f 
[ 2951.607558] x9 : 0000000000000000 x8 : ffff5661cfc39080 
[ 2951.612895] x7 : 0000000000000000 x6 : 0000000000000000 
[ 2951.618232] x5 : ffff5661ccb9ab80 x4 : ffffce0dfa5c20d8 
[ 2951.623657] x3 : 00000000000000f8 x2 : 0000000000000000 
[ 2951.628995] x1 : 0000000000000000 x0 : 0000000000000000 
[ 2951.634334] Call trace:
[ 2951.636841]  nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.642445]  gk20a_channel_ioctl+0xce0/0xff0 [nvgpu]
[ 2951.647200]  __arm64_sys_ioctl+0xac/0xf0
[ 2951.650961]  el0_svc_common.constprop.0+0x80/0x1c4
[ 2951.655947]  do_el0_svc+0x74/0x8c
[ 2951.659183]  el0_svc+0x1c/0x2c
[ 2951.662158]  el0_sync_handler+0x9c/0x120
[ 2951.666095]  el0_sync+0x16c/0x180
[ 2951.669333] ---[ end trace 91a2047c276cbe62 ]---
[ 2951.674191] ------------[ cut here ]------------
[ 2951.678498] WARNING: CPU: 2 PID: 23102 at nvidia/nvgpu/drivers/gpu/nvgpu/common/gr/gr_setup.c:253 nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.691473] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables br_netfilter overlay cfg80211 aes_ce_blk crypto_simd snd_soc_tegra186_asrc cryptd snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc snd_soc_tegra210_admaif aes_ce_cipher snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra_pcm snd_soc_tegra210_sfc ghash_ce snd_soc_tegra210_adsp snd_soc_tegra_machine_driver sha2_ce sha256_arm64 snd_soc_tegra_utils sha1_ce snd_soc_simple_card_utils cdc_acm snd_soc_spdif_tx leds_gpio snd_hda_codec_hdmi nvadsp userspace_alert snd_soc_tegra210_ahub tegra210_adma nct1008 snd_hda_tegra tegra_bpmp_thermal snd_hda_codec snd_soc_rt5640 snd_hda_core snd_soc_rl6231 spi_tegra114 pwm_fan nvidia_drm(O) nvidia_modeset(O)
[ 2951.691523]  nvidia(O) nvgpu nvmap ina3221 fuse
[ 2951.783083] CPU: 2 PID: 23102 Comm: UnifiedMemorySt Tainted: G        W  O      5.10.104-l4t-r35.3.ga+g26cfd067b911 #1
[ 2951.793848] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS v35.3.1 01/24/2023
[ 2951.802247] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2951.808345] pc : nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.814642] lr : nvgpu_gr_setup_alloc_obj_ctx+0x20c/0x3f0 [nvgpu]
[ 2951.820884] sp : ffff80001266bc60
[ 2951.824294] x29: ffff80001266bc90 x28: ffff5661cc817000 
[ 2951.829807] x27: ffff800016f661d8 x26: ffff8000149e1020 
[ 2951.835320] x25: 0000000000000000 x24: ffff5661c72d7d00 
[ 2951.840833] x23: ffffce0dfa5724c8 x22: ffff566234900380 
[ 2951.846258] x21: ffff8000149e1000 x20: ffff5661c74d0000 
[ 2951.851595] x19: ffff800016f65ef8 x18: 0000000000000000 
[ 2951.857107] x17: 0000000000000000 x16: ffffce0e3f59d7ac 
[ 2951.862620] x15: 0000fffff97b12b8 x14: 0000000000000001 
[ 2951.868133] x13: 0000000000000038 x12: ffff56622e1f8880 
[ 2951.873645] x11: 0000000000000000 x10: 00000000174b502f 
[ 2951.879157] x9 : 0000000000000000 x8 : ffff5661c758c000 
[ 2951.884495] x7 : 0000000000000000 x6 : 0000000000000000 
[ 2951.889833] x5 : ffff5661ccb9ab80 x4 : ffffce0dfa5c20d8 
[ 2951.895258] x3 : 00000000000000f8 x2 : 0000000000000000 
[ 2951.900595] x1 : 0000000000000000 x0 : 0000000000000000 
[ 2951.905932] Call trace:
[ 2951.908443]  nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.914045]  gk20a_channel_ioctl+0xce0/0xff0 [nvgpu]
[ 2951.918798]  __arm64_sys_ioctl+0xac/0xf0
[ 2951.922562]  el0_svc_common.constprop.0+0x80/0x1c4
[ 2951.927546]  do_el0_svc+0x74/0x8c
[ 2951.930782]  el0_svc+0x1c/0x2c
[ 2951.933757]  el0_sync_handler+0x9c/0x120
[ 2951.937697]  el0_sync+0x16c/0x180
[ 2951.940932] ---[ end trace 91a2047c276cbe63 ]---
[ 2951.945962] ------------[ cut here ]------------
[ 2951.950100] WARNING: CPU: 2 PID: 23102 at nvidia/nvgpu/drivers/gpu/nvgpu/common/gr/gr_setup.c:253 nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2951.963071] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables br_netfilter overlay cfg80211 aes_ce_blk crypto_simd snd_soc_tegra186_asrc cryptd snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc snd_soc_tegra210_admaif aes_ce_cipher snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra_pcm snd_soc_tegra210_sfc ghash_ce snd_soc_tegra210_adsp snd_soc_tegra_machine_driver sha2_ce sha256_arm64 snd_soc_tegra_utils sha1_ce snd_soc_simple_card_utils cdc_acm snd_soc_spdif_tx leds_gpio snd_hda_codec_hdmi nvadsp userspace_alert snd_soc_tegra210_ahub tegra210_adma nct1008 snd_hda_tegra tegra_bpmp_thermal snd_hda_codec snd_soc_rt5640 snd_hda_core snd_soc_rl6231 spi_tegra114 pwm_fan nvidia_drm(O) nvidia_modeset(O)
[ 2951.963123]  nvidia(O) nvgpu nvmap ina3221 fuse
[ 2952.054684] CPU: 2 PID: 23102 Comm: UnifiedMemorySt Tainted: G        W  O      5.10.104-l4t-r35.3.ga+g26cfd067b911 #1
[ 2952.065446] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS v35.3.1 01/24/2023
[ 2952.073846] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2952.079944] pc : nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2952.086242] lr : nvgpu_gr_setup_alloc_obj_ctx+0x20c/0x3f0 [nvgpu]
[ 2952.092482] sp : ffff80001266bc60
[ 2952.095895] x29: ffff80001266bc90 x28: ffff5661cc817000 
[ 2952.101407] x27: ffff800016f65d20 x26: ffff8000149e1020 
[ 2952.106920] x25: 0000000000000000 x24: ffff5661c72d7d00 
[ 2952.112431] x23: ffffce0dfa5724c8 x22: ffff5661cfe54300 
[ 2952.117856] x21: ffff8000149e1000 x20: ffff5661c74d0000 
[ 2952.123196] x19: ffff800016f65a40 x18: 0000000000000000 
[ 2952.128707] x17: 0000000000000000 x16: ffffce0e3f59d7ac 
[ 2952.134219] x15: 0000fffff97b12b8 x14: 0000000000000001 
[ 2952.139731] x13: 0000000000000038 x12: ffff56622e1f8880 
[ 2952.145244] x11: 0000000000000000 x10: 000000001057be2f 
[ 2952.150757] x9 : 0000000000000000 x8 : ffff5661cfe54b80 
[ 2952.156095] x7 : 0000000000000000 x6 : 0000000000000000 
[ 2952.161434] x5 : ffff5661ccb9ab80 x4 : ffffce0dfa5c20d8 
[ 2952.166858] x3 : 00000000000000f8 x2 : 0000000000000000 
[ 2952.172195] x1 : 0000000000000000 x0 : 0000000000000000 
[ 2952.177532] Call trace:
[ 2952.180044]  nvgpu_gr_setup_alloc_obj_ctx+0x21c/0x3f0 [nvgpu]
[ 2952.185646]  gk20a_channel_ioctl+0xce0/0xff0 [nvgpu]
[ 2952.190396]  __arm64_sys_ioctl+0xac/0xf0
[ 2952.194160]  el0_svc_common.constprop.0+0x80/0x1c4
[ 2952.199145]  do_el0_svc+0x74/0x8c
[ 2952.202382]  el0_svc+0x1c/0x2c
[ 2952.205357]  el0_sync_handler+0x9c/0x120
[ 2952.209296]  el0_sync+0x16c/0x180
[ 2952.212531] ---[ end trace 91a2047c276cbe64 ]---
[ 2952.218970] nvgpu: 17000000.ga10b               gp10b_priv_ring_isr:217  [ERR]  ringmaster intr status0: 0x00000100, status1: 0x00000000
[ 2952.229107] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:560  [ERR]  SYS write error: ADR 0x0050b0c0 WRDAT 0x00001000 master 0x00000021
[ 2952.242479] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:563  [ERR]  INFO 0x19408321: (subid 0x00000019 priv_level 0 local_ordering 1)
[ 2952.255976] nvgpu: 17000000.ga10b ga10b_priv_ring_handle_sys_write_errors:568  [ERR]  CODE 0xbadf2021
[ 2952.265242] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:55   [ERR]  Error reporting is not supported in this platform
[ 2952.277402] nvgpu: 17000000.ga10b ga10b_priv_ring_decode_error_code:542  [ERR]  [Error Type]: orphan(gpc/fbp)
[ 2952.287204] nvgpu: 17000000.ga10b      decode_fecs_pri_orphan_error:363  [ERR]  [Extra Info]: target_ringstation(0x21)
[ 2955.219754] nvgpu: 17000000.ga10b     nvgpu_timeout_expired_msg_cpu:94   [ERR]  Timeout detected @ ga10b_gr_init_wait_idle+0x9c/0x160 [nvgpu] 
[ 2955.220153] nvgpu: 17000000.ga10b           ga10b_gr_init_wait_idle:364  [ERR]  timeout gr busy : 1
[ 2955.220433] nvgpu: 17000000.ga10b nvgpu_gr_obj_ctx_alloc_golden_ctx_image:776  [ERR]  fail
[ 2955.220680] nvgpu: 17000000.ga10b            nvgpu_gr_obj_ctx_alloc:879  [ERR]  fail to init golden ctx image
[ 2955.220987] nvgpu: 17000000.ga10b            nvgpu_gr_obj_ctx_alloc:929  [ERR]  fail
[ 2955.221217] nvgpu: 17000000.ga10b      nvgpu_gr_setup_alloc_obj_ctx:216  [ERR]  failed to allocate gr ctx buffer
[ 2955.221572

I am running a customized L4T R35.3 build.
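
To track how often a boot ends up in this state, a small script along these lines could scan the dmesg text for the signatures shown above. This is only a sketch: the function name `summarize_nvgpu_errors` and the rule "a priv ring CODE plus a gr busy timeout means the GPU failed to initialize" are my own assumptions based on the logs in this post, not anything taken from the nvgpu driver.

```python
import re

# Patterns copied from the dmesg lines in this post.
CODE_RE = re.compile(r"\[ERR\]\s+CODE (0x[0-9a-fA-F]+)")
WRITE_RE = re.compile(
    r"SYS write error: ADR (0x[0-9a-fA-F]+) "
    r"WRDAT (0x[0-9a-fA-F]+) master (0x[0-9a-fA-F]+)"
)
GR_TIMEOUT = "timeout gr busy"


def summarize_nvgpu_errors(dmesg_text):
    """Summarize priv ring errors and gr idle timeouts found in dmesg text.

    Hypothetical helper: the 'gpu_init_failed' heuristic is an assumption
    derived from the failing boots shown in this thread.
    """
    codes = CODE_RE.findall(dmesg_text)
    writes = WRITE_RE.findall(dmesg_text)
    timeouts = sum(1 for line in dmesg_text.splitlines() if GR_TIMEOUT in line)
    return {
        "codes": sorted(set(codes)),
        "write_addresses": sorted({adr for adr, _wrdat, _master in writes}),
        "gr_idle_timeouts": timeouts,
        "gpu_init_failed": bool(codes) and timeouts > 0,
    }


# Three lines taken from the first log excerpt, shortened to the message part.
SAMPLE = """\
nvgpu: 17000000.ga10b [ERR]  SYS write error: ADR 0x00508914 WRDAT 0x00000104 master 0x00000021
nvgpu: 17000000.ga10b [ERR]  CODE 0xbadf2021
nvgpu: 17000000.ga10b [ERR]  timeout gr busy : 1
"""

print(summarize_nvgpu_errors(SAMPLE))
```

On the board, the same function can be fed the real kernel log, for example the text captured from `dmesg` after each power cycle, so that good and bad boots can be counted over a longer test run.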

Hi,

Please try upgrading to rel-35.5 and see if you can still reproduce the issue.
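
To confirm which release a board is actually running before and after the upgrade, something like the following may help. It is a sketch under assumptions: it expects the first line of /etc/nv_tegra_release in the stock format (`# R35 (release), REVISION: 3.1, ...`), which a customized image may not ship, and the helper names `parse_l4t_release` and `is_at_least` are hypothetical. The sample line below is illustrative, not copied from the board in question.

```python
import re

# Matches the start of the stock /etc/nv_tegra_release header line.
RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)")


def parse_l4t_release(text):
    """Return the release as a tuple of ints, e.g. (35, 3, 1), or None."""
    match = RELEASE_RE.search(text)
    if match is None:
        return None
    major = int(match.group(1))
    revision = tuple(int(part) for part in match.group(2).split(".") if part)
    return (major,) + revision


def is_at_least(version, minimum):
    """True if a parsed version tuple is at or above the minimum tuple."""
    return version is not None and version >= minimum


# Illustrative header line (field values other than R35 / 3.1 are made up).
SAMPLE = "# R35 (release), REVISION: 3.1, GCID: 0, BOARD: t186ref, EABI: aarch64"

version = parse_l4t_release(SAMPLE)
print(version, is_at_least(version, (35, 5)))
```

On a real system the text would come from reading /etc/nv_tegra_release; if that file is missing on a customized build, the kernel version string in dmesg (here `5.10.104-l4t-r35.3.ga`) is another hint.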