Nvgpu: 17000000.gpu ga10b_pbdma_handle_intr_0_acquire:646 [ERR] semaphore acquire timeout!

Hello, NVIDIA engineers.
During normal operation of our controller, the display flickers periodically and GPU error messages are printed in a loop on the serial port. Could you please help us figure out what might be causing this issue?

Software environment: JetPack 6.0 (L4T 36.3)
Hardware: Custom carrier board, Jetson AGX Orin 32GB/64GB core module

gpu_error.txt (373.9 KB)

[2025-05-23 09:35:24]  [54266.319411] nvgpu: 17000000.gpu ga10b_pbdma_handle_intr_0_acquire:646  [ERR]  semaphore acquire timeout!
[2025-05-23 09:35:24]  [54266.319494] __ga10b__ Channel Status - chip ga10b
[2025-05-23 09:35:24]  [54266.319495] __ga10b__ ---------------------------
[2025-05-23 09:35:24]  [54266.319499] __ga10b__ 506-ga10b, TSG: 1, pid 2867, thread name gnome-shell, refs: 6, deterministic: no, domain name: (no domain)
[2025-05-23 09:35:24]  [54266.319500] __ga10b__ channel status:  in use on_pbdma, on_eng, pbdma_busy busy
[2025-05-23 09:35:24]  [54266.319504] __ga10b__ RAMFC: TOP: 8000001ffd121e74 PUT: 001ffd121e88 GET: 001ffd121e74 FETCH: 000000000000 HEADER: 2140006c COUNT: 11110000 SEMAPHORE: addr 002000130000 payload 000000000000317f execute 00081003
[2025-05-23 09:35:24]  [54266.319506] __ga10b__  
[2025-05-23 09:35:24]  [54266.319508] __ga10b__ 507-ga10b, TSG: 4, pid 2718, thread name Xorg, refs: 2, deterministic: no, domain name: (no domain)
[2025-05-23 09:35:24]  [54266.319509] __ga10b__ channel status:  in use idle not busy
[2025-05-23 09:35:24]  [54266.319511] __ga10b__ RAMFC: TOP: 8000002004033de8 PUT: 002004033de8 GET: 002004033de8 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004320000 payload 0000000000000000 execute 00000001
[2025-05-23 09:35:24]  [54266.319512] __ga10b__  
[2025-05-23 09:35:24]  [54266.319514] __ga10b__ 508-ga10b, TSG: 3, pid 2718, thread name Xorg, refs: 2, deterministic: no, domain name: (no domain)
[2025-05-23 09:35:24]  [54266.319515] __ga10b__ channel status:  in use on_eng not busy
[2025-05-23 09:35:24]  [54266.319516] __ga10b__ RAMFC: TOP: 800000200405b738 PUT: 00200405b738 GET: 00200405b738 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004020000 payload 0000000000000000 execute 00100001
[2025-05-23 09:35:24]  [54266.319517] __ga10b__  
[2025-05-23 09:35:24]  [54266.319520] __ga10b__ 511-ga10b, TSG: 0, pid 10924, thread name gnome-control-c, refs: 2, deterministic: no, domain name: (no domain)
[2025-05-23 09:35:24]  [54266.319521] __ga10b__ channel status:  in use idle not busy
[2025-05-23 09:35:24]  [54266.319522] __ga10b__ RAMFC: TOP: 80000020040080a0 PUT: 0020040080a0 GET: 0020040080a0 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004010000 payload 0000000000000000 execute 00100001
[2025-05-23 09:35:24]  [54266.319523] __ga10b__  
[2025-05-23 09:35:24]  [54266.319526] __ga10b__ PBDMA Status - chip ga10b
[2025-05-23 09:35:24]  [54266.319527] __ga10b__ -------------------------
[2025-05-23 09:35:24]  [54266.319530] __ga10b__ pbdma 0:
[2025-05-23 09:35:24]  [54266.319533] __ga10b__   id: 1 - [tsg]     next_id: - -1 [channel] | status: valid
[2025-05-23 09:35:24]  [54266.319539] __ga10b__   PBDMA_PUT 0000001ffd121e88 PBDMA_GET 0000001ffd121e74
[2025-05-23 09:35:24]  [54266.319545] __ga10b__   GP_PUT    00000939  GP_GET  0000092d  FETCH   0000092d HEADER 2140006c
[2025-05-23 09:35:24]  [54266.319549] __ga10b__   HDR       2001001b  SHADOW0 0405b710  SHADOW1 00002820
[2025-05-23 09:35:24]  [54266.319552] __ga10b__ pbdma 1:
[2025-05-23 09:35:24]  [54266.319553] __ga10b__   id: -1 - [channel] next_id: - -1 [channel] | status: invalid
[2025-05-23 09:35:24]  [54266.319558] __ga10b__   PBDMA_PUT 0000000ea9506654 PBDMA_GET 0000000ea9506654
[2025-05-23 09:35:24]  [54266.319564] __ga10b__   GP_PUT    00000000  GP_GET  1195251c  FETCH   00000000 HEADER 20d3f4ec
[2025-05-23 09:35:24]  [54266.319568] __ga10b__   HDR       d8347c57  SHADOW0 6a6c44d5  SHADOW1 7f0abf0c
[2025-05-23 09:35:24]  [54266.319570] __ga10b__ pbdma 2:
[2025-05-23 09:35:24]  [54266.319571] __ga10b__   id: -1 - [channel] next_id: - -1 [channel] | status: invalid
[2025-05-23 09:35:24]  [54266.319576] __ga10b__   PBDMA_PUT 000000f48d845368 PBDMA_GET 0000006729e0bfbc
[2025-05-23 09:35:24]  [54266.319582] __ga10b__   GP_PUT    00000000  GP_GET  1c31a496  FETCH   00000000 HEADER a0471f30
[2025-05-23 09:35:24]  [54266.319586] __ga10b__   HDR       6d358bb6  SHADOW0 948e01f9  SHADOW1 69e839c5
[2025-05-23 09:35:24]  [54266.319588] __ga10b__ pbdma 3:
[2025-05-23 09:35:24]  [54266.319589] __ga10b__   id: -1 - [channel] next_id: - -1 [channel] | status: invalid
[2025-05-23 09:35:24]  [54266.319594] __ga10b__   PBDMA_PUT 000000c193a89804 PBDMA_GET 000000c193a89804
[2025-05-23 09:35:24]  [54266.319599] __ga10b__   GP_PUT    00000000  GP_GET  45453238  FETCH   00000000 HEADER a085db98
[2025-05-23 09:35:24]  [54266.319603] __ga10b__   HDR       3ac9b5f3  SHADOW0 fd050082  SHADOW1 e28944d1
[2025-05-23 09:35:24]  [54266.319605] __ga10b__ pbdma 4:
[2025-05-23 09:35:24]  [54266.319606] __ga10b__   id: -1 - [channel] next_id: - -1 [channel] | status: invalid
[2025-05-23 09:35:24]  [54266.319611] __ga10b__   PBDMA_PUT 000000d9bcbcd504 PBDMA_GET 000000da5e5e5d7c
[2025-05-23 09:35:24]  [54266.319616] __ga10b__   GP_PUT    00000000  GP_GET  7850d773  FETCH   00000000 HEADER 20068d84
[2025-05-23 09:35:24]  [54266.319620] __ga10b__   HDR       9072a16a  SHADOW0 d373dc0a  SHADOW1 b684be5e
[2025-05-23 09:35:24]  [54266.319622] __ga10b__ pbdma 5:
[2025-05-23 09:35:24]  [54266.319623] __ga10b__   id: -1 - [channel] next_id: - -1 [channel] | status: invalid
[2025-05-23 09:35:24]  [54266.319628] __ga10b__   PBDMA_PUT 0000005aaf02b210 PBDMA_GET 0000005aaf02b210
[2025-05-23 09:35:24]  [54266.319633] __ga10b__   GP_PUT    00000000  GP_GET  c9c42953  FETCH   00000000 HEADER a15547a8
[2025-05-23 09:35:24]  [54266.319637] __ga10b__   HDR       8fa71867  SHADOW0 7588066b  SHADOW1 b01c5292
[2025-05-23 09:35:24]  [54266.319638] __ga10b__  
[2025-05-23 09:35:24]  [54266.319644] __ga10b__ ga10b eng 0: 
[2025-05-23 09:35:24]  [54266.319646] __ga10b__ id: 3 (tsg), next_id: -1 (channel), ctx status: valid 
[2025-05-23 09:35:24]  [54266.319647] __ga10b__  
[2025-05-23 09:35:24]  [54266.319650] __ga10b__ ga10b eng 1: 
[2025-05-23 09:35:24]  [54266.319651] __ga10b__ id: 1 (tsg), next_id: -1 (channel), ctx status: valid 
[2025-05-23 09:35:24]  [54266.319651] __ga10b__  
[2025-05-23 09:35:24]  [54266.319654] __ga10b__ ga10b eng 2: 
[2025-05-23 09:35:24]  [54266.319655] __ga10b__ id: -1 (channel), next_id: -1 (channel), ctx status: invalid 
[2025-05-23 09:35:24]  [54266.319656] __ga10b__  
[2025-05-23 09:35:24]  [54266.319659] __ga10b__ ga10b eng 3: 
[2025-05-23 09:35:24]  [54266.319660] __ga10b__ id: -1 (channel), next_id: -1 (channel), ctx status: invalid 
[2025-05-23 09:35:24]  [54266.319660] __ga10b__  
[2025-05-23 09:35:24]  [54266.319663] __ga10b__ ga10b eng 4: 
[2025-05-23 09:35:24]  [54266.319664] __ga10b__ id: -1 (channel), next_id: -1 (channel), ctx status: invalid 
[2025-05-23 09:35:24]  [54266.319665] __ga10b__  
[2025-05-23 09:35:24]  [54266.319668] __ga10b__ ga10b eng 5: 
[2025-05-23 09:35:24]  [54266.319669] __ga10b__ id: -1 (channel), next_id: -1 (channel), ctx status: invalid 
[2025-05-23 09:35:24]  [54266.319669] __ga10b__  
[2025-05-23 09:35:24]  [54266.319670] __ga10b__  
[2025-05-23 09:35:24]  [54266.319672] nvgpu: 17000000.gpu          ga10b_pbdma_report_error:330  [ERR]  pbdma_intr_0(0)= 0x04000000 
[2025-05-23 09:35:24]  [54266.319678] nvgpu: 17000000.gpu nvgpu_cic_mon_report_err_safety_services:97   [ERR]  Error reporting is not supported in this platform
[2025-05-23 09:35:24]  [54266.319687] nvgpu: 17000000.gpu     nvgpu_set_err_notifier_locked:143  [ERR]  error notifier set to 24 for ch 506 owned by gnome-shell
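
Since the same dump repeats in a loop, it can help to reduce the full serial log to a few counters before comparing runs. The following is a minimal sketch, not an NVIDIA tool: the regular expressions are assumptions based only on the line shapes in the excerpt above, and `summarize` and `set_bits` are hypothetical helper names. It records the kernel timestamp of each timeout, which bits are set in each reported `pbdma_intr_0` value, and which channel and process received an error notifier.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the line shapes in the excerpt above.
TIMEOUT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+nvgpu:.*semaphore acquire timeout")
INTR_RE = re.compile(r"pbdma_intr_0\((\d+)\)=\s*(0x[0-9a-fA-F]+)")
NOTIFIER_RE = re.compile(r"error notifier set to (\d+) for ch (\d+) owned by (\S+)")

# Three lines copied from the excerpt, used as a small self-check.
SAMPLE_LINES = [
    "[2025-05-23 09:35:24]  [54266.319411] nvgpu: 17000000.gpu ga10b_pbdma_handle_intr_0_acquire:646  [ERR]  semaphore acquire timeout!",
    "[2025-05-23 09:35:24]  [54266.319672] nvgpu: 17000000.gpu          ga10b_pbdma_report_error:330  [ERR]  pbdma_intr_0(0)= 0x04000000 ",
    "[2025-05-23 09:35:24]  [54266.319687] nvgpu: 17000000.gpu     nvgpu_set_err_notifier_locked:143  [ERR]  error notifier set to 24 for ch 506 owned by gnome-shell",
]


def set_bits(value):
    """Return the indices of the bits that are set in a 32-bit value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


def summarize(lines):
    """Collect timeout timestamps, interrupt bit patterns, and notifier targets."""
    timeouts = []            # kernel timestamps (seconds) of each timeout line
    interrupts = Counter()   # (pbdma index, set bits) -> occurrences
    notifiers = Counter()    # (channel id, owner, notifier code) -> occurrences
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(float(match.group(1)))
        match = INTR_RE.search(line)
        if match:
            bits = tuple(set_bits(int(match.group(2), 16)))
            interrupts[(int(match.group(1)), bits)] += 1
        match = NOTIFIER_RE.search(line)
        if match:
            key = (int(match.group(2)), match.group(3), int(match.group(1)))
            notifiers[key] += 1
    return timeouts, interrupts, notifiers


if __name__ == "__main__":
    timeouts, interrupts, notifiers = summarize(SAMPLE_LINES)
    print("timeouts:", len(timeouts))
    print("interrupts:", dict(interrupts))
    print("notifiers:", dict(notifiers))
```

To run it over the attached log instead of the sample, pass an open file, for example `summarize(open("gpu_error.txt", errors="replace"))`. If every timeout maps to the same channel and owner, that narrows the search to a single client.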

During repeated restart tests using the reset button, two additional error messages appeared.
The serial port log is attached:
reboot_test_gpu_err.txt (393.3 KB)

The dmesg output is attached:
dmesg_0611_log.txt (82.2 KB)

Hi,
We suggest upgrading to the latest JetPack 6.2. On 6.0, we are not certain whether it helps, but you may try this fix:

Jetson/L4T/r36.3.x patches - eLinux.org
[Power] Segmentation fault of nvpmodel service

I compared the MD5 checksum of the nvpmodel tool we are using with that of the nvpmodel binary provided in the patch, and they are identical:

aa75f6f83d2367bfe9aa088c764b745b nvpmodel
aa75f6f83d2367bfe9aa088c764b745b nvpmodel-SegFixed
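
The same comparison can be scripted so it is easy to repeat on other units. This is a minimal sketch using Python's standard `hashlib`; `md5_of` and `same_binary` are hypothetical helper names, and the two file paths in the usage note are assumed to be the files named above. Identical digests suggest the installed binary already matches the patched one, so replacing it would not be expected to change behavior.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a, path_b):
    """True when both files have the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```

Usage, assuming both files are in the current directory: `same_binary("nvpmodel", "nvpmodel-SegFixed")`.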

The error log appears to be related to the GPU module.

Hi, are there any updates on this issue?

[Power] Segmentation fault of nvpmodel service

This patch has no effect at all.

Hi,
Please apply the patch and try:
Jetson/L4T/r36.4.x patches - eLinux.org

If the issue persists, please share the steps to reproduce it. We will set up a developer kit with JetPack 6.2.1 and check.

NvGPU

[NvGPU] slab-out-of-bounds in nvgpu_gr_config_init
Kernel crashes on specific Orin NX modules - #17 by AastaLLL

Display

[Orin NX 8G] display blank and with GPU error when resume
[Orin NX 8G]display blank and with GPU error when resume - #5 by WayneWWW
[Orin] Display drivers install to the wrong path with official document instructions
Display drivers install to the wrong path with official document instructions

I have applied the two patches locally and am currently validating them. We are running version 36.3, but the patches appear to be intended for 36.4. Are they still compatible with 36.3?
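
When matching patches to a release, it helps to confirm the L4T version on the target itself. On Jetson systems the release string is normally stored in `/etc/nv_tegra_release`. The sketch below parses its first line; the regular expression and the helper name `parse_l4t_release` are assumptions, and the sample line in the usage note is an illustrative placeholder rather than output captured from this board.

```python
import re

# Assumed shape of the first line: "# R36 (release), REVISION: 3.0, ..."
RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)\.(\d+)")


def parse_l4t_release(text):
    """Return (major, minor, patch) from the release string, or None if absent."""
    match = RELEASE_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def read_l4t_release(path="/etc/nv_tegra_release"):
    """Read the release file on the target and parse it."""
    with open(path, "r", errors="replace") as handle:
        return parse_l4t_release(handle.read())
```

For example, `parse_l4t_release("# R36 (release), REVISION: 3.0, BOARD: generic")` yields `(36, 3, 0)`, which corresponds to r36.3.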

Hi,
These fixes should be needed on r36.3 as well:
Kernel crashes on specific Orin NX modules - #17 by AastaLLL
Display drivers install to the wrong path with official document instructions

Please apply them and try.