GPU-related errors on the TCU serial port during a stress test

Hello, during our peripheral stress test, GPU-related error messages appeared on the TCU serial port. Could you please help us analyze the cause?
Operating environment: JetPack 6.0, with PREEMPT_RT enabled.
Test content:

  1. Run preview on 8 GMSL cameras.
  2. Run iperf tests on 5 network interfaces (4 provided by an I350 chip plus the native 10G port).
  3. Plug USB flash drives into the 4 USB 3.0 ports and a portable hard drive into the Type-C port, then run fio 4K random reads.
  4. Use stress to load the CPU.
  5. Use glmark2 to load the GPU.

The exception log is as follows:

[2025-06-10 18:44:45] Ubuntu 22.04.4 LTS ubuntu ttyTCU0

[2025-06-10 18:44:45]

[2025-06-10 18:44:45] ubuntu login: [ 291.625490] nvmap_alloc_handle: PID 5911: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[2025-06-10 18:49:22] [ 291.628961] nvmap_alloc_handle: PID 5918: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[2025-06-10 18:49:22] [ 291.631431] nvmap_alloc_handle: PID 5921: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[2025-06-10 18:49:22] [ 291.681997] nvmap_alloc_handle: PID 5915: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[2025-06-10 18:49:22] [ 291.684628] nvmap_alloc_handle: PID 5922: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[2025-06-10 18:49:22] [ 291.696852] nvmap_alloc_handle: PID 5914: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[2025-06-10 18:49:22] [ 291.704956] nvmap_alloc_handle: PID 5920: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[2025-06-10 18:49:22] [ 291.715342] nvmap_alloc_handle: PID 5926: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[2025-06-10 18:49:22] [ 598.304324] nvethernet 6800000.ethernet: [xpcs_lane_bring_up][470][type:0x4][loga-0x0] Failed to get PCS block lock
[2025-06-10 18:54:29] [37294.042674] nvgpu: 17000000.gpu nvgpu_channel_recover_from_wdt:112 [ERR] Job on channel 373 timed out
[2025-06-11 05:06:04] ** 1421 printk messages dropped **
[2025-06-11 05:06:04] [37294.052054] ga10b NV_PGRAPH_PRI_GPC1_TPC1_SM0_HWW_WARP_ESR_REPORT_MASK: 0x781eb60
[2025-06-11 05:06:04] ** 1492 printk messages dropped **
[2025-06-11 05:06:04] [37294.063611] ga10b channel status: in use idle not busy
[2025-06-11 05:06:04] ** 2559 printk messages dropped **
[2025-06-11 05:06:04] [37294.074885] ga10b NV_PGRAPH_PRI_FECS_NEW_CTX : 0x3014cf77
[2025-06-11 05:06:04] ** 1439 printk messages dropped **
[2025-06-11 05:06:04] [37294.086842] ga10b
[2025-06-11 05:06:04] [37294.086842] ga10b 242-ga10b, TSG: 61, pid 6084, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:04] [37294.086843] ga10b channel status: in use idle not busy
[2025-06-11 05:06:04] [37294.086844] ga10b RAMFC: TOP: 8000002004300050 PUT: 002004300050 GET: 002004300050 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004030000 payload 0000000000000000 execute 00100001
[2025-06-11 05:06:04] [37294.086845] ga10b
[2025-06-11 05:06:04] [37294.086846] ga10b 243-ga10b, TSG: 60, pid 6083, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:04] [37294.086847] ga10b channel status: in use idle not busy
[2025-06-11 05:06:04] [37294.086848] ga10b RAMFC: TOP: 8000002004300050 PUT: 002004300050 GET: 002004300050 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004030000 payload 0000000000000000 execute 00100001
[2025-06-11 05:06:04] [37294.086849] ga10b
[2025-06-11 05:06:04] [37294.086850] ga10b 244-ga10b, TSG: 59, pid 6082, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:04] ** 1165 printk messages dropped **

Test log attachment
gpu-error.log (589.0 KB)
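
The "** N printk messages dropped **" lines in the excerpt mean that the kernel discarded messages before the serial console could print them, so the dump above is incomplete. As a rough way to gauge how much is missing, here is a minimal Python sketch that totals those notices. It assumes the serial capture is saved as plain text; the function name `count_dropped` is made up for illustration.

```python
import re

# Matches the kernel's own notice, e.g. "** 1421 printk messages dropped **".
DROPPED_RE = re.compile(r"\*\* (\d+) printk messages dropped \*\*")


def count_dropped(lines):
    """Return (number of drop notices, total number of messages dropped)."""
    counts = []
    for line in lines:
        match = DROPPED_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return len(counts), sum(counts)


if __name__ == "__main__":
    # Lines copied from the excerpt above. For a real run, read the
    # captured log file instead (its path depends on your setup).
    sample = [
        "[2025-06-11 05:06:04] ** 1421 printk messages dropped **",
        "[2025-06-11 05:06:04] [37294.063611] ga10b channel status: in use idle not busy",
        "[2025-06-11 05:06:04] ** 1492 printk messages dropped **",
    ]
    notices, total = count_dropped(sample)
    print(f"{notices} drop notices, {total} messages lost")
```

If the total is large, the full kernel log (for example from `dmesg` or kern.log) is usually a better source than the serial capture.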

This is a log we haven’t seen during our development process. Where does this error message come from?

[2025-06-11 05:06:06] [37294.093167] nvgpu: 17000000.gpu nvgpu_set_err_notifier_locked:143 [ERR] error notifier set to 8 for ch 379 owned by gst-launch-1.0
[2025-06-11 05:06:06] [37294.095585] ga10b Channel Status - chip ga10b
[2025-06-11 05:06:06] [37294.095589] ga10b ---------------------------
[2025-06-11 05:06:06] [37294.095591] ga10b 236-ga10b, TSG: 67, pid 7634, thread name glmark2, refs: 530, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:06] [37294.095592] ga10b channel status: in use idle not busy
[2025-06-11 05:06:06] [37294.095594] ga10b RAMFC: TOP: 8000002004008ca8 PUT: 002004008ca8 GET: 002004008ca8 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004010000 payload 0000000000000000 execute 00100001
[2025-06-11 05:06:06] [37294.095595] ga10b
[2025-06-11 05:06:06] [37294.095596] ga10b 237-ga10b, TSG: 66, pid 6087, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:06] [37294.095597] ga10b channel status: in use idle not busy
[2025-06-11 05:06:06] [37294.095598] ga10b RAMFC: TOP: 8000002004300050 PUT: 002004300050 GET: 002004300050 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004030000 payload 0000000000000000 execute 00100001
[2025-06-11 05:06:06] [37294.095599] ga10b
[2025-06-11 05:06:06] [37294.095600] ga10b 238-ga10b, TSG: 65, pid 6088, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:06] [37294.095602] ga10b channel status: in use idle not busy
[2025-06-11 05:06:06] [37294.095603] ga10b RAMFC: TOP: 8000002004300050 PUT: 002004300050 GET: 002004300050 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004030000 payload 0000000000000000 execute 00100001
[2025-06-11 05:06:06] [37294.095604] ga10b
[2025-06-11 05:06:06] [37294.095605] ga10b 239-ga10b, TSG: 64, pid 6089, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:06] [37294.095606] ga10b channel status: in use idle not busy
[2025-06-11 05:06:06] [37294.095607] ga10b RAMFC: TOP: 8000002004300050 PUT: 002004300050 GET: 002004300050 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004030000 payload 0000000000000000 execute 00100001
[2025-06-11 05:06:06] [37294.095608] ga10b
[2025-06-11 05:06:06] [37294.095609] ga10b 240-ga10b, TSG: 63, pid 6086, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:06] [37294.095609] ga10b channel status: in use idle not busy
[2025-06-11 05:06:06] [37294.095611] ga10b RAMFC: TOP: 8000002004300050 PUT: 002004300050 GET: 002004300050 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004030000 payload 0000000000000000 execute 00100001
[2025-06-11 05:06:06] [37294.095612] ga10b
[2025-06-11 05:06:06] [37294.095613] ga10b 241-ga10b, TSG: 62, pid 6085, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:06] [37294.095614] ga10b channel status: in use idle not busy
[2025-06-11 05:06:06] [37294.095615] ga10b RAMFC: TOP: 8000002004300050 PUT: 002004300050 GET: 002004300050 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004030000 payload 0000000000000000 execute 00100001
[2025-06-11 05:06:06] [37294.095616] ga10b
[2025-06-11 05:06:06] [37294.095617] ga10b 242-ga10b, TSG: 61, pid 6084, thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)
[2025-06-11 05:06:06] [37294.095618] ga10b channel status: in use idle not busy
[2025-06-11 05:06:06] [37294.095619] ga10b RAMFC: TOP: 8000002004300050 PUT: 002004300050 GET: 002004300050 FETCH: 000000000000 HEADER: 2140006c COUNT: 00000000 SEMAPHORE: addr 002004030000 payload 0000000000000000 execute 00100001
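
To see at a glance which processes own the channels listed in the dump above, the per-channel header lines can be parsed into records. The following is a minimal Python sketch, assuming the line layout is exactly as shown in this excerpt (`<channel>-ga10b, TSG: ..., pid ..., thread name ..., refs: ...`); the names `CHANNEL_RE`, `parse_channels` and `channels_per_thread` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Layout taken from the excerpt, e.g.
# "ga10b 236-ga10b, TSG: 67, pid 7634, thread name glmark2, refs: 530, ..."
CHANNEL_RE = re.compile(
    r"(?P<channel>\d+)-ga10b, TSG: (?P<tsg>\d+), pid (?P<pid>\d+), "
    r"thread name (?P<thread>.+?), refs: (?P<refs>\d+)"
)


def parse_channels(lines):
    """Extract one dict per channel header line found in `lines`."""
    records = []
    for line in lines:
        match = CHANNEL_RE.search(line)
        if match:
            record = match.groupdict()
            for key in ("channel", "tsg", "pid", "refs"):
                record[key] = int(record[key])
            records.append(record)
    return records


def channels_per_thread(records):
    """Count how many channels each thread name owns."""
    return Counter(record["thread"] for record in records)


if __name__ == "__main__":
    sample = [
        "[2025-06-11 05:06:06] [37294.095591] ga10b 236-ga10b, TSG: 67, pid 7634, "
        "thread name glmark2, refs: 530, deterministic: no, domain name: (no domain)",
        "[2025-06-11 05:06:06] [37294.095596] ga10b 237-ga10b, TSG: 66, pid 6087, "
        "thread name CPMMListener, refs: 2, deterministic: no, domain name: (no domain)",
    ]
    for record in parse_channels(sample):
        print(record)
    print(channels_per_thread(parse_channels(sample)))
```

Because the dump may be printed more than once and lines may be dropped, the same channel can appear several times or not at all, so the counts are only indicative.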

Hi,
Please apply this patch and see if it helps:

Jetson/L4T/r36.4.x patches - eLinux.org

Also, the latest release is JetPack 6.2. We suggest upgrading to the latest version.

We found the following topic in the forum. Its log is exactly the same as ours, but that user is on R36.4 while we are on R36.3. Could you provide us with the DCE firmware for R36.3?

I noticed that in its revision, the key-suspend button was removed. Could you explain why this was done?

/delete-node/ &{/gpio-keys/key-suspend};

Relevant forum links:

Hi,
The issue is specific to JetPack 6.1. On your system, do you observe it when the system is suspending, and is the error present while the system is resuming? Do you observe the same error on the developer kit?

We are using JetPack 6.0. We did not encounter this error while resuming the system. However, we found it during the overall system stress test, and we saw similar problems during the damp heat test on the hardware. I will upload the relevant logs here and kindly ask you to help analyze them. Thank you very much.

You can find more abnormal information in the log attachments by using the time range shown in my screenshots or by entering the search keywords.

kern.log (8.7 MB)
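
For a log of this size, searching by keyword and time range can be scripted. Below is a minimal Python sketch that keeps only the lines containing one of the given keywords whose kernel timestamp (the `[37294.042674]` style value) falls within a range. The keywords and range in the example are assumptions taken from the excerpt earlier in this thread, and `filter_log` is a hypothetical helper name.

```python
import re

# Kernel timestamp such as "[  291.625490]" or "[37294.042674]".
# The wall-clock prefix "[2025-06-10 18:44:45]" has no decimal point,
# so it does not match this pattern.
KTIME_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def filter_log(lines, keywords, start=None, end=None):
    """Return lines that contain any keyword and lie within [start, end].

    `start` and `end` are kernel timestamps in seconds; None means unbounded.
    Lines without a kernel timestamp are skipped.
    """
    selected = []
    for line in lines:
        match = KTIME_RE.search(line)
        if match is None:
            continue
        ktime = float(match.group(1))
        if start is not None and ktime < start:
            continue
        if end is not None and ktime > end:
            continue
        if any(keyword in line for keyword in keywords):
            selected.append(line)
    return selected


if __name__ == "__main__":
    sample = [
        "[2025-06-10 18:49:22] [  598.304324] nvethernet 6800000.ethernet: "
        "[xpcs_lane_bring_up][470][type:0x4][loga-0x0] Failed to get PCS block lock",
        "[2025-06-10 18:54:29] [37294.042674] nvgpu: 17000000.gpu "
        "nvgpu_channel_recover_from_wdt:112 [ERR] Job on channel 373 timed out",
    ]
    for line in filter_log(sample, ["nvgpu", "ga10b"], start=37000.0):
        print(line)
```

To run it on a real file, pass an open file object as `lines`, for example `filter_log(open("kern.log", errors="replace"), ["nvgpu"])`. Note that kern.log lines written by syslog may carry a different prefix than the serial capture, so the timestamp pattern may need adjusting.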

Hi,
Please try to reproduce it on the developer kit. If the issue is also present there, please share the steps with us. We will set it up and check.
