AGX Xavier GPU error

Hi,
We use a Jetson AGX Xavier. The L4T version is R35.6.0 (Jetson Linux 35.6.0 is the production quality release for JetPack 5. It includes Linux Kernel 5.10, an Ubuntu 20.04 based root file system, a UEFI based bootloader, and OP-TEE as Trusted Execution Environment.)
We killed the Gnome desktop environment and used /usr/bin/Xorg -ac -noreset -nolisten tcp & to start Xorg.
We wrote a Qt GUI program using Qt5OpenGL. We use xrandr to configure the HDMI outputs before starting the Qt program.
Occasionally, the GPU reports the following error:

[  925.169962] tegradc 15200000.display: unblank
[  925.170042] tegradc 15210000.display: unblank
[  925.170109] tegradc 15220000.display: unblank
[  925.269677] tegradc 15200000.display: blank - powerdown
[  925.335144] extcon-disp-state external-connection:disp-state: cable 48 state 0
[  925.335150] Extcon AUX2(HDMI) disable
[  925.357072] tegradc 15200000.display: unblank
[  925.361481] tegradc 15200000.display: hdmi: tmds rate:297000K prod-setting:prod_c_hdmi_223m_300m
[  925.363294] tegradc 15200000.display: hdmi: get YCC quant from EDID.
[  925.366910] extcon-disp-state external-connection:disp-state: cable 48 state 1
[  925.366914] Extcon AUX2(HDMI) enable
[  925.367059] tegradc 15200000.display: unblank
[  925.367171] tegradc 15210000.display: unblank
[  925.367245] tegradc 15220000.display: unblank
[  925.498744] tegradc 15200000.display: unblank
[  925.498786] tegradc 15210000.display: blank - powerdown
[  925.561847] extcon-disp-state external-connection:disp-state: cable 46 state 0
[  925.561853] Extcon AUX0(HDMI) disable
[  925.583712] tegradc 15210000.display: unblank
[  925.591976] tegradc 15210000.display: hdmi: tmds rate:594000K prod-setting:prod_c_hdmi_300m_600m
[  925.594107] tegradc 15210000.display: hdmi: get YCC quant from EDID.
[  925.597697] extcon-disp-state external-connection:disp-state: cable 46 state 1
[  925.597701] Extcon AUX0(HDMI) enable
[  925.597816] tegradc 15210000.display: unblank
[  925.598001] tegradc 15220000.display: unblank
[  925.775429] nvgpu: 17000000.gv11b           nvgpu_report_err_to_sdl:66   [ERR]  Failed to report an error: hw_unit_id = 0x9, err_id=0x8, ss_err_id = 0x89
[  925.775808] nvgpu: 17000000.gv11b gv11b_mm_mmu_fault_handle_buf_valid_entry:525  [ERR]  page fault error: err_type = 0x8, fault_status = 0x200
[  925.776120] nvgpu: 17000000.gv11b      gv11b_fb_mmu_fault_info_dump:294  [ERR]  [MMU FAULT] mmu engine id:  64, ch id:  509, fault addr: 0x1fc4000000, fault addr aperture: 0, fault type: invalid pde, access type: virt write, 
[  925.776864] nvgpu: 17000000.gv11b      gv11b_fb_mmu_fault_info_dump:307  [ERR]  [MMU FAULT] protected mode: 0, client type: gpc, client id:  prop 0, gpc id if client type is gpc: 0, 
[  925.777245] nvgpu: 17000000.gv11b                nvgpu_rc_mmu_fault:352  [ERR]  mmu fault id=2 id_type=1 act_eng_bitmask=00000001
[  925.777543] nvgpu: 17000000.gv11b       nvgpu_tsg_set_ctx_mmu_error:648  [ERR]  TSG 2 generated a mmu fault
[  925.777765] nvgpu: 17000000.gv11b     nvgpu_set_err_notifier_locked:149  [ERR]  error notifier set to 31 for ch 509
[  925.779538] __gv11b__ Channel Status - chip gv11b
[  925.779543] __gv11b__ ---------------------------
[  925.784491] __gv11b__ 507-gv11b, TSG: 3, pid 15393, refs: 2, deterministic: no, domain name: (default)
[  925.788826] __gv11b__ channel status:  in use idle not busy
[  925.798088] __gv11b__ RAMFC: TOP: 8000002000414538 PUT: 002000414538 GET: 002000414538 FETCH: 002000414538 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000408000 payload 0000000000000000 execute 00100001
[  925.803915] __gv11b__  
[  925.822703] __gv11b__ 508-gv11b, TSG: 4, pid 14943, refs: 2, deterministic: no, domain name: (default)
[  925.825255] __gv11b__ channel status:  in use idle not busy
[  925.834613] __gv11b__ RAMFC: TOP: 000000000000 PUT: 000000000000 GET: 000000000000 FETCH: 000000000000 HEADER: 20400000 COUNT: 00000000 SEMAPHORE: addr 000000000000 payload 0000000000000000 execute 00000000
[  925.840009] __gv11b__  
[  925.859138] __gv11b__ 509-gv11b, TSG: 2, pid 14943, refs: 6, deterministic: no, domain name: (default)
[  925.861156] __gv11b__ channel status:  in use on_pbdma_and_eng busy
[  925.870649] __gv11b__ RAMFC: TOP: 8000001ff53ce420 PUT: 001ff53ce46c GET: 001ff53ce420 FETCH: 0c1ff53ce490 HEADER: 20060030 COUNT: 00110002 SEMAPHORE: addr 001ffe17f100 payload 00000000000040fb execute 02000001
[  925.876977] __gv11b__  
[  925.895627] __gv11b__ 510-gv11b, TSG: 1, pid 736, refs: 2, deterministic: no, domain name: (default)
[  925.898258] __gv11b__ channel status:  in use idle not busy
[  925.906899] __gv11b__ RAMFC: TOP: 8000002000429b40 PUT: 002000429b40 GET: 002000429b40 FETCH: 002000429b40 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000428000 payload 0000000000000000 execute 00000001
[  925.912897] __gv11b__  
[  925.932642] __gv11b__ 511-gv11b, TSG: 0, pid 736, refs: 2, deterministic: no, domain name: (default)
[  925.934023] __gv11b__ channel status:  in use idle not busy
[  925.943266] __gv11b__ RAMFC: TOP: 8000002000449df8 PUT: 002000449df8 GET: 002000449df8 FETCH: 002000449df8 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000420000 payload 0000000000000000 execute 00100001
[  925.948725] __gv11b__  
[  925.967798] __gv11b__ PBDMA Status - chip gv11b
[  925.970225] __gv11b__ -------------------------
[  925.974948] __gv11b__ pbdma 0:
[  925.979484] __gv11b__   id: 2 - [tsg]     next_id: - -1 [channel] | status: valid
[  925.982615] __gv11b__   PBDMA_PUT 0000001ff53ce46c PBDMA_GET 0000001ff53ce420
[  925.990024] __gv11b__   GP_PUT    00000d16  GP_GET  00000d01  FETCH   00000d01 HEADER 20060030
[  925.997634] __gv11b__   HDR       00000000  SHADOW0 00449dd0  SHADOW1 00002820
[  926.005778] __gv11b__ pbdma 1:
[  926.013149] __gv11b__   id: -1 - [channel] next_id: - -1 [channel] | status: invalid
[  926.016189] __gv11b__   PBDMA_PUT 0000005292802020 PBDMA_GET 0000004892070220
[  926.023931] __gv11b__   GP_PUT    00000000  GP_GET  10014200  FETCH   00000000 HEADER 20942800
[  926.031319] __gv11b__   HDR       01000840  SHADOW0 00020000  SHADOW1 00000000
[  926.039684] __gv11b__ pbdma 2:
[  926.046675] __gv11b__   id: 4 - [tsg]     next_id: - -1 [channel] | status: valid
[  926.049743] __gv11b__   PBDMA_PUT 00000020005a0078 PBDMA_GET 00000020005a0078
[  926.057606] __gv11b__   GP_PUT    0000000a  GP_GET  0000000a  FETCH   0000000a HEADER 60400000
[  926.064758] __gv11b__   HDR       00000000  SHADOW0 005a0050  SHADOW1 00002820
[  926.073115] __gv11b__  
[  926.080665] __gv11b__ gv11b eng 0: 
[  926.083165] __gv11b__ id: 2 (tsg), next_id: -1 (channel), ctx status: valid 
[  926.086861] __gv11b__ busy 
[  926.093631] __gv11b__  
[  926.096510] __gv11b__ gv11b eng 1: 
[  926.099209] __gv11b__ id: 4 (tsg), next_id: -1 (channel), ctx status: valid 
[  926.102587] __gv11b__  
[  926.110026] __gv11b__ gv11b eng 2: 
[  926.112554] __gv11b__ id: -1 (channel), next_id: -1 (channel), ctx status: invalid 
[  926.115927] __gv11b__  
[  926.123877] __gv11b__ gv11b eng 3: 
[  926.126280] __gv11b__ id: -1 (channel), next_id: -1 (channel), ctx status: invalid 
[  926.130156] __gv11b__  
[  926.137682] __gv11b__  

A detailed printout is attached below:
0526_211_dmesg.log (134.9 KB)

What are the possible reasons for the above GPU error report? How should we go about troubleshooting?
Thank you very much!
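
For triage, it may help to tie each MMU fault in the log to the process that owns the faulting channel. In the excerpt above the fault is reported on ch id 509, and the channel status dump lists channel 509 under TSG 2 with pid 14943. Below is a minimal sketch that extracts this from a saved dmesg file. The function name `summarize_mmu_faults` and the regular expressions are my own and only assume the line layout seen in this log; other nvgpu versions may print these lines differently.

```python
import re
import sys

# "[MMU FAULT] mmu engine id:  64, ch id:  509, fault addr: 0x..., ... fault type: ..., access type: ...,"
FAULT_RE = re.compile(
    r"\[MMU FAULT\] mmu engine id:\s*(\d+), ch id:\s*(\d+), "
    r"fault addr: (0x[0-9a-fA-F]+),.*?fault type: ([^,]+), access type: ([^,]+)"
)
# "__gv11b__ 509-gv11b, TSG: 2, pid 14943, refs: 6, ..."
CHANNEL_RE = re.compile(r"__gv11b__ (\d+)-gv11b, TSG: (\d+), pid (\d+)")


def summarize_mmu_faults(dmesg_text):
    """Return one dict per MMU fault, with the owning pid if the dump lists it."""
    faults = []
    for line in dmesg_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            faults.append({
                "engine_id": int(m.group(1)),
                "channel": int(m.group(2)),
                "addr": m.group(3),
                "fault_type": m.group(4).strip(),
                "access_type": m.group(5).strip(),
                "tsg": None,
                "pid": None,
            })
            continue
        m = CHANNEL_RE.search(line)
        if m and faults:
            # The channel status dump follows the fault it belongs to,
            # so fill in the most recent fault only.
            last = faults[-1]
            if last["pid"] is None and int(m.group(1)) == last["channel"]:
                last["tsg"] = int(m.group(2))
                last["pid"] = int(m.group(3))
    return faults


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for fault in summarize_mmu_faults(f.read()):
            print(fault)
```

Run it against the saved log (for example the attached 0526_211_dmesg.log) and compare the reported pid with `ps` output captured at the time of the fault, to see whether the faulting channel belongs to the Qt program or to Xorg.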

The Qt GUI seems to be rendering abnormally when the GPU reports an error. Dragging a Qt option box results in a long drag shadow that doesn’t go away.

The xrandr command to set the screen output is as follows:

    xrandr --output HDMI-1 --mode "1920x1200"
    xrandr --output HDMI-0 --mode "1920x1200"
    xrandr --output HDMI-2 --auto --primary --output HDMI-1 --auto --right-of HDMI-2 
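
In the log, the displays blank and unblank several times within the half second before the MMU fault. Whether that is related is not established, but one thing we could rule out is the Qt program starting to render while the outputs are still being reconfigured. Below is a minimal sketch that polls `xrandr --query` until the named outputs are connected and have an active mode. The helper names, the 10-second timeout, and the polling interval are assumptions, and the parser only relies on the usual `NAME connected [primary] WxH+X+Y ...` header lines.

```python
import re
import subprocess
import time

# Header lines look like "HDMI-0 connected primary 1920x1200+0+0 (normal ...)".
# An output that is connected but not driving a mode has no WxH+X+Y field.
OUTPUT_RE = re.compile(
    r"^(\S+) (connected|disconnected)(?: primary)?(?: (\d+)x(\d+)\+(\d+)\+(\d+))?"
)


def parse_xrandr_outputs(query_text):
    """Map output name -> {"connected": bool, "mode": (w, h) or None}."""
    outputs = {}
    for line in query_text.splitlines():
        m = OUTPUT_RE.match(line)
        if not m:
            continue  # "Screen 0: ..." and indented mode lines are skipped
        name, state, w, h, _x, _y = m.groups()
        outputs[name] = {
            "connected": state == "connected",
            "mode": (int(w), int(h)) if w else None,
        }
    return outputs


def outputs_active(query_text, names):
    """True if every named output is connected and has an active mode."""
    outputs = parse_xrandr_outputs(query_text)
    return all(
        name in outputs
        and outputs[name]["connected"]
        and outputs[name]["mode"] is not None
        for name in names
    )


def wait_for_outputs(names, timeout_s=10.0, poll_s=0.2):
    """Poll xrandr until the outputs are active or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["xrandr", "--query"], capture_output=True, text=True, check=True
        )
        if outputs_active(result.stdout, names):
            return True
        time.sleep(poll_s)
    return False
```

Calling `wait_for_outputs(["HDMI-0", "HDMI-1", "HDMI-2"])` after the xrandr commands above and launching the Qt program only when it returns True would show whether the fault still occurs once the mode set has settled.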

Hi,
Please try to reproduce it on the AGX Xavier developer kit. If you can reproduce it on the developer kit, please share the steps with us. We will set it up and check.
