Hi,
We use a Jetson AGX Xavier. The L4T version is R35.6.0 (Jetson Linux 35.6.0, the production-quality release for JetPack 5, which includes Linux kernel 5.10, an Ubuntu 20.04-based root file system, a UEFI-based bootloader, and OP-TEE as the Trusted Execution Environment).
We killed the GNOME desktop environment and started Xorg manually with /usr/bin/Xorg -ac -noreset -nolisten tcp &
We wrote a Qt GUI program using Qt5OpenGL. Before the Qt program starts, we use xrandr to set the HDMI output.
Occasionally, the GPU reports the following error:
[ 925.169962] tegradc 15200000.display: unblank
[ 925.170042] tegradc 15210000.display: unblank
[ 925.170109] tegradc 15220000.display: unblank
[ 925.269677] tegradc 15200000.display: blank - powerdown
[ 925.335144] extcon-disp-state external-connection:disp-state: cable 48 state 0
[ 925.335150] Extcon AUX2(HDMI) disable
[ 925.357072] tegradc 15200000.display: unblank
[ 925.361481] tegradc 15200000.display: hdmi: tmds rate:297000K prod-setting:prod_c_hdmi_223m_300m
[ 925.363294] tegradc 15200000.display: hdmi: get YCC quant from EDID.
[ 925.366910] extcon-disp-state external-connection:disp-state: cable 48 state 1
[ 925.366914] Extcon AUX2(HDMI) enable
[ 925.367059] tegradc 15200000.display: unblank
[ 925.367171] tegradc 15210000.display: unblank
[ 925.367245] tegradc 15220000.display: unblank
[ 925.498744] tegradc 15200000.display: unblank
[ 925.498786] tegradc 15210000.display: blank - powerdown
[ 925.561847] extcon-disp-state external-connection:disp-state: cable 46 state 0
[ 925.561853] Extcon AUX0(HDMI) disable
[ 925.583712] tegradc 15210000.display: unblank
[ 925.591976] tegradc 15210000.display: hdmi: tmds rate:594000K prod-setting:prod_c_hdmi_300m_600m
[ 925.594107] tegradc 15210000.display: hdmi: get YCC quant from EDID.
[ 925.597697] extcon-disp-state external-connection:disp-state: cable 46 state 1
[ 925.597701] Extcon AUX0(HDMI) enable
[ 925.597816] tegradc 15210000.display: unblank
[ 925.598001] tegradc 15220000.display: unblank
[ 925.775429] nvgpu: 17000000.gv11b nvgpu_report_err_to_sdl:66 [ERR] Failed to report an error: hw_unit_id = 0x9, err_id=0x8, ss_err_id = 0x89
[ 925.775808] nvgpu: 17000000.gv11b gv11b_mm_mmu_fault_handle_buf_valid_entry:525 [ERR] page fault error: err_type = 0x8, fault_status = 0x200
[ 925.776120] nvgpu: 17000000.gv11b gv11b_fb_mmu_fault_info_dump:294 [ERR] [MMU FAULT] mmu engine id: 64, ch id: 509, fault addr: 0x1fc4000000, fault addr aperture: 0, fault type: invalid pde, access type: virt write,
[ 925.776864] nvgpu: 17000000.gv11b gv11b_fb_mmu_fault_info_dump:307 [ERR] [MMU FAULT] protected mode: 0, client type: gpc, client id: prop 0, gpc id if client type is gpc: 0,
[ 925.777245] nvgpu: 17000000.gv11b nvgpu_rc_mmu_fault:352 [ERR] mmu fault id=2 id_type=1 act_eng_bitmask=00000001
[ 925.777543] nvgpu: 17000000.gv11b nvgpu_tsg_set_ctx_mmu_error:648 [ERR] TSG 2 generated a mmu fault
[ 925.777765] nvgpu: 17000000.gv11b nvgpu_set_err_notifier_locked:149 [ERR] error notifier set to 31 for ch 509
[ 925.779538] __gv11b__ Channel Status - chip gv11b
[ 925.779543] __gv11b__ ---------------------------
[ 925.784491] __gv11b__ 507-gv11b, TSG: 3, pid 15393, refs: 2, deterministic: no, domain name: (default)
[ 925.788826] __gv11b__ channel status: in use idle not busy
[ 925.798088] __gv11b__ RAMFC: TOP: 8000002000414538 PUT: 002000414538 GET: 002000414538 FETCH: 002000414538 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000408000 payload 0000000000000000 execute 00100001
[ 925.803915] __gv11b__
[ 925.822703] __gv11b__ 508-gv11b, TSG: 4, pid 14943, refs: 2, deterministic: no, domain name: (default)
[ 925.825255] __gv11b__ channel status: in use idle not busy
[ 925.834613] __gv11b__ RAMFC: TOP: 000000000000 PUT: 000000000000 GET: 000000000000 FETCH: 000000000000 HEADER: 20400000 COUNT: 00000000 SEMAPHORE: addr 000000000000 payload 0000000000000000 execute 00000000
[ 925.840009] __gv11b__
[ 925.859138] __gv11b__ 509-gv11b, TSG: 2, pid 14943, refs: 6, deterministic: no, domain name: (default)
[ 925.861156] __gv11b__ channel status: in use on_pbdma_and_eng busy
[ 925.870649] __gv11b__ RAMFC: TOP: 8000001ff53ce420 PUT: 001ff53ce46c GET: 001ff53ce420 FETCH: 0c1ff53ce490 HEADER: 20060030 COUNT: 00110002 SEMAPHORE: addr 001ffe17f100 payload 00000000000040fb execute 02000001
[ 925.876977] __gv11b__
[ 925.895627] __gv11b__ 510-gv11b, TSG: 1, pid 736, refs: 2, deterministic: no, domain name: (default)
[ 925.898258] __gv11b__ channel status: in use idle not busy
[ 925.906899] __gv11b__ RAMFC: TOP: 8000002000429b40 PUT: 002000429b40 GET: 002000429b40 FETCH: 002000429b40 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000428000 payload 0000000000000000 execute 00000001
[ 925.912897] __gv11b__
[ 925.932642] __gv11b__ 511-gv11b, TSG: 0, pid 736, refs: 2, deterministic: no, domain name: (default)
[ 925.934023] __gv11b__ channel status: in use idle not busy
[ 925.943266] __gv11b__ RAMFC: TOP: 8000002000449df8 PUT: 002000449df8 GET: 002000449df8 FETCH: 002000449df8 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000420000 payload 0000000000000000 execute 00100001
[ 925.948725] __gv11b__
[ 925.967798] __gv11b__ PBDMA Status - chip gv11b
[ 925.970225] __gv11b__ -------------------------
[ 925.974948] __gv11b__ pbdma 0:
[ 925.979484] __gv11b__ id: 2 - [tsg] next_id: - -1 [channel] | status: valid
[ 925.982615] __gv11b__ PBDMA_PUT 0000001ff53ce46c PBDMA_GET 0000001ff53ce420
[ 925.990024] __gv11b__ GP_PUT 00000d16 GP_GET 00000d01 FETCH 00000d01 HEADER 20060030
[ 925.997634] __gv11b__ HDR 00000000 SHADOW0 00449dd0 SHADOW1 00002820
[ 926.005778] __gv11b__ pbdma 1:
[ 926.013149] __gv11b__ id: -1 - [channel] next_id: - -1 [channel] | status: invalid
[ 926.016189] __gv11b__ PBDMA_PUT 0000005292802020 PBDMA_GET 0000004892070220
[ 926.023931] __gv11b__ GP_PUT 00000000 GP_GET 10014200 FETCH 00000000 HEADER 20942800
[ 926.031319] __gv11b__ HDR 01000840 SHADOW0 00020000 SHADOW1 00000000
[ 926.039684] __gv11b__ pbdma 2:
[ 926.046675] __gv11b__ id: 4 - [tsg] next_id: - -1 [channel] | status: valid
[ 926.049743] __gv11b__ PBDMA_PUT 00000020005a0078 PBDMA_GET 00000020005a0078
[ 926.057606] __gv11b__ GP_PUT 0000000a GP_GET 0000000a FETCH 0000000a HEADER 60400000
[ 926.064758] __gv11b__ HDR 00000000 SHADOW0 005a0050 SHADOW1 00002820
[ 926.073115] __gv11b__
[ 926.080665] __gv11b__ gv11b eng 0:
[ 926.083165] __gv11b__ id: 2 (tsg), next_id: -1 (channel), ctx status: valid
[ 926.086861] __gv11b__ busy
[ 926.093631] __gv11b__
[ 926.096510] __gv11b__ gv11b eng 1:
[ 926.099209] __gv11b__ id: 4 (tsg), next_id: -1 (channel), ctx status: valid
[ 926.102587] __gv11b__
[ 926.110026] __gv11b__ gv11b eng 2:
[ 926.112554] __gv11b__ id: -1 (channel), next_id: -1 (channel), ctx status: invalid
[ 926.115927] __gv11b__
[ 926.123877] __gv11b__ gv11b eng 3:
[ 926.126280] __gv11b__ id: -1 (channel), next_id: -1 (channel), ctx status: invalid
[ 926.130156] __gv11b__
[ 926.137682] __gv11b__
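As a reading aid for the log above, here is a minimal Python sketch (not part of our application) that pairs the channel named in the [MMU FAULT] line with the TSG and pid listed for that channel in the Channel Status dump. The regular expressions are assumptions based only on the line formats visible in this log, and summarize_mmu_faults is a hypothetical helper name, so other L4T releases may need different patterns. If a log contains several faults, the sketch keeps only the last owner seen for each channel.

```python
import re

# Assumed format of the fault line, taken from the log in this post:
# "... [MMU FAULT] mmu engine id: 64, ch id: 509, fault addr: 0x..., ...
#  fault type: invalid pde, access type: virt write,"
FAULT_RE = re.compile(
    r"\[MMU FAULT\].*?ch id: (?P<ch>\d+), fault addr: (?P<addr>0x[0-9a-fA-F]+)"
    r".*?fault type: (?P<type>[^,]+), access type: (?P<access>[^,]+)"
)

# Assumed format of a Channel Status header line:
# "... 509-gv11b, TSG: 2, pid 14943, refs: 6, ..."
CHANNEL_RE = re.compile(r"(?P<ch>\d+)-gv11b, TSG: (?P<tsg>\d+), pid (?P<pid>\d+)")


def summarize_mmu_faults(dmesg_text):
    """Return one dict per MMU fault, annotated with the owning TSG and pid."""
    faults = []
    owners = {}  # channel id -> (tsg, pid), last occurrence wins
    for line in dmesg_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            faults.append({
                "ch": int(match["ch"]),
                "addr": match["addr"],
                "type": match["type"].strip(),
                "access": match["access"].strip(),
            })
            continue
        match = CHANNEL_RE.search(line)
        if match:
            owners[int(match["ch"])] = (int(match["tsg"]), int(match["pid"]))
    for fault in faults:
        fault["tsg"], fault["pid"] = owners.get(fault["ch"], (None, None))
    return faults


# Three lines copied from the log in this post.
SAMPLE = """\
[ 925.776120] nvgpu: 17000000.gv11b gv11b_fb_mmu_fault_info_dump:294 [ERR] [MMU FAULT] mmu engine id: 64, ch id: 509, fault addr: 0x1fc4000000, fault addr aperture: 0, fault type: invalid pde, access type: virt write,
[ 925.822703] __gv11b__ 508-gv11b, TSG: 4, pid 14943, refs: 2, deterministic: no, domain name: (default)
[ 925.859138] __gv11b__ 509-gv11b, TSG: 2, pid 14943, refs: 6, deterministic: no, domain name: (default)
"""

for fault in summarize_mmu_faults(SAMPLE):
    print(fault)
```

For the excerpt above, this ties the fault (invalid pde, virt write at 0x1fc4000000) to channel 509, which the Channel Status dump lists under TSG 2 and pid 14943. While the system is still running, that pid can be looked up with ps -p to see which process owned the faulting channel.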
The full log is attached below:
0526_211_dmesg.log (134.9 KB)
What could cause the GPU error above, and how should we go about troubleshooting it?
Thank you very much!
