An nvgpu error causes all GPU-dependent services to fail

Hi,
I am using an AGX Xavier with JetPack 5.1.1. The kernel generates an nvgpu-related error, and services that depend on the GPU cannot work properly. What is the cause, and how can we solve this problem? It has seriously affected our normal use.

nvgpu_error.txt (1.2 MB)

nvmap_alloc_handle: PID 26673: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.261225] nvgpu: 17000000.gv11b nvgpu_gr_intr_handle_sm_exception:390 [ERR] could not pre-process sm error!
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.261895] nvgpu: 17000000.gv11b gr_intr_handle_exception_interrupts:759 [ERR] set gr exception notifier
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.262326] nvgpu: 17000000.gv11b nvgpu_set_err_notifier_locked:149 [ERR] error notifier set to 13 for ch 501
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.262970] gv11b Channel Status - chip gv11b
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.262974] gv11b ---------------------------
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.263195] gv11b 496-gv11b, TSG: 7, pid 26673, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.263388] gv11b channel status: in use idle not busy
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.263758] gv11b RAMFC: TOP: 000000000000 PUT: 0002009703a4 GET: 0002009703a4 FETCH: 0202009703a4 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 000000000000 payload 0000000000000000 execute 00000000
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.263983] gv11b
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.264734] gv11b 497-gv11b, TSG: 7, pid 26673, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.265139] gv11b channel status: in use idle not busy
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.266743] gv11b RAMFC: TOP: 000000000000 PUT: 000200874218 GET: 000200874218 FETCH: 020200874218 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 000201187fb4 payload 000000000004e5a6 execute 00001003
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.267492] gv11b
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.279572] gv11b 498-gv11b, TSG: 6, pid 26673, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.282071] gv11b channel status: in use idle not busy
Feb 20 15:41:44 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.291500] gv11b RAMFC: TOP: 000000000000 PUT: 00020077acd4 GET: 00020077acd4 FETCH: 02020077acd4 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 002000411000 payload 000000000014ee1a execute 00000001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.296871] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.315563] gv11b 499-gv11b, TSG: 6, pid 26673, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.318040] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.327421] gv11b RAMFC: TOP: 000000000000 PUT: 00020067b824 GET: 00020067b824 FETCH: 02020067b824 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 00200040d000 payload 000000000014ee1d execute 00000001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.327911] tegra-capture-ivc bc00000.rtcpu:ivc-bus:ivccontrol@3: tegra_ivc_write: error -512
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.327920] tegra-capture-ivc bc00000.rtcpu:ivc-bus:ivccontrol@3: tegra_ivc_write: error -512
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.327927] tegra194-vi5 15c10000.vi: IVC control submit failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.327981] tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.327993] tegra-capture-ivc bc00000.rtcpu:ivc-bus:ivccontrol@3: tegra_ivc_write: error -512
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.327996] tegra194-vi5 15c10000.vi: IVC control submit failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.328000] tegra194-vi5 15c10000.vi: csi_stream_release: failed to disable nvcsi tpg on stream 4 virtual channel 1
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.333125] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.333137] gv11b 500-gv11b, TSG: 6, pid 26673, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.333828] tegra-capture-ivc bc00000.rtcpu:ivc-bus:ivccontrol@3: tegra_ivc_write: error -512
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.333834] tegra194-vi5 15c10000.vi: IVC control submit failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.333926] tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.333939] tegra-capture-ivc bc00000.rtcpu:ivc-bus:ivccontrol@3: tegra_ivc_write: error -512
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.333942] tegra194-vi5 15c10000.vi: IVC control submit failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.333946] tegra194-vi5 15c10000.vi: csi_stream_release: failed to disable nvcsi tpg on stream 5 virtual channel 1
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.351401] tegra194-vi5 15c10000.vi: IVC control submit failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.351510] tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.360101] gv11b channel status: in use on_pbdma_and_eng busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.368563] tegra-capture-ivc bc00000.rtcpu:ivc-bus:ivccontrol@3: tegra_ivc_write: error -512
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.374642] gv11b RAMFC: TOP: 000000000000 PUT: 00020055b5e4 GET: 00020055b5e4 FETCH: 02020055b5e4 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 002000405000 payload 000000000014ee1c execute 00000001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.381739] tegra194-vi5 15c10000.vi: IVC control submit failed
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.381748] tegra194-vi5 15c10000.vi: csi_stream_release: failed to disable nvcsi tpg on stream 2 virtual channel 1
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.527996] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.528008] gv11b 501-gv11b, TSG: 6, pid 26673, refs: 4, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.530580] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.539946] gv11b RAMFC: TOP: 000000000000 PUT: 000200468180 GET: 000200468180 FETCH: 020200468180 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 002000402000 payload 00000000001237e0 execute 00000001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.545586] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.564117] gv11b 502-gv11b, TSG: 5, pid 13240, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.566697] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.576210] gv11b RAMFC: TOP: 000000000000 PUT: 000200970020 GET: 000200970020 FETCH: 020200970020 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 000000000000 payload 0000000000000000 execute 00000000
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.581746] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.600145] gv11b 503-gv11b, TSG: 5, pid 13240, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.602766] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.612164] gv11b RAMFC: TOP: 000000000000 PUT: 00020087441c GET: 00020087441c FETCH: 02020087441c HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 000201187f8c payload 000000000000234e execute 00001003
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.617815] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.636113] gv11b 504-gv11b, TSG: 4, pid 13240, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.638713] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.648058] gv11b RAMFC: TOP: 000000000000 PUT: 000200777aec GET: 000200777aec FETCH: 020200777aec HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 002000411000 payload 000000000004326b execute 00000001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.653773] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.672073] gv11b 505-gv11b, TSG: 4, pid 13240, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.674679] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.684018] gv11b RAMFC: TOP: 000000000000 PUT: 000200677cb4 GET: 000200677cb4 FETCH: 020200677cb4 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 00200040d000 payload 0000000000042a42 execute 00000001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.689738] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.708040] gv11b 506-gv11b, TSG: 4, pid 13240, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.710670] gv11b channel status: in use pending busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.719939] gv11b RAMFC: TOP: 000000000000 PUT: 000200557a74 GET: 000200557a74 FETCH: 020200557a74 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 002000405000 payload 00000000000029de execute 00000001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.725448] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.743793] gv11b 507-gv11b, TSG: 4, pid 13240, refs: 2, deterministic: yes, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.746338] gv11b channel status: in use pending busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.755652] gv11b RAMFC: TOP: 000000000000 PUT: 0002004ff238 GET: 0002004ff238 FETCH: 0202004ff238 HEADER: 60400000 COUNT: 84000000 SEMAPHORE: addr 000201187fa4 payload 000000000000235a execute 00000003
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.761134] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.779492] gv11b 508-gv11b, TSG: 3, pid 5917, refs: 2, deterministic: no, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.782031] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.791505] gv11b RAMFC: TOP: 80000020004100a0 PUT: 0020004100a0 GET: 0020004100a0 FETCH: 0020004100a0 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000408000 payload 0000000000000000 execute 00100001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.797262] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.815872] gv11b 509-gv11b, TSG: 2, pid 4397, refs: 2, deterministic: no, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.818425] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.827902] gv11b RAMFC: TOP: 8000002000410168 PUT: 002000410168 GET: 002000410168 FETCH: 002000410168 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000408000 payload 0000000000000000 execute 00100001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.833660] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.852277] gv11b 510-gv11b, TSG: 1, pid 2673, refs: 2, deterministic: no, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.854853] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.863976] gv11b RAMFC: TOP: 8000002000429208 PUT: 002000429208 GET: 002000429208 FETCH: 002000429208 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000428000 payload 0000000000000000 execute 00000001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.869727] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.888406] gv11b 511-gv11b, TSG: 0, pid 2673, refs: 2, deterministic: no, domain name: (default)
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.890968] gv11b channel status: in use idle not busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.900114] gv11b RAMFC: TOP: 800000200044aaa0 PUT: 00200044aaa0 GET: 00200044aaa0 FETCH: 00200044aaa0 HEADER: 60400000 COUNT: 80000000 SEMAPHORE: addr 002000420000 payload 0000000000000000 execute 00100001
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.905851] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.924554] gv11b PBDMA Status - chip gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.927110] gv11b -------------------------
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.931674] gv11b pbdma 0:
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.936226] gv11b id: -1 - [channel] next_id: - -1 [channel] | status: invalid
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.939385] gv11b PBDMA_PUT 00000020004100a0 PBDMA_GET 00000020004100a0
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.947288] gv11b GP_PUT 00000012 GP_GET 00000012 FETCH 00000012 HEADER 60400000
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.954514] gv11b HDR 00000000 SHADOW0 00a00484 SHADOW1 00140800
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.962997] gv11b pbdma 1:
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.970169] gv11b id: 4 - [tsg] next_id: - -1 [channel] | status: valid
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.973114] gv11b PBDMA_PUT 00000002004ff5e0 PBDMA_GET 00000002004ff260
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.980802] gv11b GP_PUT 00000224 GP_GET 00000224 FETCH 00000224 HEADER 201101b0
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.988108] gv11b HDR 2001206c SHADOW0 004ff238 SHADOW1 0003aa02
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 997.996524] gv11b pbdma 2:
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.003536] gv11b id: 5 - [tsg] next_id: - -1 [channel] | status: valid
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.006560] gv11b PBDMA_PUT 0000000200874494 PBDMA_GET 0000000200874494
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.014151] gv11b GP_PUT 000000bf GP_GET 000000bf FETCH 000000bf HEADER 60400000
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.021265] gv11b HDR 00000000 SHADOW0 00874458 SHADOW1 00003e02
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.029920] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.037186] gv11b gv11b eng 0:
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.039719] gv11b id: 6 (tsg), next_id: 4 (tsg), ctx status: switch
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.043461] gv11b busy
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.050153] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.053000] gv11b gv11b eng 1:
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.055500] gv11b id: 5 (tsg), next_id: -1 (channel), ctx status: valid
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.059148] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.066257] gv11b gv11b eng 2:
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.068824] gv11b id: -1 (channel), next_id: -1 (channel), ctx status: invalid
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.072439] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.080207] gv11b gv11b eng 3:
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.082774] gv11b id: -1 (channel), next_id: -1 (channel), ctx status: invalid
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.086438] gv11b ctx_reload
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.094143] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.097525] gv11b
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.105856] nvgpu: 17000000.gv11b nvgpu_report_err_to_sdl:66 [ERR] Failed to report an error: hw_unit_id = 0x9, err_id=0x8, ss_err_id = 0x289
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.116277] nvgpu: 17000000.gv11b gv11b_mm_mmu_fault_handle_buf_valid_entry:525 [ERR] page fault error: err_type = 0x8, fault_status = 0x200
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.128657] nvgpu: 17000000.gv11b gv11b_fb_mmu_fault_info_dump:294 [ERR] [MMU FAULT] mmu engine id: 65, ch id: 501, fault addr: 0x0, fault addr aperture: 0, fault type: invalid pde, access type: virt read,
Feb 20 15:41:45 mxvlkica-qthdaa-4-dcu1 kernel: [ 998.147588] nvgpu: 17000000.gv11b gv11b_fb_mmu_fault_info_dump:307 [ERR] [MMU FAULT] protected mode: 0, client type: gpc, client id: t1 1, gpc id if client type is gpc: 0,
Feb 20 15:41:58 mxvlkica-qthdaa-4-dcu1 kernel: [ 1010.637101] nvmap_alloc_handle: PID 71865: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
Feb 20 16:11:22 mxvlkica-qthdaa-4-dcu1 kernel: [ 2774.759891] nvgpu: 17000000.gv11b nvgpu_gr_intr_handle_sm_exception:390 [ERR] could not pre-process sm error!
Feb 20 16:11:22 mxvlkica-qthdaa-4-dcu1 kernel: [ 2774.760368] nvgpu: 17000000.gv11b gr_intr_handle_exception_interrupts:759 [ERR] set gr exception notifier
Feb 20 16:11:22 mxvlkica-qthdaa-4-dcu1 kernel: [ 2774.760791] nvgpu: 17000000.gv11b nvgpu_set_err_notifier_locked:149 [ERR] error notifier set to 13 for ch

Hi,

Would you mind sharing more info about the use case?
Is there any action or API that triggers the nvgpu error?

Thanks.

Hi AastaLLL,
The GPU methods and APIs we use are listed below.
Note that if there is a power failure and restart, there is no way to restore it. The probability of occurrence is also relatively low; it may take a week to reproduce the problem.

  1. NvBuffer2Raw is used to get RGBA images
  2. cudaHostAlloc, cudaHostGetDevicePointer and cudaMemPrefetchAsync are used for a unified-memory memory pool
  3. thrust::device_vector is used for data buffering
  4. cudaMalloc, cudaMemset, the related cudaMemcpy calls and cudaDeviceSynchronize are also used. Judging by the error log, the error appears to occur while cudaMemcpy is executing.

Hi,

[ERR] could not pre-process sm error!

Based on the above error, which GPU architecture do you compile with?
For AGX Xavier, it should be sm_72.

Thanks.

Hi,
The same set of applications runs on both the Xavier and Orin platforms, both on JetPack 5.1.1, but the problem has only been seen on the Xavier platform. Is this problem caused by hardware differences?

Hi,

Yes, it is possible.
The Xavier GPU architecture is sm_72 and Orin's is sm_87.
So if you compile the application only for sm_87, it might hit an error on the Xavier device.

Thanks.
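
To make the architecture point above concrete, here is a minimal C++ sketch of the general compatibility rule from the CUDA documentation: binary code (SASS) built for sm_XY runs only on devices with the same major version and an equal or higher minor version, while embedded PTX for compute_XY can be JIT-compiled for any device of equal or higher compute capability. The type and function names (`ComputeCapability`, `EmbeddedTarget`, `can_run_on`, `binary_supports`) are hypothetical and exist only for illustration; this is not an nvgpu or CUDA API, and it ignores platform-specific exceptions.

```cpp
#include <vector>

// Compute capability of a GPU or build target, e.g. {7, 2} for sm_72.
// (Hypothetical type for illustration; not part of any CUDA header.)
struct ComputeCapability {
    int major_ver;
    int minor_ver;
};

// One code image embedded in the application binary.
struct EmbeddedTarget {
    ComputeCapability cc;
    bool is_ptx;  // true: PTX (compute_XY), false: SASS binary (sm_XY)
};

// General rule, assuming no platform-specific exceptions:
//  - SASS runs only on the same major version with minor >= target minor.
//  - PTX can be JIT-compiled for any device with capability >= target.
bool can_run_on(const ComputeCapability& device, const EmbeddedTarget& t) {
    const bool same_major_ok = device.major_ver == t.cc.major_ver &&
                               device.minor_ver >= t.cc.minor_ver;
    if (t.is_ptx) {
        return device.major_ver > t.cc.major_ver || same_major_ok;
    }
    return same_major_ok;
}

// True if at least one embedded image is usable on the given device.
bool binary_supports(const ComputeCapability& device,
                     const std::vector<EmbeddedTarget>& targets) {
    for (const EmbeddedTarget& t : targets) {
        if (can_run_on(device, t)) {
            return true;
        }
    }
    return false;
}
```

Under this rule, a binary that embeds only sm_87 code is not usable on Xavier (7.2), whereas one that embeds both sm_72 and sm_87 images covers both platforms. Whether an architecture mismatch is actually the cause of the MMU fault in the log above is not confirmed in this thread.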
