Nvgpu_cic_mon_report_err_safety_services:92 [ERR] Error reporting is not supported in this platform

future.wang · July 2, 2024, 7:53am

Hi NV team,

When we use GStreamer and CUDA to process video data, the following error messages may appear:

Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.933337] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:92   [ERR]  Error reporting is not supported in this platform
Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.934038] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:55   [ERR]  Error reporting is not supported in this platform
Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.946288] nvgpu: 17000000.ga10b gv11b_mm_mmu_fault_handle_buf_valid_entry:525  [ERR]  page fault error: err_type = 0x8, fault_status = 0x200
Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.946298] nvgpu: 17000000.ga10b      gv11b_fb_mmu_fault_info_dump:294  [ERR]  [MMU FAULT] mmu engine id:  65, ch id:  460, fault addr: 0x204b94000, fault addr aperture: 0, fault type: invalid pte, access type: virt atomic weak, 
Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.946300] nvgpu: 17000000.ga10b      gv11b_fb_mmu_fault_info_dump:307  [ERR]  [MMU FAULT] protected mode: 0, client type: gpc, client id:  t1_5, gpc id if client type is gpc: 1, 
Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.946308] nvgpu: 17000000.ga10b                nvgpu_rc_mmu_fault:352  [ERR]  mmu fault id=12 id_type=1 act_eng_bitmask=00000001
Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.946340] nvgpu: 17000000.ga10b       nvgpu_tsg_set_ctx_mmu_error:648  [ERR]  TSG 12 generated a mmu fault
Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.946349] nvgpu: 17000000.ga10b     nvgpu_set_err_notifier_locked:149  [ERR]  error notifier set to 31 for ch 460
Jun 25 01:14:35 cntjitpy-igvak-9-dcu2 kernel: [ 2074.959363] nvgpu: 17000000.ga10b nvgpu_gr_intr_handle_sm_exception:365  [ERR]  sm machine check err. gpc_id(1), tpc_id(2), offset(36864)
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2074.973076] __ga10b__ Channel Status - chip ga10b
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2075.057388] __ga10b__ ---------------------------
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2075.062317] __ga10b__ 429-ga10b, TSG: 17, pid 18984, refs: 2, deterministic: yes, domain name: (default)
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2075.067243] __ga10b__ channel status:  in use idle not busy
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2075.077187] __ga10b__ RAMFC: TOP: 000000000000 PUT: 000201384020 GET: 000201384020 FETCH: 000000000000 HEADER: 21540300 COUNT: 00000000 SEMAPHORE: addr 000000000000 payload 0000000000000000 execute 00000000
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2075.083017] __ga10b__  
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2075.102158] __ga10b__ 430-ga10b, TSG: 17, pid 18984, refs: 2, deterministic: yes, domain name: (default)
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2075.104711] __ga10b__ channel status:  in use idle not busy
Jun 25 01:14:36 cntjitpy-igvak-9-dcu2 kernel: [ 2075.114614] __ga10b__ RAMFC: TOP: 000000000000 PUT: 000201284020 GET: 000201284020 FETCH: 000000000000 HEADER: 21540300 COUNT: 00000000 SEMAPHORE: addr 000000000000 payload 0000000000000000 execute 00000000
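
When comparing several occurrences, it can help to reduce each [MMU FAULT] line to its key fields. The following is a minimal sketch, assuming a POSIX shell with sed; the sample line is copied from the log above, and the output format (ch=, addr=, type=) is only an illustration, not an nvgpu convention:

```shell
# Sample line copied from the kernel log above; in practice, feed a saved log file to sed instead.
line='[ 2074.946298] nvgpu: 17000000.ga10b      gv11b_fb_mmu_fault_info_dump:294  [ERR]  [MMU FAULT] mmu engine id:  65, ch id:  460, fault addr: 0x204b94000, fault addr aperture: 0, fault type: invalid pte, access type: virt atomic weak, '

# Extract the channel id, the faulting address and the fault type.
summary=$(printf '%s\n' "$line" |
  sed -n 's/.*ch id: *\([0-9]*\), fault addr: \(0x[0-9a-f]*\),.*fault type: \([^,]*\),.*/ch=\1 addr=\2 type=\3/p')

echo "$summary"
```

Lines that do not match the pattern are dropped, so the same sed expression can be run over a whole log file to list only the fault records.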

Log files from two occurrences of the issue:
Jun_25_01:14:35.txt (170.3 KB)
Jun_28_06:31:40.txt (290.3 KB)

Please analyze this issue.

Thank you!

future.wang · July 2, 2024, 8:23am

Hi NV,

Additional information:
Orin platform, R35.3.1

DaneLLL · July 2, 2024, 9:22am

Hi,
Please try applying the setting described here:
Jetson/L4T/TRT Customized Example - eLinux.org

If the issue is still there, please share the test app and steps so that we can set up and try to replicate the issue on a developer kit.

future.wang · July 29, 2024, 2:26am

Hi DaneLLL,

How can I make 32 the default value of CUDA_DEVICE_MAX_CONNECTIONS, as in the method quoted below, without setting an environment variable?

Long delays when submitting several cudaMemcpy

Please try increasing the number of compute channels

$ export CUDA_DEVICE_MAX_CONNECTIONS=32

A document can be found here:

DaneLLL · July 29, 2024, 5:03am

Hi,
This is the only approach. If you don’t want a global environment variable, you can set it on the command line when launching the CUDA application:

$ CUDA_DEVICE_MAX_CONNECTIONS=32 ./CUDA_application_binary
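
A minimal sketch of how the per-command form above scopes the variable, assuming a POSIX shell. Here `sh -c` is only a stand-in for the CUDA application binary, which is not part of this thread:

```shell
# Start from a clean state so the comparison is meaningful.
unset CUDA_DEVICE_MAX_CONNECTIONS

# Per-command assignment: only the launched process sees the value.
scoped=$(CUDA_DEVICE_MAX_CONNECTIONS=32 sh -c 'echo "${CUDA_DEVICE_MAX_CONNECTIONS:-unset}"')

# A process launched afterwards without the prefix does not inherit it.
later=$(sh -c 'echo "${CUDA_DEVICE_MAX_CONNECTIONS:-unset}"')

echo "scoped=$scoped later=$later"
```

The same prefix can be placed in a small launcher script, so the value applies to that one application without being exported globally.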

future.wang · July 29, 2024, 5:25am

Hi DaneLLL,

Can the same be achieved by setting some configuration parameters?

future.wang · July 30, 2024, 9:44am

Hi DaneLLL,

We ran $ export CUDA_DEVICE_MAX_CONNECTIONS=32,
but the problem still occurred.
Log:

kern_0725-1200.log (32.7 MB)

DaneLLL · July 31, 2024, 12:08pm

Hi,
We would need to reproduce it on an Orin developer kit and check. Please try a developer kit with JetPack 6.0 GA and see if the issue is present. If yes, please share the test sample and steps. We will set up a developer kit to test.

system · August 14, 2024, 12:08pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
