Nvpmodel hang and many errors in nvgpu

Dear NVIDIA Forum,
I am seeing the issues below.
new 73.txt (619.5 KB)

[   13.156999] nvgpu: 17000000.gv11b   __nvgpu_timeout_expired_msg_cpu:94   [ERR]  Timeout detected @ gr_gk20a_ctx_wait_ucode+0xa4/0x3a8 [nvgpu]
[   13.157303] nvgpu: 17000000.gv11b           gr_gk20a_ctx_wait_ucode:528  [ERR]  timeout waiting on mailbox=0 value=0x00000000
[   13.157496] nvgpu: 17000000.gv11b      gk20a_fecs_dump_falcon_stats:129  [ERR]  gr_fecs_os_r : 0
[   13.157651] nvgpu: 17000000.gv11b      gk20a_fecs_dump_falcon_stats:131  [ERR]  gr_fecs_cpuctl_r : 0x60
[   13.157809] nvgpu: 17000000.gv11b      gk20a_fecs_dump_falcon_stats:133  [ERR]  gr_fecs_idlestate_r : 0x0

INFO: task nvpmodel:4601 blocked for more than 120 seconds.
[  243.039461]       Not tainted 4.9.140-tegra #1
[  243.039836] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.060947] Kernel panic - not syncing: hung_task: blocked tasks
[  243.061226] CPU: 1 PID: 677 Comm: khungtaskd Not tainted 4.9.140-tegra #1
[  243.061387] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[  243.061544] Call trace:
[  243.061647] [<ffffff800808bdb8>] dump_backtrace+0x0/0x198
[  243.061782] [<ffffff800808c37c>] show_stack+0x24/0x30
[  243.061918] [<ffffff800845c7a0>] dump_stack+0x98/0xc0
[  243.062054] [<ffffff80081c1438>] panic+0x11c/0x298
[  243.062185] [<ffffff8008181240>] watchdog+0x300/0x3b8
[  243.062320] [<ffffff80080dbe64>] kthread+0xec/0xf0
[  243.062450] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  243.062594] SMP: stopping secondary CPUs
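
For a log as large as the attached one, it can help to pull out only the nvgpu `[ERR]` lines and the hung-task notice before comparing builds. The following is a minimal sketch, assuming the log has the same line shapes as the excerpts above; the function name `summarize` and the regular expressions are illustrative and not part of any NVIDIA tool.

```python
import re

# Assumed line shape, taken from the excerpt above:
# [   13.157303] nvgpu: 17000000.gv11b   gr_gk20a_ctx_wait_ucode:528  [ERR]  timeout ...
NVGPU_ERR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nvgpu:\s+\S+\s+"
    r"(?P<func>\w+):(?P<line>\d+)\s+\[ERR\]\s+(?P<msg>.*)$"
)

# Assumed shape of the hung-task notice:
# INFO: task nvpmodel:4601 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize(log_text):
    """Collect nvgpu [ERR] entries and hung-task notices from a kernel log."""
    errors = []
    hung = []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = NVGPU_ERR.match(line)
        if m:
            # (timestamp in seconds, reporting function, message text)
            errors.append((float(m["ts"]), m["func"], m["msg"].strip()))
            continue
        h = HUNG_TASK.search(line)
        if h:
            # (task name, pid, blocked-for threshold in seconds)
            hung.append((h["name"], int(h["pid"]), int(h["secs"])))
    return {"nvgpu_errors": errors, "hung_tasks": hung}
```

Running `summarize(open("new 73.txt").read())` on the logs from a working and a failing build would make it easier to see which nvgpu function reports the first error in each.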

BSP version is 32.4.4

Is any application running on the device?
Is this a custom carrier board or the devkit?
What are the steps to reproduce?
What is the reproduce rate?
R32.4.4 is quite old; could you try a newer version?

Hi @kayccc Thank you for helping.

Actually, we initially hit the error below when flashing a new Jetson device (301):

Error in Flashing MemBct - Jetson & Embedded Systems / Jetson Xavier NX - NVIDIA Developer Forums

After that we moved from 32.4.3 to 32.4.4, and the flashing issue was solved.
Please find our observations below:

  1. We have a working Jetson 300 that works with 32.4.3, but with 32.4.4 it does not work and the same nvpmodel error appears.
  2. We use the nvpmodel binary to change the nvgpu clock.
  3. I compared the nvgpu driver; it is the same in both builds.
  4. The issue is 100% reproducible.
  5. Since 32.4.3 works, there may be some new dependency.

I think there may be some dependency between the new BSP and the nvpmodel package. Please help.

Could you test with some newer BSP to see if this got resolved? For example, rel-32.5.1 or rel-32.6.1?

Hi @WayneWWW Thank you for replying.
It looks like the problem is fixed by using the old kernel (32.4.3 + our commit).

Before that, I wanted to share how we are building the BSP.

  1. Download the source:
    wget https://developer.nvidia.com/embedded/L4T/r32_Release_v4.4/r32_Release_v4.4-GMC3/T186/Tegra186_Linux_R32.4.4_aarch64.tbz2
  2. Download the cboot sources:
    wget https://developer.nvidia.com/cboot-sources-jetson-agx-xavier-and-jetson-xavier-nx-32.4.4
  3. Extract cboot and keep it in dev_folder/c-boot.
  4. Extract Tegra186_Linux_R32.4.4_aarch64.tbz2 and keep it inside dev_folder/bsp_folder.
  5. dev_folder/kernel_src holds the kernel for the BSP version plus our board-specific commit.
  6. After building, we copy the binaries to the BSP folder and use the ./flash.sh script for flashing.
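
Before building, a quick check that the working tree matches the layout in the steps above can catch a misplaced extraction. This is only a sketch: the three directory names come from the steps, while the helper name `missing_dirs` and the idea of checking them up front are assumptions for illustration.

```python
from pathlib import Path

# Directory names as described in the steps above.
EXPECTED_DIRS = ("c-boot", "bsp_folder", "kernel_src")


def missing_dirs(dev_folder):
    """Return the expected subdirectories that are absent under dev_folder."""
    root = Path(dev_folder)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # "dev_folder" is the top-level directory name used in the steps above.
    absent = missing_dirs("dev_folder")
    if absent:
        print("missing:", ", ".join(absent))
    else:
        print("layout looks complete")
```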

In the previous build I changed the kernel and cboot (steps 3 and 5) on top of 32.4.4 and tried flashing, and got the nvpmodel error.

Now I use the old kernel only and did not do step 5, and I do not get this error.
