Gk20a_pmu_isr:712 [ERR] pmu halt intr not implemented

Hi.

I am investigating the error log shown in the title on a custom board using TX2/4GB.
My questions are below.

  1. What is “isr712” and what causes it?
  2. Is “gk20a” the identifier of the log output source program? If you know a specific program, please let me know.
  3. If 2 was “Yes”, there was also a log called “gp10b”. If you can identify the program for this as well, please let me know the program that generated the output.
  4. Does “not implemented” mean something that standard JetPack does not support and that the user is expected to implement?

That's all.

May I know how the error was presented?
Any application running?
Which JetPack release?

Hi kayccc,

Thanks for your reply.

May I know how the error was presented?
[93177.355290] nvgpu: 17000000.gp10b gk20a_pmu_isr:724 [ERR] pmu exterr intr not implemented. Clearing interrupt.
[2023-06-16 19:54:39.731] Jpeg Size =17792!!
Any application running?
Yes.
Which JetPack release?
R32.4.4

Best regards.

Hi,

An error from gk20a/gp10b indicates that it comes from the nvgpu driver. For this kind of issue, please clarify whether it causes any fatal result. If it does not cause a system crash or an application crash, you can ignore it.

It does not mean you have to implement it. There is also little point in sharing more details about this driver with you.
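For readers with the same question about what "gk20a", "gp10b" and the number after "isr" refer to, here is a minimal sketch that splits the nvgpu message from this thread into fields. The layout (device name, function name, then what appears to be a source line number) is an assumption inferred from the log lines above, not a documented format, and `parse_nvgpu_line` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred from the lines quoted in this thread:
#   [timestamp] nvgpu: <device> <function>:<number> [LEVEL] <message>
# The number after the function name appears to be a source line number.
NVGPU_LINE = re.compile(
    r"\[\s*(?P<timestamp>\d+\.\d+)\]\s+nvgpu:\s+"
    r"(?P<device>\S+)\s+"
    r"(?P<function>\w+):(?P<line>\d+)\s+"
    r"\[(?P<level>\w+)\]\s+"
    r"(?P<message>.*)"
)


def parse_nvgpu_line(text):
    """Return the fields of an nvgpu log line as a dict, or None if it does not match."""
    match = NVGPU_LINE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


sample = (
    "[93177.355290] nvgpu: 17000000.gp10b gk20a_pmu_isr:724 [ERR] "
    "pmu exterr intr not implemented. Clearing interrupt."
)
info = parse_nvgpu_line(sample)
print(info["device"], info["function"], info["line"], info["level"])
```

Read this way, "gp10b" belongs to the device name, "gk20a_pmu_isr" is the function that printed the message, and 712 or 724 would simply distinguish two different print sites in that function.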

Hi WayneWWW,

Additional logs are below.
(Excluding the output log of our application)
The system is eventually restarted by the watchdog timer.

[93177.355290] nvgpu: 17000000.gp10b gk20a_pmu_isr:724 [ERR] pmu exterr intr not implemented. Clearing interrupt.
[2023-06-16 19:54:39.731] Jpeg Size =17792!!
[93177.370372] arm-smmu 12000000.iommu: Unexpected {global,context} fault, this could be serious
[2023-06-16 19:54:39.731] .131787] ★★★DRIVER CAM=0 [93177.370376] arm-smmu 12000000.iommu: GFSR 0x00000002, GFSYNR0 0x00000000, GFSYNR1 0x0000087d, GFSYNR2 0x00000000
[93177.370403] mc-err: (255) csr_gpusrd: EMEM address decode error
[2023-06-16 19:54:39.747] ](1/2)
[93177.370405] mc-err: status = 0x20367058; addr = 0x3ffffffc0
[93177.370407] mc-err: secure: yes, access-type: read
[93177.370415] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[93177.370951] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[93177.370958] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[93177.370962] mc-err: Too many MC errors; throttling prints
[93177.404156] (0-0): ##(0-0)fm now==pre##
[93177.404264] (0-1): ##(0-1)fm now==pre##
[93177.482612] nvgpu: 17000000.gp10b ERROR: ARI request timed out: req 91 on CPU 4
[2023-06-16 19:54:41.841] ASSERT: plat/nvidia/tegra/soc/t186/drivers/mce/ari.c <127> : retries != 0U
[2023-06-16 19:56:14.879] [0000.050] C> I2C command failed <-reboot *****
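Because the capture above interleaves kernel messages with lines carrying a wall-clock timestamp, a small filter can make the sequence easier to read. The following is only a sketch: it assumes kernel lines start with a `[seconds.microseconds]` stamp and takes the first token after it as a subsystem tag; `summarize_kernel_log` is a hypothetical helper, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumption: kernel messages start with "[<seconds>.<6 digits>]" and the
# first token after the timestamp (up to a space or colon) names the source.
KERNEL_LINE = re.compile(r"^\[\s*\d+\.\d{6}\]\s+(?P<tag>[^\s:]+)")


def summarize_kernel_log(lines):
    """Count kernel lines per tag and collect every line that is not a kernel line."""
    counts = Counter()
    other = []
    for line in lines:
        match = KERNEL_LINE.match(line)
        if match:
            counts[match.group("tag")] += 1
        else:
            other.append(line)
    return counts, other


captured = [
    "[93177.355290] nvgpu: 17000000.gp10b gk20a_pmu_isr:724 [ERR] pmu exterr intr not implemented. Clearing interrupt.",
    "[2023-06-16 19:54:39.731] Jpeg Size =17792!!",
    "[93177.370372] arm-smmu 12000000.iommu: Unexpected {global,context} fault, this could be serious",
    "[93177.370403] mc-err: (255) csr_gpusrd: EMEM address decode error",
    "[93177.370405] mc-err: status = 0x20367058; addr = 0x3ffffffc0",
    "[93177.370407] mc-err: secure: yes, access-type: read",
    "[93177.482612] nvgpu: 17000000.gp10b ERROR: ARI request timed out: req 91 on CPU 4",
]
counts, other = summarize_kernel_log(captured)
for tag, n in counts.most_common():
    print(tag, n)
```

Lines that were merged mid-message by the console, such as the "DRIVER CAM=0" line above, end up in `other` and still need to be read by hand.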

Please at least upgrade your BSP. We don’t really debug on old BSPs. You need to at least check on rel-32.7.4.

Also, please clarify whether this can be reproduced on an NV devkit.

Hi.

It seems possible to change the version of the BSP part of JetPack.
Is this a problem that has been fixed in the newer BSP version?
Also, since our pinmux settings differ from those of the evaluation board, we cannot test on the evaluation board.
What we want to know is what triggers this phenomenon.

Best regards.

Hi WayneWWW,

Sorry, let me double-check one thing.
I would like to confirm what “at least upgrade the BSP” is meant to cover.

a) Is it OK to update only the parts other than the source/rootfs, such as the NVIDIA-provided binaries (including the bootloaders up to cboot), configuration settings, and scripts such as flash.sh?

In this case, the Linux kernel, rootfs, etc. are all left as they are and there is no need to rebuild them.
(In other words, the binaries provided by NVIDIA, the flash environment, configuration settings, etc. may be the cause.)

b) Are the Linux kernel, glibc in the rootfs, and other libraries and tools (such as CUDA) updated as well?

In this case, the apps we developed do not need to be changed.
(In other words, the suspicious parts would be the kernel, the kernel's built-in device drivers, glibc, the libraries in the rootfs, command daemons, etc.)

Which of these do you have in mind?
Please answer a) or b).

By the way, going from L4T 32.4.4 to 32.7.4, at least the kernel version is updated from 4.9.140 to 4.9.337.

Hi,

I mean the whole system should be upgraded. When the system is upgraded, the SDK (CUDA/TRT) is upgraded too.

It is not possible to tell from here whether this issue has been fixed or not. You may still hit the issue after upgrading.
But this is more or less our SOP.
For example, rel-32.1 supports TX2 too, but it is software released 4 years ago and we may have already resolved lots of issues since then. It is not a good idea to fall back to an old release to debug. The same applies to rel-32.4.4, which is already a 3-year-old BSP.

If you cannot upgrade your system because it is a custom board, then try to reproduce this issue on an NV devkit with the latest BSP.

Hi.

Are you recommending migration to JetPack rel-32.7.4 (including OS)?

To debug this issue, an upgrade is needed. But that is just for debugging.

As I already said, if you cannot upgrade your system because it is a custom board, then try to reproduce this issue on an NV devkit with the latest BSP. If it can be reproduced on the latest BSP, then we will try to fix it on the latest BSP.

If it cannot be reproduced on the latest BSP, it probably means this issue has already been fixed.
