Kernel warning "jump label: negative count"

With a freshly flashed Pegasus, the following kernel error is output multiple times when starting the nvsipl_camera program:

[  378.941565] Call trace:
[  378.941571] [<ffffff80081dfc38>] __static_key_slow_dec_cpuslocked+0xa8/0xb0
[  378.941574] [<ffffff80081dfc78>] static_key_slow_dec+0x38/0x68
[  378.941582] [<ffffff8008615b7c>] nvmap_handle_get_cacheability+0x4c/0xc8
[  378.941588] [<ffffff80086034f4>] __nvmap_do_cache_maint+0x184/0x928
[  378.941592] [<ffffff8008603da8>] __nvmap_cache_maint+0x110/0x128
[  378.941596] [<ffffff800860f858>] nvmap_ioctl_cache_maint+0xa0/0x188
[  378.941600] [<ffffff8008608a80>] nvmap_ioctl+0x348/0x608
[  378.941606] [<ffffff800829ce14>] do_vfs_ioctl+0xc4/0xb68
[  378.941611] [<ffffff800829d944>] SyS_ioctl+0x8c/0xa8
[  378.941616] [<ffffff8008084180>] el0_svc_naked+0x34/0x38
[  378.941618] ---[ end trace 0000000000000002 ]---
[  378.941659] jump label: negative count!

Sometimes the system works fine despite this error, but at other times further errors arise as well. Is this a fatal error that could be the source of later errors? How can I prevent this error from happening?

Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.4.1.7402
other

Host Machine Version
native Ubuntu 18.04
other

Hi @a.dobke ,

Can you reproduce this issue on other systems? Please provide the complete nvsipl_camera command and its output. Thanks.

Yes, an identical problem occurs on multiple systems.

Example of nvsipl command:
nvsipl_camera --disableISP1Output --filedump-prefix out --link-enable-masks "0x0011 0x0000 0x0000 0x0000" --platform-config "IMX490_RGGB_CPHY_x4" --runfor 2 --showfps

I have attached the corresponding output from nvsipl_camera and the kernel log:

dmesg.txt (69.3 KB) nvsipl.txt (3.8 KB)

IMX490 camera modules aren’t listed as supported on DRIVE OS 5.2.0 on the DRIVE Ecosystem Hardware and Software Components page.

Please try with supported camera modules. I cannot reproduce the issue on a Sekonix SF3325 camera module with the command below. FYI.

~/drive-t186ref-linux/samples/nvmedia/nvsipl/test/camera/nvsipl_camera --disableISP1Output --filedump-prefix out --link-enable-masks "0x0011 0x0000 0x0000 0x0000" --platform-config "SF3325_DPHY_x2" --runfor 2 --showfps

I have rerun the steps with the supported IMX390 instead and got the same result:

390_dmesg.txt (69.4 KB)
390.txt (873 Bytes)

I can also reproduce it with the Sekonix SF3325 camera module and will discuss it internally. Thanks.

As far as you know, does the warning cause any functional issue? Thanks.

I have seen rarer cases of this issue where this message gets spammed in the console continuously; all that printk output slows the machine down considerably (to the point where it doesn’t function).

There have been various other glitches (e.g. camera startup failures) that occurred while this message was present, but I am unsure whether that was just a coincidence.

I collected further statistics on the occurrence of this issue. Over 50 system reboots, the system locked up 9 times when running nvsipl_camera! Once the system is in this state, every run of the program triggers the issue.

During this time, it was not possible to interact with the system such as by logging in via the serial console.
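To put the 9-in-50 figure in perspective, here is a rough sketch of the observed lock-up rate together with a 95% Wilson score interval. It assumes each reboot is an independent trial with the same failure probability, which the thread does not establish; the function name is only illustrative.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Return the (low, high) Wilson score interval for a failure proportion.

    z=1.96 corresponds to roughly 95% confidence. Assumes independent
    trials with a constant failure probability (an assumption, not
    something measured in this thread).
    """
    p = failures / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return centre - half, centre + half


# Numbers reported above: 9 lock-ups over 50 reboots.
failures, trials = 9, 50
rate = failures / trials
low, high = wilson_interval(failures, trials)
print(f"observed rate: {rate:.0%}, 95% interval: {low:.1%} to {high:.1%}")
```

Even at the low end of that interval the rate is far from negligible, so the lock-up is unlikely to be a one-off.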

Same issue.
When running the SIPL camera driver, kern.log and syslog are spammed with call traces until the disk is full.
So yes, the warning causes a serious functional issue.

Hi @leonard.guo ,

Could you tell us how to reproduce it? On my side, after booting, I can only reproduce it at most once (see the warning messages once).

BTW, we are discussing this internally and will update you here.

@VickNV
Are there any updates you can share? This is a blocker for us deploying DRIVE OS 5.2.0.

Hi @a.dobke ,

Once we make any progress, I’ll update you.
BTW, without "--filedump-prefix out" I couldn’t reproduce the issue. Please explain why this is a blocker for you. Thanks.

I have reproduced the issue without "--filedump-prefix out" as well. In fact, my test that showed a 9 out of 50 failure rate was run without this flag.

This is a blocker for us because we cannot send units into the field with a failure rate of roughly 20% (9 out of 50 reboots); this is far above an acceptable failure rate for our operations.

Could you share the command you used (without "--filedump-prefix out")? Thanks.

Yes, the command line is like the previous examples in this thread, just with that parameter removed:

nvsipl_camera --disableISP1Output --link-enable-masks "0x0011 0x0101 0x0101 0x0000" --platform-config "IMX390_RGGB_CPHY_x4" --runfor 3000 --showfps

The procedure I use to reproduce is to reboot, run the program, then check the dmesg output for the flood of messages that comes with this issue (or observe a system lockup that leaves no way to check the log). This is how I arrived at the 9/50 reproduction rate.
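For anyone repeating this procedure, the log check can be scripted. Below is a minimal sketch that scans saved dmesg text for the warning string from the first post. The function name and the `flood_threshold` cut-off are hypothetical; the thread does not define how many messages count as a flood.

```python
import re

# Warning text copied from the kernel log earlier in this thread.
WARNING = "jump label: negative count!"

# dmesg prefixes each line with a "[  seconds.micros]" timestamp.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def summarize_dmesg(text, flood_threshold=100):
    """Count warning lines in dmesg text and note when they occurred.

    flood_threshold is a hypothetical cut-off for calling a run a
    "flood"; adjust it to whatever makes sense for your logs.
    """
    count = 0
    times = []
    for line in text.splitlines():
        if WARNING not in line:
            continue
        count += 1
        match = TIMESTAMP.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return {
        "count": count,
        "first": min(times) if times else None,
        "last": max(times) if times else None,
        "flood": count >= flood_threshold,
    }


# Last lines of the trace from the first post, used as sample input.
sample = """\
[  378.941616] [<ffffff8008084180>] el0_svc_naked+0x34/0x38
[  378.941618] ---[ end trace 0000000000000002 ]---
[  378.941659] jump label: negative count!
"""
summary = summarize_dmesg(sample)
print(summary)
```

On a real unit the input could come from a file saved with `dmesg > dmesg.txt` after each run, for example `summarize_dmesg(open("dmesg.txt").read())`. A complete lock-up obviously cannot be detected this way and still has to be recorded by hand.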

We’re debugging the issue and will update you with any findings. Thanks.

We have fixed the issue in the attached nvmap_init_t19x.c (14.3 KB) nvmap_cache_t19x.c (2.0 KB) .

Please replace nvmap_cache_t19x.c and nvmap_init_t19x.c in ~/nvidia/nvidia_sdk/DRIVE_OS_5.2.0_SDK_Linux_OS_DDPX/DRIVEOS/drive-oss-src/nvidia/drivers/video/tegra/nvmap with them, and follow Compiling the Kernel (NVIDIA DRIVE Linux) to rebuild the kernel. Thanks.

I have tested this change and 50 out of 50 reboots came up without the previously observed error. Thank you.
Should I expect this fix to be part of an official release sometime soon? Or will I need to maintain this change and distribute it to all developers?
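As a rough sanity check on what 50 clean reboots can and cannot show, the sketch below computes two numbers: how likely 50 clean reboots in a row would be if the earlier 9/50 rate still applied, and the largest failure rate that is still compatible with 0 failures in 50 at 95% confidence. It assumes independent reboots with a constant failure probability, which is an assumption rather than something verified here.

```python
def prob_all_clean(failure_rate, trials):
    """Probability of zero failures in `trials` independent runs."""
    return (1.0 - failure_rate) ** trials


def upper_bound_zero_failures(trials, confidence=0.95):
    """Largest failure rate still consistent with 0 failures in `trials`.

    Solves (1 - p) ** trials = 1 - confidence for p, the exact
    one-sided bound when no failures were observed.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


trials = 50
old_rate = 9 / 50  # lock-up rate reported before the fix

p_clean = prob_all_clean(old_rate, trials)
upper = upper_bound_zero_failures(trials)
print(f"chance of 50 clean reboots at the old rate: {p_clean:.1e}")
print(f"95% upper bound on the remaining failure rate: {upper:.1%}")
```

So the result is strong evidence that the patched files changed the behaviour, but 50 reboots alone cannot rule out a residual failure rate of a few percent; more reboots would tighten that bound.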