NULL pointer dereference leads to boot switching to other slot

Hello

We reproduced this bug using the same steps on a Xavier AGX with JetPack 4.6. In addition, we found that the NULL pointer dereference leads to an immediate boot switch to the other slot (after the device is rebooted by TEGRA_BCCPLEX_WATCHDOG), without any attempt to boot from the current slot.
Could you clarify why a NULL pointer dereference breaks the A/B boot logic?

log.txt (70.4 KB)

Hi,
There is a patch for the issue:
V4L2 timeout leads to NULL pointer dereference in kernel in jetpack 4.6 - #12 by JerryChang

Please apply it and give it a try.

Hi, DaneLLL

I have already applied the patch and it helped, thanks. However, the main point of my question is that the NULL pointer dereference breaks the A/B boot logic. Why did retry_count immediately become zero? Can any NULL pointer dereference in the kernel lead to this behaviour? Please check the beginning and the end of the log attached above, in particular the output of nvbootctrl dump-slots-info. We are afraid that similar errors may make the device completely unbootable by switching it to recovery mode.
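
To compare retry_count before and after the crash, the per-slot values can be pulled out of the dump with a small filter. This is only a sketch: the `slot_summary` name is hypothetical, and the line layout it matches (`slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1`) is an assumption about what nvbootctrl prints, not a documented format.

```shell
# Print "slot retry_count boot_successful" for every slot line read from stdin.
# Assumed input layout (not a documented contract):
#   slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
# Lines that do not match this layout are ignored.
slot_summary() {
    sed -n 's/^slot: *\([0-9]*\),.*retry_count: *\([0-9]*\),.*boot_successful: *\([0-9]*\).*/\1 \2 \3/p'
}
```

Typical use would be `sudo nvbootctrl dump-slots-info | slot_summary`, run once before the test and once after the reboot.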

hello nazaraa,

may I know which JetPack release you’re using?
may I also know what steps you took to make the partition NULL? please also gather the complete bootloader logs for reference.
thanks

Hi JerryChang,

We use both JetPack 4.5 and 4.6. The log attached above is from JP 4.6. On JP 4.6 (L4T 32.6.1) we reproduced this behaviour by streaming from an unplugged camera without the patch applied:

  1. Unplug the camera.
  2. Check nvbootctrl dump-slots-info.
  3. Try to launch the unplugged camera with this command: sudo v4l2-ctl -d /dev/video0 --stream-mmap
  4. Repeat step 3 until the system hangs. Wait for the reboot triggered by TEGRA_BCCPLEX_WATCHDOG.
  5. Check the boot log and nvbootctrl dump-slots-info.
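
The steps above can be sketched as a small script. It is an illustration only, not an NVIDIA tool: the `DEVICE`, `ATTEMPTS`, `LOG` and `RUN_REPRO` variables, the function names, and the 10-second `timeout` around each attempt are assumptions added here. Only the `nvbootctrl dump-slots-info` and `v4l2-ctl` commands come from the steps themselves.

```shell
#!/bin/sh
# Sketch of the reproduction steps above; not an official NVIDIA tool.
# DEVICE, ATTEMPTS, LOG and RUN_REPRO are hypothetical names used for illustration.
DEVICE="${DEVICE:-/dev/video0}"
ATTEMPTS="${ATTEMPTS:-20}"
LOG="${LOG:-repro-slots.log}"

# Append a labelled copy of the A/B slot state to the log.
log_slots() {
    echo "=== $1 ===" >> "$LOG"
    sudo nvbootctrl dump-slots-info >> "$LOG" 2>&1
    sync  # flush to disk, since the next stream attempt may hang the system
}

# Steps 2-5: record the slots, retry the stream, record the slots again.
run_repro() {
    log_slots "before"
    i=1
    while [ "$i" -le "$ATTEMPTS" ]; do
        echo "attempt $i" >> "$LOG"
        sync
        # timeout (coreutils) bounds each attempt; the 10 s value is arbitrary.
        sudo timeout 10 v4l2-ctl -d "$DEVICE" --stream-mmap >> "$LOG" 2>&1
        i=$((i + 1))
    done
    log_slots "after"
}

# Only run when explicitly requested, e.g. RUN_REPRO=1 sh repro.sh
if [ "${RUN_REPRO:-0}" = "1" ]; then
    run_repro
fi
```

If the system hangs partway through, the "after" entry is never written; in that case the last "attempt" line in the log shows how many tries it took, and step 5 has to be done by hand after the watchdog reboot.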

Sometimes we get the same issue (booting from the other slot) on our devices on JP 4.5 (L4T 32.5.1), and one case happened today. Please find the bootloader and system logs attached below.
UART.txt (334.3 KB)
Journal.txt (4.8 KB)

hello nazaraa

may I know the actual use-case for launching the video stream without a physical camera device connected?
it’s not recommended to remove the camera device from the 2x60 pin camera connector while powered on; you might damage the device.

All our cameras are physically connected. However, the connection to one camera may sometimes break during operation for various reasons (bad wiring, plug issues, etc.). In my opinion, this must not crash the whole system or break the currently active slot.

hello nazaraa

you’re working on the latest JetPack release, r32.6.1, right?
if yes, it includes an error recovery mechanism; you should restart the camera service when the camera function is problematic. the worst case is rebooting the whole system.
hence,
please try restarting nvargus-daemon as follows,
$ sudo pkill nvargus-daemon
$ sudo nvargus-daemon &

Hello JerryChang
After applying the patch, we still see complete system hangs on JP 4.6 (L4T 32.6.1), followed by a TEGRA_BCCPLEX_WATCHDOG reboot into the other slot.
We cannot restart any camera services; the system is completely unresponsive.
One more log is attached:
log.txt (549.1 KB)

Any updates from your side?

hello nazaraa

according to the logs, are you working with D3 Engineering?
since they’re one of the Jetson Camera Partners, please contact them directly to obtain camera solution support.
thanks

Hello JerryChang,
We do not work with D3 Engineering. We are moving from JetPack 4.5 to JetPack 4.6 using the same camera drivers. JetPack 4.6 was deployed to several devices in our fleet, and every one of them started experiencing this issue after a while. It reproduces about once every 6-10 hours of operation. Those devices did not experience the issue on JP 4.5.
We suspect this is due to changes in the camera-rtcpu-rce.img firmware and/or in VI5.
Could you help us debug? What additional information should we provide?

hello nazaraa,

please share your actual camera use-case; for example, what user-space application are you running to reproduce the issue, and was any background service occupying the camera?
have you applied any pre-built updates? or are you using the native JetPack-4.6 release image with your customized kernel to enable the camera sensor?
