We reproduced this bug using the same steps on a Xavier AGX with JetPack 4.6. In addition, we found that the NULL pointer dereference makes the device switch to the other boot slot immediately (after the device is rebooted by TEGRA_BCCPLEX_WATCHDOG), without any attempt to boot from the current slot.
Could you clarify why a NULL pointer dereference breaks the A/B boot logic?
I have already applied the patch and it helped, thanks. However, the main point of my question is the A/B boot logic breaking because of the NULL pointer dereference. Why did retry_count immediately drop to zero? Can every NULL pointer dereference in the kernel lead to this behaviour? Please check the beginning and the end of the log attached above, in particular the output of nvbootctrl dump-slots-info. We are afraid that similar errors may make the device completely unbootable by switching it to recovery mode.
may I know which JetPack release you're using?
may I also know what steps you took to make the partition NULL? please also gather the complete bootloader logs for reference.
thanks
We use both JetPack 4.5 and 4.6. The log attached above is from JP 4.6. On JP 4.6 (L4T 32.6.1) we reproduced this behaviour by running an unplugged camera without the patch applied:
1. Unplug the camera.
2. Check nvbootctrl dump-slots-info.
3. Try to launch the unplugged camera with this command: sudo v4l2-ctl -d /dev/video0 --stream-mmap
4. Repeat step 3 until the system hangs. Wait for the reboot triggered by TEGRA_BCCPLEX_WATCHDOG.
5. Check the boot log and nvbootctrl dump-slots-info.
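For anyone who wants to repeat the steps above without typing them by hand, here is a rough sketch of a helper script. It is only an illustration: the function name repro_loop, the SLOT_CMD / STREAM_CMD / LOG variables, the log path and the 10-second timeout around v4l2-ctl are all assumptions of this sketch and were not part of our original test. The two underlying commands are the ones listed in the steps.

```shell
#!/bin/sh
# Sketch of the reproduction steps. The function name, the variables, the log
# path and the 10-second timeout are illustrative assumptions.

SLOT_CMD=${SLOT_CMD:-"sudo nvbootctrl dump-slots-info"}
STREAM_CMD=${STREAM_CMD:-"sudo timeout 10 v4l2-ctl -d /dev/video0 --stream-mmap"}
LOG=${LOG:-"$HOME/slot-repro.log"}

repro_loop() {
    max_attempts=$1
    i=1
    # Step 2: record the slot state before provoking the fault.
    { echo "=== slot info before test ==="; $SLOT_CMD; } >> "$LOG" 2>&1
    while [ "$i" -le "$max_attempts" ]; do
        echo "=== attempt $i ===" >> "$LOG"
        # Flush the log to disk first: a hang would otherwise lose it.
        sync
        # Step 3: try to stream from the unplugged camera.
        $STREAM_CMD >> "$LOG" 2>&1
        i=$((i + 1))
    done
    # Only reached if the system did not hang within max_attempts.
    { echo "=== slot info after test ==="; $SLOT_CMD; } >> "$LOG" 2>&1
}
```

After sourcing the file, something like "repro_loop 50" would run up to 50 attempts. If the system hangs and is rebooted by the watchdog, the last "attempt" line in the log shows how many tries it took, and step 5 (checking the boot log and nvbootctrl dump-slots-info) still has to be done by hand after the reboot.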
Sometimes we get the same issue (booting from the other slot) on our devices on JP 4.5 (L4T 32.5.1), and one case happened today. Please find the bootloader and system logs attached below. UART.txt (334.3 KB) Journal.txt (4.8 KB)
may I know the actual use-case for launching the video stream without a physical camera device connected?
it's not recommended to remove the camera device from the 2x60 pin camera connector while powered on; you might damage the device.
We have all the cameras physically connected. However, sometimes the connection to one camera may break during operation for some reason (bad wiring, plug issues, etc.). In my opinion this must not crash the whole system, and it must not break the currently active slot either.
you’re working on the latest JetPack release, r32.6.1, right?
if yes, it includes the error recovery mechanism; you should restart the camera service when the camera function is problematic. the worst case is rebooting the whole system.
hence, please try restarting nvargus-daemon as follows:
$ sudo pkill nvargus-daemon
$ sudo nvargus-daemon &
Hello JerryChang
We still have complete system hangs, followed by reboots by means of TEGRA_BCCPLEX_WATCHDOG into the other slot, on JP 4.6 (L4T 32.6.1) after applying the patch.
We cannot restart any camera services; the system is completely unresponsive.
Attached one more log: log.txt (549.1 KB)
according to the logs, are you working with D3 Engineering?
since they're one of the Jetson Camera Partners, please contact them directly to obtain camera solution support.
thanks
Hello JerryChang,
We do not work with D3 Engineering. We are moving from JetPack 4.5 to JetPack 4.6 using the same camera drivers. JetPack 4.6 was deployed to several devices in our fleet, and every one of them started to experience this issue after a while. The issue reproduces about once every 6-10 hours of operation. Those devices did not experience it on JP 4.5.
We suspect this happens due to changes in the camera-rtcpu-rce.img firmware and/or in VI5.
Could you help to debug? Which additional info should we provide?
please share your actual camera use-case, for example, which user-space application you're running to reproduce the issue, and was any background service occupying the camera?
have you applied any pre-built updates? or are you using the native JetPack-4.6 release image with your customized kernel to enable the camera sensor?