Modprobe sometimes causes kernel panic

Hello!

We are trying to use STURDeCAM31 GMSL cameras with our Jetson Orin NX. These cameras use the isx031 driver. After a few minutes of use, the cameras tend to stop working, and rebooting the whole system just to restart the drivers is not viable for us. We therefore changed the device tree to get access to the reset pins of the cameras, and we are testing the following script to restart them, which sometimes works:

#!/bin/bash

echo "Deactivating camera drivers..."
modprobe -r isx031_camera

echo "Resetting E-con ISX031 Cameras..."

# 1. Turn all camera power pins LOW (OFF)
gpioset gpiochip0 49=0
gpioset gpiochip0 138=0
gpioset gpiochip0 46=0
gpioset gpiochip0 139=0

echo "Cameras powered down. Waiting 5 seconds to drain capacitors..."
sleep 5

# 2. Turn all camera power pins HIGH (ON)
gpioset gpiochip0 49=1
gpioset gpiochip0 138=1
gpioset gpiochip0 46=1
gpioset gpiochip0 139=1

echo "Cameras powered up and I2C buses are ready! - Reactivating drivers..."
modprobe isx031_camera

echo "Cameras available!"

However, this has caused two kernel panics today. We managed to retrieve the kernel logs for the moment this happened:

May 20 15:54:50 ul20c3 embarque_ulcp_v3.2[263749]: Mode secu 1/255, Mode fonc 1/255, Mode cap 1/255 Position : 44149/2, Vitesse : 2816, Trajet vers : 13000/2, int>
May 20 15:54:51 ul20c3 embarque_ulcp_v3.2[263749]: Mode secu 1/255, Mode fonc 1/255, Mode cap 1/255 Position : 41172/2, Vitesse : 2732, Trajet vers : 13000/2, int>
May 20 15:54:51 ul20c3 kernel: SENSOR BOOT DATA mcu_isp_init
May 20 15:54:51 ul20c3 kernel: isx031 31-0043: Failed writing register ret = -121!
May 20 15:54:51 ul20c3 kernel: isx031 31-0043: mcu_isp_init_ISP(2358) Error - -121
May 20 15:54:51 ul20c3 kernel: isx031 31-0043: Unable to WRITE BOOTDATA TO SPI FLASH, retry = 4
May 20 15:54:51 ul20c3 kernel: SENSOR BOOT DATA mcu_isp_init
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: Failed writing register ret = -121!
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: mcu_get_cmd_status(4212) Error - -121
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: mcu_isp_init_ISP(2367) Error
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: Unable to WRITE BOOTDATA TO SPI FLASH, retry = 3
May 20 15:54:52 ul20c3 kernel: SENSOR BOOT DATA mcu_isp_init
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: Failed writing register ret = -121!
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: mcu_isp_init_ISP(2358) Error - -121
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: Unable to WRITE BOOTDATA TO SPI FLASH, retry = 2
May 20 15:54:52 ul20c3 kernel: SENSOR BOOT DATA mcu_isp_init
May 20 15:54:52 ul20c3 embarque_ulcp_v3.2[263749]: Mode secu 1/255, Mode fonc 1/255, Mode cap 1/255 Position : 38520/2, Vitesse : 2695, Trajet vers : 13000/2, int>
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: Failed writing register ret = -121!
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: mcu_isp_init_ISP(2358) Error - -121
May 20 15:54:52 ul20c3 kernel: isx031 31-0043: Unable to WRITE BOOTDATA TO SPI FLASH, retry = 1
May 20 15:54:52 ul20c3 kernel: SENSOR BOOT DATA mcu_isp_init
May 20 15:54:53 ul20c3 kernel: isx031 31-0043: Failed writing register ret = -121!
May 20 15:54:53 ul20c3 kernel: isx031 31-0043: mcu_isp_init_ISP(2358) Error - -121
May 20 15:54:53 ul20c3 kernel: isx031 31-0043: Unable to WRITE BOOTDATA TO SPI FLASH, retry = 0
May 20 15:54:53 ul20c3 kernel: Power Off
May 20 15:54:53 ul20c3 kernel: isx031: probe of 31-0043 failed with error -5
May 20 15:54:53 ul20c3 kernel: isx031 31-0044: probing v4l2 sensor.
May 20 15:54:53 ul20c3 kernel: isx031 31-0044: Driver Version - 1.0
SPI Firmware Version - ISX031054FXXX01110f6eb90dXXXXXXX
MCU Firmware Version - ISX031GMSLXXX01110c7bf294XXXXXXX
May 20 15:54:53 ul20c3 kernel: isx031 31-0044: Unable to request PWM for trigger
May 20 15:54:53 ul20c3 kernel: Firmware name from the device tree is > cam_fw.bin
May 20 15:54:53 ul20c3 kernel: ISP Firmware name from the device tree is > ISX031_054F.bin
May 20 15:54:53 ul20c3 kernel: Current SIO ports is B
May 20 15:54:53 ul20c3 kernel: Default FrameRate in DT is 30
May 20 15:54:53 ul20c3 kernel: Priv->frame_time = 43
May 20 15:54:53 ul20c3 kernel: isx031 31-0044: supply vana not found, using dummy regulator
May 20 15:54:53 ul20c3 kernel: isx031 31-0044: supply vif not found, using dummy regulator
May 20 15:54:53 ul20c3 kernel: isx031 31-0044: I2C translate detected.. Skip i2c translate...
May 20 15:54:53 ul20c3 kernel: isx031 31-0044: SIOB Port I2C Reassignment successful
May 20 15:54:53 ul20c3 embarque_ulcp_v3.2[263749]: Mode secu 1/255, Mode fonc 1/255, Mode cap 1/255 Position : 35958/2, Vitesse : 2615, Trajet vers : 13000/2, int>
May 20 15:54:53 ul20c3 kernel: SIOB Port I2C translated successfully
May 20 15:54:53 ul20c3 kernel: BOOT-DATA FILE NAME = ISX031_054F.bin
May 20 15:54:53 ul20c3 kernel: Boot-Data Version Matched - (ISX031054FXXX01110f6eb90dXXXXXXX)
May 20 15:54:54 ul20c3 embarque_ulcp_v3.2[263749]: Mode secu 1/255, Mode fonc 1/255, Mode cap 1/255 Position : 33415/2, Vitesse : 2601, Trajet vers : 13000/2, int>
May 20 15:54:55 ul20c3 kernel: Current Firmware Version - (ISX031GMSLXXX01110c7bf294XXXXXXX)
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: Fl Connect Status = 0x01
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: Load Param Status = 0x01
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: NV BOOT Status = 0x01
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: SPI BOOTDATA IS VALID & SENSOR WILL BOOT FROM SPI
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: Sensor ID = 0x1a00
May 20 15:54:55 ul20c3 kernel: mcu_isp_init
May 20 15:54:55 ul20c3 embarque_ulcp_v3.2[263749]: Mode secu 1/255, Mode fonc 1/255, Mode cap 1/255 Position : 30877/2, Vitesse : 2543, Trajet vers : 13000/2, int>
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: ISP Initialized !!
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: Fl Connect Status = 0x01
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: Load Param Status = 0x01
May 20 15:54:55 ul20c3 kernel: isx031 31-0044: NV BOOT Status = 0x01
May 20 15:54:55 ul20c3 kernel: configuring SIOB serializer successful
May 20 15:54:55 ul20c3 videocross[267161]: video_share[/tmp/video/sens1] - impossible lire fichier de mémoire partagée: No such file or directory
May 20 15:54:55 ul20c3 kernel: configuring Deserializer Successful
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: Failed writing register ret = -121!
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: mcu_list_ctrls(3312) Error - -121
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: Failed writing register ret = -121!
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: mcu_list_ctrls(3301) Error - -121
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: Failed writing register ret = -121!
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: mcu_list_ctrls(3301) Error - -121
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: Failed reading register ret = -121!
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: mcu_list_ctrls(3319) Error - -121
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: Failed writing register ret = -121!
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: mcu_list_ctrls(3301) Error - -121
May 20 15:54:56 ul20c3 embarque_ulcp_v3.2[263749]: Mode secu 1/255, Mode fonc 1/255, Mode cap 1/255 Position : 28456/2, Vitesse : 2510, Trajet vers : 13000/2, int>
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: Failed reading register ret = -121!
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: mcu_list_ctrls(3319) Error - -121
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: Failed writing register ret = -121!
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: mcu_list_ctrls(3301) Error - -121
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: Failed writing register ret = -121!
May 20 15:54:56 ul20c3 kernel: isx031 31-0044: mcu_list_ctrls(3301) Error - -121
May 20 15:54:57 ul20c3 kernel: mcu_ctrl->ctrl_id : 00980900
May 20 15:54:57 ul20c3 kernel: 0. Initialized Control 0x00980900 - Brightness
May 20 15:54:57 ul20c3 kernel: mcu_ctrl->ctrl_id : 00980901
May 20 15:54:57 ul20c3 kernel: 1. Initialized Control 0x00980901 - Contrast
May 20 15:54:57 ul20c3 kernel: mcu_ctrl->ctrl_id : 00980902
May 20 15:54:57 ul20c3 kernel: 2. Initialized Control 0x00980902 - Saturation
May 20 15:54:57 ul20c3 kernel: mcu_ctrl->ctrl_id : 00980903
May 20 15:54:57 ul20c3 kernel: 3. Initialized Control 0x00980903 - Hue
May 20 15:54:57 ul20c3 kernel: mcu_ctrl->ctrl_id : 0098091b
May 20 15:54:57 ul20c3 kernel: 9. Initialized Control 0x0098091b - Sharpness
May 20 15:54:57 ul20c3 kernel: mcu_ctrl->ctrl_id : 009a0901
May 20 15:54:57 ul20c3 kernel: 10. Initialized Control Menu 0x009a0901 - Auto Exposure
May 20 15:54:57 ul20c3 kernel: mcu_ctrl->ctrl_id : 009a0902
May 20 15:54:57 ul20c3 kernel: 11. Initialized Control 0x009a0902 - Exposure Time, Absolute
May 20 15:54:57 ul20c3 videocross[267161]: video_share[/tmp/video/sens2] - impossible lire fichier de mémoire partagée: No such file or directory
May 20 15:54:57 ul20c3 kernel: mcu_ctrl->ctrl_id : 009a092b
May 20 15:54:57 ul20c3 kernel: 21. Initialized Custom Ctrl 0x009a092b - Frame Sync Mode
May 20 15:54:57 ul20c3 kernel: tegra-camrtc-capture-vi tegra-capture-vi: subdev isx031 31-0044 bound
May 20 15:54:57 ul20c3 kernel: isx031 31-0044: ser_status=23
May 20 15:54:57 ul20c3 kernel: isx031 31-0044: Detected ISX031 sensor
May 20 15:54:57 ul20c3 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000000004e0
May 20 15:54:57 ul20c3 kernel: Mem abort info:
May 20 15:54:57 ul20c3 kernel: ESR = 0x0000000096000004
May 20 15:54:57 ul20c3 kernel: EC = 0x25: DABT (current EL), IL = 32 bits
May 20 15:54:57 ul20c3 kernel: SET = 0, FnV = 0
May 20 15:54:57 ul20c3 kernel: EA = 0, S1PTW = 0
May 20 15:54:57 ul20c3 kernel: FSC = 0x04: level 0 translation fault
May 20 15:54:57 ul20c3 kernel: Data abort info:
May 20 15:54:57 ul20c3 kernel: ISV = 0, ISS = 0x00000004
May 20 15:54:57 ul20c3 kernel: CM = 0, WnR = 0
May 20 15:54:57 ul20c3 kernel: user pgtable: 4k pages, 48-bit VAs, pgdp=00000001593f1000
May 20 15:54:57 ul20c3 kernel: [00000000000004e0] pgd=0000000000000000, p4d=0000000000000000
May 20 15:54:57 ul20c3 kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
May 20 15:54:57 ul20c3 kernel: Modules linked in: isx031_camera(O) can_raw xt_conntrack nvidia_drm(O) nvidia_modeset(O) xt_MASQUERADE ip6table_nat ip6table_filter>
May 20 15:54:57 ul20c3 kernel: nvvrs_pseq_rtc(O) btusb btrtl btintel btbcm snd_hda_codec_hdmi mttcan(O) tegra_cactmon_mc_all(O) can_dev tegra234_aon(O) tegra_aco>
May 20 15:54:57 ul20c3 kernel: host1x_nvhost(O) tegra_wmark(O) tsecriscv(O) nvidia_p2p(O) nvhwpm(O) tegra_se(O) cec ina3221 crypto_engine drm_kms_helper nvgpu(O)>
May 20 15:54:57 ul20c3 kernel: CPU: 1 PID: 320635 Comm: sh Tainted: G W OE 5.15.148-tegra #1
May 20 15:54:57 ul20c3 kernel: Hardware name: NVIDIA AEC-6xxx - Jetson AGX Orin 32GB/Jetson, BIOS r36.4.0-fff0af86 12/30/2024
May 20 15:54:57 ul20c3 kernel: pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
May 20 15:54:57 ul20c3 kernel: pc : __pi_memcmp+0x8/0x110
May 20 15:54:57 ul20c3 kernel: lr : strnstr+0x60/0xa0
May 20 15:54:57 ul20c3 kernel: sp : ffff80001955bbf0
May 20 15:54:57 ul20c3 kernel: x29: ffff80001955bbf0 x28: ffff000082a7be00 x27: 0000000000000000
May 20 15:54:57 ul20c3 kernel: x26: 0000000000000000 x25: 0000000000000000 x24: ffff000082a7c400
May 20 15:54:57 ul20c3 kernel: x23: ffff000085fb81f8 x22: ffffbdfd1b5dec50 x21: 0000000000000500
May 20 15:54:57 ul20c3 kernel: x20: 0000000000000008 x19: 00000000000004e0 x18: 0000000000000000
May 20 15:54:57 ul20c3 kernel: x17: 0000000000000000 x16: ffffbdfd25a38bf0 x15: 0000000000000000
May 20 15:54:57 ul20c3 kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
May 20 15:54:57 ul20c3 kernel: x11: 0000000000000000 x10: d1502eae2ba61915 x9 : d422a9eb845ec896
May 20 15:54:57 ul20c3 kernel: x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefefefefefeff
May 20 15:54:57 ul20c3 kernel: x5 : 8080808080808080 x4 : 0000000000000000 x3 : 0000000000000000
May 20 15:54:57 ul20c3 kernel: x2 : 0000000000000000 x1 : ffffbdfd1b5dec50 x0 : 00000000000004e0
May 20 15:54:57 ul20c3 kernel: Call trace:
May 20 15:54:57 ul20c3 kernel: __pi_memcmp+0x8/0x110
May 20 15:54:57 ul20c3 kernel: tegra_channel_close+0x8c/0x170 [tegra_camera]
May 20 15:54:57 ul20c3 kernel: v4l2_release+0xc8/0x100 [videodev]
May 20 15:54:57 ul20c3 kernel: __fput+0x7c/0x270
May 20 15:54:57 ul20c3 kernel: ____fput+0x28/0x40
May 20 15:54:57 ul20c3 kernel: task_work_run+0x90/0xf0
May 20 15:54:57 ul20c3 kernel: do_exit+0x32c/0xa30
May 20 15:54:57 ul20c3 kernel: do_group_exit+0x44/0xb0
May 20 15:54:57 ul20c3 kernel: __arm64_sys_exit_group+0x2c/0x30
May 20 15:54:57 ul20c3 kernel: invoke_syscall+0x5c/0x130
May 20 15:54:57 ul20c3 kernel: el0_svc_common.constprop.0+0x64/0x110
May 20 15:54:57 ul20c3 kernel: do_el0_svc+0x74/0xa0
May 20 15:54:57 ul20c3 kernel: el0_svc+0x28/0x80
May 20 15:54:57 ul20c3 kernel: el0t_64_sync_handler+0xa4/0x130
May 20 15:54:57 ul20c3 kernel: el0t_64_sync+0x1a4/0x1a8
May 20 15:54:57 ul20c3 kernel: Code: 00000000 00000000 f1002042 54000663 (f8408403)

Does anyone know whether this approach has a chance of working consistently, or is it too risky? Is there a way to do this that at least doesn’t crash the entire system?

*** Please note that this reply is generated by LLM automatically ***

The approach you’re taking to restart the cameras by resetting the power pins and reinitializing the drivers is not uncommon, but it’s not without risks. The fact that you’re experiencing kernel panics and crashes suggests that there are underlying issues that need to be addressed.

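The kernel logs you provided point at the isx031 driver and the Tegra camera stack. During probe, the driver repeatedly fails to write or read the camera’s registers (ret = -121), and the probe of 31-0043 ends with error -5. The oops itself comes later, in tegra_channel_close (called from v4l2_release), while a process named sh is exiting and releasing a video device.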
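To read those return codes, it can help to map them to Linux errno names. The helper below is a small sketch: errno_name is an illustrative name, and the table covers only a handful of codes. On I2C buses, EREMOTEIO is typically what the bus driver reports when the addressed device does not acknowledge a transfer, although the exact meaning depends on the controller driver.

```shell
# errno_name CODE
# Map a (possibly negative) kernel return code to its Linux errno name.
# The table is illustrative and deliberately incomplete.
errno_name() {
    code=${1#-}    # accept "-121" as well as "121"
    case "$code" in
        5)   echo "EIO (I/O error)" ;;
        6)   echo "ENXIO (No such device or address)" ;;
        110) echo "ETIMEDOUT (Connection timed out)" ;;
        121) echo "EREMOTEIO (Remote I/O error)" ;;
        *)   echo "unknown" ;;
    esac
}
```

For the log above, errno_name -121 gives EREMOTEIO for the failed register accesses and errno_name -5 gives EIO for the failed probe.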

There are a few potential issues with your approach:

  1. Power sequencing: When you reset the power pins, you’re not guaranteed to get a clean shutdown of the camera. This can lead to unexpected behavior, including crashes.
  2. Driver synchronization: When you reinitialize the drivers, you’re not guaranteed that the drivers will be in a consistent state. This can lead to crashes or other unexpected behavior.
  3. Hardware limitations: The camera hardware may not be designed to handle the kind of power cycling and driver reinitialization you’re doing. This can lead to hardware failures or other issues.
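On the power sequencing point, one option is to replace a fixed sleep with a bounded poll, so the driver is only reloaded once the camera is known to respond. The sketch below only provides the generic retry loop; retry_until is an illustrative name, and the readiness check itself (camera_ready in the usage note) is a hypothetical placeholder that would have to come from the camera vendor's documentation.

```shell
# retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_TRIES attempts have failed.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Only wait if another attempt is coming.
        if [ "$attempt" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

A restart script could then call something like: retry_until 10 1 camera_ready || exit 1, and skip the modprobe when the camera never comes back.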

To mitigate these risks, you could consider the following:

  1. Use a more robust power sequencing mechanism: Instead of simply toggling the power pins, consider using a more sophisticated power sequencing mechanism that ensures a clean shutdown of the camera.
  2. Improve driver synchronization: Consider adding synchronization mechanisms to the drivers to ensure that they’re in a consistent state when you reinitialize them.
  3. Implement error handling: Implement error handling mechanisms in the drivers to catch and handle any errors that may occur during the restart process.
  4. Test thoroughly: Test your approach thoroughly to ensure that it works consistently and doesn’t cause any unexpected behavior.
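The call trace in the question ends in tegra_channel_close via v4l2_release while a process exits, which suggests that some process still had a video node open across the module reload. One precaution, sketched here under that assumption, is to refuse to unload the driver while any process holds the node. list_holders and the PROC_ROOT override are illustrative names, and /dev/video0 in the usage note is an assumed device path.

```shell
# list_holders PATH
# Print the PID of every process that has PATH open, one per line, by
# scanning <proc>/<pid>/fd. Run as root to see other users' processes.
# PROC_ROOT can be overridden for testing; it defaults to /proc.
list_holders() {
    proc_root=${PROC_ROOT:-/proc}
    target=$(readlink -f "$1") || return 1
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Skip descriptors that vanished or point somewhere else.
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] || continue
        pid=${fd#"$proc_root"/}
        echo "${pid%%/*}"
    done | sort -un
}
```

In the restart script this could be used as a guard before modprobe -r, for example: if [ -n "$(list_holders /dev/video0)" ]; then echo "video node still in use"; exit 1; fi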

In terms of solutions, you could consider the following:

  1. Use a camera driver that supports hotplug: If the camera driver supports hotplug, you may be able to simply remove and reinsert the camera module to restart it.
  2. Use a more robust camera interface: Consider using a more robust camera interface, such as MIPI CSI-2, which is designed to handle the kind of power cycling and driver reinitialization you’re doing.
  3. Implement a watchdog: Implement a watchdog mechanism that can detect when the camera has crashed and restart it automatically.

In summary, while your approach may work, it’s not without risks. To mitigate these risks, you should consider implementing more robust power sequencing, driver synchronization, and error handling mechanisms. Additionally, you may want to consider using a more robust camera interface or implementing a watchdog mechanism to detect and restart the camera in case of a crash.
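A watchdog does not have to live in the driver. As a rough user-space sketch, the kernel log can be scanned for failed probes like the one in the question ("isx031: probe of 31-0043 failed with error -5"). failed_probes is an illustrative name, and the pattern is taken only from the log lines shown in this thread, so other driver versions may word the message differently.

```shell
# failed_probes
# Read kernel log text on stdin and print the device name from every
# "isx031: probe of <device> failed with error" line, without duplicates.
failed_probes() {
    sed -n 's/.*isx031: probe of \([^ ]*\) failed with error.*/\1/p' | sort -u
}
```

For example, journalctl -k -b | failed_probes would list the cameras whose probe failed since boot. Whether it is safe to trigger a reload automatically from such a check is a separate question, given the crash described above.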

Here are some potential code changes you could make to improve the robustness of your approach:

  1. Add synchronization mechanisms to the drivers:
// Sketch only. The log names the devices "31-0043" and "31-0044", so isx031 is an
// I2C client driver; the callbacks below use the kernel 5.15 I2C signatures.
#include <linux/i2c.h>
#include <linux/mutex.h>

// Add a mutex to the driver to ensure that only one thread can access the camera at a time
static DEFINE_MUTEX(isx031_mutex);

// In the driver's probe function, lock the mutex before initializing the camera
static int isx031_probe(struct i2c_client *client, const struct i2c_device_id *id)
{
    mutex_lock(&isx031_mutex);
    // Initialize the camera here
    mutex_unlock(&isx031_mutex);
    return 0;
}

// In the driver's remove function, lock the mutex before shutting down the camera
static int isx031_remove(struct i2c_client *client)
{
    mutex_lock(&isx031_mutex);
    // Shut down the camera here
    mutex_unlock(&isx031_mutex);
    return 0;
}
  2. Implement error handling mechanisms in the drivers:
// Sketch only: initialize_camera() and shut_down_camera() are placeholders,
// and the callbacks use the kernel 5.15 I2C client signatures.
#include <linux/i2c.h>

// In the driver's probe function, check for errors when initializing the camera
static int isx031_probe(struct i2c_client *client, const struct i2c_device_id *id)
{
    int ret;

    ret = initialize_camera();
    if (ret != 0) {
        dev_err(&client->dev, "Failed to initialize camera: %d\n", ret);
        return ret;
    }
    return 0;
}

// In the driver's remove function, check for errors when shutting down the camera
static int isx031_remove(struct i2c_client *client)
{
    int ret;

    ret = shut_down_camera();
    if (ret != 0)
        dev_err(&client->dev, "Failed to shut down camera: %d\n", ret);
    return 0;
}
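  3. Implement a watchdog mechanism:
// Sketch only: camera_has_crashed() and restart_camera() are placeholders.
// Delayed work is used instead of a timer because timer callbacks run in
// atomic context, where I2C transfers (which may sleep) are not allowed.
#include <linux/i2c.h>
#include <linux/jiffies.h>
#include <linux/workqueue.h>

static void isx031_watchdog_handler(struct work_struct *work);
static DECLARE_DELAYED_WORK(isx031_watchdog, isx031_watchdog_handler);

// In the driver's probe function, start the watchdog
static int isx031_probe(struct i2c_client *client, const struct i2c_device_id *id)
{
    schedule_delayed_work(&isx031_watchdog, msecs_to_jiffies(1000));
    return 0;
}

// In the driver's remove function, stop the watchdog before the module is unloaded
static int isx031_remove(struct i2c_client *client)
{
    cancel_delayed_work_sync(&isx031_watchdog);
    return 0;
}

// In the watchdog handler, check if the camera has crashed and restart it if necessary
static void isx031_watchdog_handler(struct work_struct *work)
{
    if (camera_has_crashed()) {
        pr_err("Camera has crashed, restarting...\n");
        restart_camera();
    }
    schedule_delayed_work(&isx031_watchdog, msecs_to_jiffies(1000));
}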


Where do I find the source code for this driver? I was hoping for a solution that doesn’t require editing the driver, if possible.

What’s the BSP version?

cat /etc/nv_tegra_release

R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024

KERNEL_VARIANT: oot

TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia

Please consult the vendor to get a solution for the sensor driver.

Thanks

A comment which won’t solve this, but might be of use later: the kernel log shows the oops being triggered from a user-space process (Comm: sh) as it exits and closes a video device. Before that, the driver talked to the device over I2C and part of that communication succeeded. Later, though, when it tries to access registers on the device, the writes fail. The device is alive, but not doing what is expected.

I do not know what “videocross[267161]” is, but it is around this point that the attempts to write to a register fail. Perhaps something set up by videocross has failed, or maybe the isx031 firmware is just wrong. There is no way to tell; you’d need to be the manufacturer of the device to know what that means.

Perhaps you could try to unload and reload the kernel modules related to this prior to using it (not all modules are necessarily related, although the isx031_camera probably is):

May 20 15:54:57 ul20c3 kernel: Modules linked in: isx031_camera(O) can_raw xt_conntrack nvidia_drm(O) nvidia_modeset(O) xt_MASQUERADE ip6table_nat ip6table_filter>
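Before unloading and reloading, it may be worth checking which of those modules is actually loaded. A minimal sketch, assuming the usual /proc/modules layout with the module name in the first column; module_loaded is an illustrative helper, and the optional second argument exists only so the function can be tried against a saved copy of the list.

```shell
# module_loaded NAME [MODULES_FILE]
# Succeed if NAME appears in the first column of MODULES_FILE
# (default: /proc/modules, which lists one loaded module per line).
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

For example: if module_loaded isx031_camera; then modprobe -r isx031_camera || echo "unload refused"; fi. modprobe -r returns a non-zero status when the module is still in use, so the script can stop there instead of continuing with the power cycle.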

Hi,

Please write to techsupport@e-consystems.com.