IMX Camera Error

mozturk · December 11, 2020, 1:30pm

Hello,

We have an IMX283 sensor, which we use with TX2 modules over the CSI ports on a custom carrier board. The driver normally works without any problem and we can capture frames in all modes, but we have a driver initialization problem that shows up across repeated reboots: after rebooting a couple of times, the kernel module hangs in the “Loading” state.

To be more clear, when we run this:

insmod imx283.ko

This is what /proc/modules outputs:

root@4-4-1:~# cat /proc/modules | grep -i imx283
imx283 29513 1 - Loading 0xffffff8001329000 (O+)
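
For checking this state from a script, a minimal sketch follows. It assumes the usual /proc/modules column order (name, size, reference count, dependencies, state, address), which matches the line above. The function name module_state and its optional file argument are hypothetical, added only so it can be tried against a saved copy of the file.

```shell
# module_state NAME [FILE]
# Print the state column (Live, Loading or Unloading) for module NAME.
# FILE defaults to /proc/modules; a saved copy can be passed instead.
# Prints nothing if the module is not listed.
module_state() {
    awk -v mod="$1" '$1 == mod { print $5 }' "${2:-/proc/modules}"
}
```

Run as `module_state imx283`, this would print Loading for the line shown above.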

This issue happens randomly. Most of the time the driver is fine and we are able to use the camera. But after restarting the TX2 a couple of times the module gets stuck like this, and then we can neither capture frames nor reboot the TX2.

I am attaching related dmesg output. What could be causing this? How can we debug it?

Thank you.

imx283_dmesg_err.txt (3.5 KB)

JerryChang · December 14, 2020, 2:49am

hello mozturk,

may I know which JetPack release you’re working with?
you may refer to the release tag for more details, i.e. $ cat /etc/nv_tegra_release

JerryChang · December 14, 2020, 5:48am

hello mozturk,

do you have scripts that insert the kernel module during the boot-up sequence? why don’t you make it a kernel built-in driver if you would like the sensor driver to be loaded at the kernel initialization stage?
thanks

mozturk · December 14, 2020, 7:57am

Hello JerryChang,

I use Jetpack 4.4.1:

# R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020

Yes, I have an insmod line in my camera application, which runs as a service.
Actually, I was using the built-in approach before. After noticing this problem I switched from a built-in driver to building it as a loadable module.

Thank you.

JerryChang · December 14, 2020, 8:14am

hello mozturk,

I suspect this is a system-level issue since it happens randomly.
a couple of things you may try:

please add some delay before inserting your camera module.
please configure the process priority level with the nice or renice commands.

thanks

mozturk · December 16, 2020, 10:28am

Hello JerryChang,

I added a 60-second delay and set the insmod process priority as follows:

nice -n -20 insmod imx283.ko
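
To notice a stuck load from the service instead of finding it later, the module state could be polled after the insmod call. The following is only a sketch: the function name wait_for_live, the timeout value, and the log path in the usage comment are hypothetical, and it assumes the state is the fifth column of /proc/modules.

```shell
# wait_for_live NAME TIMEOUT_SECONDS [FILE]
# Poll about once per second until module NAME reports "Live".
# Returns 0 once it is Live, 1 if the timeout expires first.
# FILE defaults to /proc/modules.
wait_for_live() {
    name=$1
    remaining=$2
    file=${3:-/proc/modules}
    while :; do
        state=$(awk -v mod="$name" '$1 == mod { print $5 }' "$file")
        [ "$state" = "Live" ] && return 0
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Possible use in the service (the 10 s limit and log path are placeholders):
#   nice -n -20 insmod imx283.ko &
#   wait_for_live imx283 10 || dmesg > /var/log/imx283_load_failure.txt
```

This does not prevent the hang. It only gives the service a point at which it can save the kernel log and decide what to do next.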

I reboot the device every 2 minutes. This time I hit the issue after 334 reboots, so it is rarer than before.

Since the kernel module hangs in the “Loading” state, the system is not able to reboot. I tried:

shutdown -r
halt --reboot
poweroff --reboot
reboot -f

Therefore I cannot ignore this issue. Recovering requires physically cutting the power and turning it back on.

Any advice will be appreciated.
If there is a way to reboot the device in this state, it would help us work around this.
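
One mechanism that may be worth trying here is the kernel's magic SysRq trigger, which asks the kernel to restart directly instead of going through the normal shutdown path. The sketch below is untested on this setup and assumes the kernel was built with CONFIG_MAGIC_SYSRQ. Whether it gets past a module stuck in “Loading” is not known. The function name and its optional directory argument are hypothetical; the argument exists only so the sequence can be dry-run against a scratch directory.

```shell
# sysrq_reboot [PROC_DIR]
# Write the SysRq keys s, u, b to PROC_DIR/sysrq-trigger.
# PROC_DIR defaults to /proc; writing there requires root.
# WARNING: with the default directory this restarts the machine at once.
sysrq_reboot() {
    trigger="${1:-/proc}/sysrq-trigger"
    echo s > "$trigger" || return 1   # s: sync dirty data to disk
    sleep 1
    echo u > "$trigger" || return 1   # u: remount filesystems read-only
    sleep 1
    echo b > "$trigger"               # b: reboot without further cleanup
}
```

The s and u steps are there to reduce the risk of filesystem damage, since the final b step skips the usual unmount and service shutdown.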

Thank you

mozturk · December 16, 2020, 10:31am

I also want to mention that I cannot use “rmmod” even when the kernel module is loaded and live. It gives a segmentation fault:

root@4-4-1:~# rmmod imx283
Segmentation fault

This is the module state after rmmod:

root@4-4-1:~# cat /proc/modules | grep -i imx
imx283 20305 -1 - Unloading 0xffffff8001531000 (O-)

It is stuck at “Unloading” this time.
The “rmmod” command also logs this error to dmesg:

Internal error: Accessing user space memory outside uaccess.h routines: 96000005 [#1] PREEMPT SMP

I am attaching the full dmesg error. The “accessing user space memory…” error is the same in both issues.

The camera module is attached over CSI at all times.

imx283_rmmod_dmesg (2.9 KB)

JerryChang · December 17, 2020, 5:20am

hello mozturk,

are you able to issue other kernel commands while the issue is happening?
for example, please gather the kernel messages for reference, i.e. $ dmesg

mozturk · December 18, 2020, 10:29am

Hello again,

I may have misunderstood, but yes: for example, I can make another insmod call to load a different kernel module. It loads and reaches the “Live” state. We can also gather dmesg output; I attached samples above.

I dug into the code and this is what I found:

The error happens in drivers/media/v4l2-core/v4l2-ctrls.c, just below the HERE mark:

/* Call s_ctrl for all controls owned by the handler */
int v4l2_ctrl_handler_setup(struct v4l2_ctrl_handler *hdl)
{
	struct v4l2_ctrl *ctrl;
	int ret = 0;

	if (hdl == NULL)
		return 0;
	mutex_lock(hdl->lock);
	list_for_each_entry(ctrl, &hdl->ctrls, node)
		ctrl->done = false;

	list_for_each_entry(ctrl, &hdl->ctrls, node) {
		struct v4l2_ctrl *master = ctrl->cluster[0];
		int i;

		/* Skip if this control was already handled by a cluster. */
		/* Skip button controls and read-only controls. */
		if (ctrl->done || ctrl->type == V4L2_CTRL_TYPE_BUTTON ||
		    (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY))
			continue;

		//HERE------------------------
		for (i = 0; i < master->ncontrols; i++) {
			if (master->cluster[i]) {
				cur_to_new(master->cluster[i]);
				master->cluster[i]->is_new = 1;
				master->cluster[i]->done = true;
			}
		}
		ret = call_op(master, s_ctrl);
		if (ret)
			break;
	}
	mutex_unlock(hdl->lock);
	return ret;
}

When the failure occurs, master->ncontrols holds a very large negative number such as -673772992. It is probably not being initialized, although in the normal case it is 1. I can't explain the randomness of this issue.
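
Looking at the bad value as raw bytes can help tell whether it resembles a repeated fill pattern or arbitrary data. The small helper below is a sketch for that. The name as_hex32 is hypothetical, and it assumes the number above was printed as a signed 32-bit integer. It is only an inspection aid and does not identify the cause.

```shell
# as_hex32 VALUE
# Print a signed integer as the 32-bit pattern it occupies in memory.
as_hex32() {
    printf '0x%08X\n' "$(( $1 & 0xFFFFFFFF ))"
}

as_hex32 -673772992   # prints 0xD7D70A40
```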

Do you have any suggestions to prevent this?