L4T 35.2.1 - Kernel 5.10 crash while rmmod

Hi,

As I already mentioned in this forum thread: JP 5.1 Frame drop issues - #21 by JerryChang
I am seeing a kernel crash when I remove my driver module (iwr6843isk) for the first time after boot. If I load and unload the module again afterwards, the crash prints do not appear. This is not observed in L4T 32.5 - Kernel 4.9. Any inputs would be helpful.

attached log:
[ +0.000435] ------------[ cut here ]------------
[ +0.000125] refcount_t: underflow; use-after-free.
[ +0.000077] WARNING: CPU: 3 PID: 7017 at lib/refcount.c:28 refcount_warn_saturate+0xec/0x140
[ +0.000201] Modules linked in: fuse(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xt_addrtype(E) iptable_filter(E) br_netfilter(E) lzo_rle(E) lzo_compress(E) zram(E) overlay(E) bnep(E) snd_soc_tegra186_asrc(E) snd_soc_tegra186_arad(E) snd_soc_tegra210_ope(E) snd_soc_tegra210_mvc(E) snd_soc_tegra186_dspk(E) snd_soc_tegra210_iqc(E) snd_soc_tegra210_afc(E) snd_soc_tegra210_dmic(E) snd_soc_tegra210_adx(E) snd_soc_tegra210_mixer(E) snd_soc_tegra210_amx(E) snd_soc_tegra210_admaif(E) snd_soc_tegra210_i2s(E) snd_soc_tegra210_sfc(E) snd_soc_tegra_pcm(E) aes_ce_blk(E) crypto_simd(E) cryptd(E) input_leds(E) aes_ce_cipher(E) ghash_ce(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) rtk_btusb(E) btusb(E) btrtl(E) realtek(E) btbcm(E) btintel(E) snd_soc_spdif_tx(E) snd_soc_tegra_machine_driver(E) leds_gpio(E) snd_soc_tegra210_adsp(E) snd_soc_tegra_utils(E) snd_soc_simple_card_utils(E)
[ +0.000217] snd_soc_tegra210_ahub(E) nvadsp(E) rtl8822ce(E) iwr6843isk(-) tegra_bpmp_thermal(E) nv_imx219(E) max77620_thermal(E) snd_hda_codec_hdmi(E) userspace_alert(E) snd_hda_tegra(E) tegra210_adma(E) snd_hda_codec(E) snd_hda_core(E) cfg80211(E) spi_tegra114(E) loop(E) binfmt_misc(E) ina3221(E) pwm_fan(E) nvgpu(E) nvmap(E) ip_tables(E) x_tables(E) [last unloaded: mtd]
[ +0.000100] CPU: 3 PID: 7017 Comm: rmmod Tainted: G OE 5.10.104-tegra #5
[ +0.000005] Hardware name: Unknown NVIDIA Jetson Xavier NX Developer Kit/NVIDIA Jetson Xavier NX Developer Kit, BIOS 2.1-32413640 01/24/2023
[ +0.000007] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ +0.000009] pc : refcount_warn_saturate+0xec/0x140
[ +0.000006] lr : refcount_warn_saturate+0xec/0x140
[ +0.000005] sp : ffff80001b1f3bf0
[ +0.000013] x29: ffff80001b1f3bf0 x28: ffff10c300166580
[ +0.000020] x27: 0000000000000000 x26: 0000000000000000
[ +0.000037] x25: 0000000000000000 x24: 0000000000000000
[ +0.000017] x23: 0000000000000000 x22: ffffcbcf020edf70
[ +0.000018] x21: ffff10c304d29400 x20: 0000000000000000
[ +0.000016] x19: ffff10c307dcc618 x18: 0000000000000010
[ +0.000037] x17: 0000000000000000 x16: ffffcbcf00a563a0
[ +0.000018] x15: ffff10c300166af0 x14: ffffffffffffffff
[ +0.000037] x13: ffff80009b1f3827 x12: ffff80001b1f382f
[ +0.000012] x11: 0000000000000000 x10: 0000000000000000
[ +0.000026] x9 : ffff80001b1f3bf0 x8 : 75203b776f6c6672
[ +0.000012] x7 : 65646e75203a745f x6 : c0000000ffffefff
[ +0.000011] x5 : ffff10c47fe0b958 x4 : ffffcbcf01dc7968
[ +0.000010] x3 : 0000000000000001 x2 : ffff10c47fe0b960
[ +0.000011] x1 : 0000000000000000 x0 : 0000000000000000
[ +0.000010] Call trace:
[ +0.000008] refcount_warn_saturate+0xec/0x140
[ +0.000009] kobject_put+0xfc/0x110
[ +0.000006] kset_unregister+0x34/0x50
[ +0.000009] class_unregister+0x34/0x40
[ +0.000006] class_destroy+0x30/0x40
[ +0.000013] iwr6843isk_remove+0x94/0xd0 [iwr6843isk]
[ +0.000013] i2c_device_remove+0x5c/0xe0
[ +0.000006] device_release_driver_internal+0x11c/0x200
[ +0.000006] driver_detach+0x5c/0xf0
[ +0.000007] bus_remove_driver+0x60/0xb0
[ +0.000006] driver_unregister+0x38/0x60
[ +0.000006] i2c_del_driver+0x34/0x40
[ +0.000007] iwr6843isk_i2c_driver_exit+0x18/0xfc8 [iwr6843isk]
[ +0.000009] __arm64_sys_delete_module+0x18c/0x260
[ +0.000009] el0_svc_common.constprop.0+0x80/0x1d0
[ +0.000007] do_el0_svc+0x38/0xb0
[ +0.000009] el0_svc+0x1c/0x30
[ +0.000006] el0_sync_handler+0xa8/0xb0
[ +0.000006] el0_sync+0x16c/0x180
[ +0.000006] ---[ end trace 55843198b7e5f4b8 ]---
This issue is not seen with kernel 4.9 (L4T 32.5).

Thanks,
Shreyas

hello shreyas.pa,

The "refcount_t: underflow; use-after-free." warning looks like the culprit.
Could you please dig into the driver and examine the refcount_t handling when you remove the kernel module?

Hi Jerry,

I am not using refcount_t anywhere in my driver.

Thanks,
Shreyas

In your driver iwr6843isk, please check whether you call driver_register() in your driver init.
This may occur when the kernel calls driver_unregister() on a driver that was never registered, or when driver_unregister() has already been called and is then called a second time.
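To illustrate the double-unregister case described above, here is a minimal userspace sketch of a reference-counted object that is released twice. It is only a model and not the kernel's kobject or refcount_t code; `struct model_obj`, `model_obj_init()` and `model_obj_put()` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a reference-counted kernel object. */
struct model_obj {
    int refs;      /* current reference count */
    bool warned;   /* set once a put is attempted with no references left */
};

/* A freshly created object holds exactly one reference. */
static void model_obj_init(struct model_obj *obj)
{
    assert(obj != NULL);
    obj->refs = 1;
    obj->warned = false;
}

/* Drop one reference. Returns true only when this call released the object. */
static bool model_obj_put(struct model_obj *obj)
{
    if (obj->refs <= 0) {
        /* Unbalanced put: report it once instead of going negative. */
        if (!obj->warned)
            fprintf(stderr, "model: underflow; use-after-free.\n");
        obj->warned = true;
        return false;
    }
    obj->refs--;
    return obj->refs == 0;
}
```

Calling `model_obj_put()` once after `model_obj_init()` releases the object; a second call is the unbalanced put that the model reports. As far as I can tell, the kernel emits its refcount warning through WARN_ONCE, so it is printed at most once per boot. That would be consistent with the trace showing up only on the first rmmod even if the unbalanced put still happens on later unloads.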

Hi Sathish,

My driver doesn’t have an init function. In my probe function I call tegracam_device_register() and tegracam_v4l2subdev_register(), and in my remove function I call tegracam_v4l2subdev_unregister() and tegracam_device_unregister().
I am not trying to unregister it anywhere else.

Thanks for your response.

Looks like this issue occurred during iwr6843isk_i2c_driver_exit, which is called on rmmod.

  1. Are you sure your probe has been called (i.e., is the device tree entry added correctly)?
  2. If probe is not called, or for some reason the device_register in your probe is not executed but device_unregister is still called in exit, that could cause this issue.
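One way to rule out the second point is to have the remove path undo only the steps that probe actually completed. The following is a userspace sketch of that bookkeeping, assuming a two-step registration like the one described in this thread; `struct demo_dev`, `demo_probe()` and `demo_remove()` are hypothetical names, and the flags stand in for whatever state the real driver keeps.

```c
#include <stdbool.h>

/* Hypothetical per-device state; the field names are illustrative only. */
struct demo_dev {
    bool device_registered;   /* first registration step completed */
    bool subdev_registered;   /* second registration step completed */
    int unregister_calls;     /* counts how many steps remove() undid */
};

/* Model of probe: two registration steps, the second of which may fail. */
static int demo_probe(struct demo_dev *dev, bool fail_second_step)
{
    dev->device_registered = true;
    if (fail_second_step) {
        /* Unwind the first step so nothing is left half-registered. */
        dev->device_registered = false;
        return -1;
    }
    dev->subdev_registered = true;
    return 0;
}

/* Model of remove: undo, in reverse order, only what probe completed. */
static void demo_remove(struct demo_dev *dev)
{
    if (dev->subdev_registered) {
        dev->subdev_registered = false;
        dev->unregister_calls++;
    }
    if (dev->device_registered) {
        dev->device_registered = false;
        dev->unregister_calls++;
    }
}
```

With this pattern a second `demo_remove()` call, or a remove after a failed probe, becomes a no-op instead of an unbalanced unregister.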

Yes, my driver is working fine. I am able to capture the data.
My driver logs during modprobe:
[ +2.909220] --------------Start of Probe Function------------------
[ +0.000010] This is from iwr6843isk_probe 570
[ +0.000142] iwr6843isk 9-0003: probing v4l2 sensor at addr 0x3
[ +0.000162] This is from iwr6843isk_parse_dt 447
[ +0.000123] iwr6843isk 9-0003: mclk name not present, assume sensor driven externally
[ +0.000183] iwr6843isk 9-0003: tegracam sensor driver:iwr6843isk_v2.0.6
[ +0.000005] This is before i2c write in probe
[ +0.000601] Radar-1 Not Detected !!!
[ +0.000381] tegra-camrtc-capture-vi tegra-capture-vi: subdev iwr6843isk 9-0003 bound
[ +0.011180] Major = 507 Minor = 0
[ +0.000212] -------------End of Probe Function-----------------

[ +0.002193] --------------Start of Probe Function------------------
[ +0.000007] This is from iwr6843isk_probe 570
[ +0.000134] iwr6843isk 10-0003: probing v4l2 sensor at addr 0x3
[ +0.000116] This is from iwr6843isk_parse_dt 447
[ +0.000123] iwr6843isk 10-0003: mclk name not present, assume sensor driven externally
[ +0.000141] iwr6843isk 10-0003: tegracam sensor driver:iwr6843isk_v2.0.6
[ +0.000005] This is before i2c write in probe
[ +0.000567] Radar-2 Detected…
[ +0.000079] tegra-camrtc-capture-vi tegra-capture-vi: subdev iwr6843isk 10-0003 bound
[ +0.010366] Major = 507 Minor = 1
[ +0.001532] -------------End of Probe Function-----------------

The kernel panic logs (attached in my first message) appear only when I rmmod my kernel driver module for the first time.
The second time I tried rmmod:
[ +6.041787] This is from iwr6843isk_remove 666
[ +0.002015] tegra-camrtc-capture-vi tegra-capture-vi: subdev iwr6843isk 10-0003 unbind
[ +0.000384] This is from iwr6843isk_remove 666
[ +0.000484] tegra-camrtc-capture-vi tegra-capture-vi: subdev iwr6843isk 9-0003 unbind

  1. Is probe() called twice in your driver module?
  2. When the kernel panic occurs during rmmod, does the first iwr6843isk_remove succeed and the second one panic, or does the first iwr6843isk_remove itself cause the kernel panic?

Hi Sathish,

[1] Probe is called twice because I have 2 radar devices connected, on i2c buses 9 and 10.
[2] The kernel panic appears when I rmmod the first time. In that case the logs below are still not printed:

[ +6.041787] This is from iwr6843isk_remove 666
[ +0.002015] tegra-camrtc-capture-vi tegra-capture-vi: subdev iwr6843isk 10-0003 unbind
[ +0.000384] This is from iwr6843isk_remove 666
[ +0.000484] tegra-camrtc-capture-vi tegra-capture-vi: subdev iwr6843isk 9-0003 unbind

However, the module is still removed, but with a panic message.
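Since the call trace goes through class_destroy() inside iwr6843isk_remove and remove runs once per bound device, one thing worth checking is whether a single device class is shared by both radars but destroyed in each remove call. This is an assumption, as the driver source is not shown in the thread. The userspace sketch below models one shared resource that is created by the first user and destroyed by the last; `shared_class_get()` and `shared_class_put()` are hypothetical names, and a real driver would also need locking around the counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device class shared by all bound devices. */
struct shared_class {
    bool alive;
};

static struct shared_class class_storage;
static struct shared_class *class_ptr;   /* NULL while no class exists */
static int class_users;                  /* number of devices using the class */
static int create_calls;                 /* how often the class was created */
static int destroy_calls;                /* how often the class was destroyed */

/* Called from each probe: only the first user creates the class. */
static struct shared_class *shared_class_get(void)
{
    if (class_users == 0) {
        class_storage.alive = true;
        class_ptr = &class_storage;
        create_calls++;
    }
    class_users++;
    return class_ptr;
}

/* Called from each remove: only the last user destroys the class. */
static void shared_class_put(void)
{
    if (class_users == 0)
        return;                          /* unbalanced put: nothing to undo */
    class_users--;
    if (class_users == 0) {
        class_storage.alive = false;
        class_ptr = NULL;
        destroy_calls++;
    }
}
```

With two devices, two `shared_class_get()` calls followed by two `shared_class_put()` calls create and destroy the class exactly once. A simpler alternative, if it applies to this driver, is to create the class in an explicit module init function and destroy it in the module exit function after i2c_del_driver(), so the per-device probe and remove never touch it.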