Jetson Nano Production Module - SD Card Interface Software Reset Problem

Hi,
I have a problem with the Jetson Nano’s SDMMC interface. About my development environment:

  • I use a Jetson Nano Production Module (P3448 rev.B01).
  • The baseboard is our own design. (We use eMMC for the JetPack image and the SDMMC interface for extra storage.)
  • JetPack 4.3 with kernel 4.9.
  • I tried different brands and models of SD cards.

The problem is that when I write big chunks of data to the SD card (from the UI or with the dd command), the kernel hangs and resets the board with the error below. Smaller writes (those that finish in less than 120 seconds) give no error.

[ 242.842083] INFO: task kworker/1:2:1576 blocked for more than 120 seconds.
[ 242.849437] Not tainted 4.9.140-tegra #1
[ 242.854052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.862291] Kernel panic - not syncing: hung_task: blocked tasks
[ 242.868290] CPU: 3 PID: 587 Comm: khungtaskd Not tainted 4.9.140-tegra #1
[ 242.875072] Hardware name: NVIDIA Jetson Nano Developer Kit (DT)
[ 242.881078] Call trace:
[ 242.883539] [] dump_backtrace+0x0/0x198
[ 242.888963] [] show_stack+0x24/0x30
[ 242.894046] [] dump_stack+0x98/0xc0
[ 242.899107] [] panic+0x11c/0x298
[ 242.903894] [] watchdog+0x300/0x3b8
[ 242.908963] [] kthread+0xec/0xf0
[ 242.913784] [] ret_from_fork+0x10/0x40
[ 242.919105] SMP: stopping secondary CPUs
[ 242.923082] Kernel Offset: disabled
[ 242.926565] Memory Limit: none
[ 242.939129] Rebooting in 5 seconds…

I attached the dmesg output and the dts file below. We also produced our dtsi file based on these NVIDIA topics. (The latest dts I attached may not include all of the patches below, but I tried all of them in different combinations.)

  1. https://forums.developer.nvidia.com/t/microsd-card-not-detected-on-jetson-nano-production-module/80776/14

  2. https://forums.developer.nvidia.com/t/sd-card-not-detected/108226/26

  3. https://forums.developer.nvidia.com/t/slow-sd-card-access-speed-read-write-with-jetson-nano-production-module/111749/31

Could you please help me with this issue?
Thanks.

dmesg.log (58.7 KB)
tegra210-p3448-0002-p3449-0000-b00.dts.txt (280.7 KB)

Hi,

Please also refer to this thread and add vmmc-always-on in the dts.

Also, we have two questions for you:

  1. Your log does not show the reboot error, just the normal dmesg. We need that error log.

  2. In your dmesg, there is a CRC error from mmc1. Do you see this on every boot?

If the issue persists, please also enable more logging with this patch.

Hi Wayne,

I applied the patch you posted in this post and added nvidia,vmmc-always-on; to tegra210-sdhci.dtsi (I also removed the min and max tap delays according to this).

  1. New live dmesg output attached. I copied that from the console so it’s a bit messy but error is the same as the question post.

  2. I think the reset broke the filesystem on the SD card (not sure). After recreating the filesystem, that error went away.

New dts attached.

Thanks for the fast reply.

dmesg.log (57.6 KB)
tegra210-p3448-0002-p3449-0000-b00.dts.log (280.6 KB)

I applied debug patch and attached new dmesg.

dmesg.log (65.4 KB)

Hi,

I also have one question.

  1. In your first log here, there is no CRC error from mmc1. Actually, this is a perfect boot-up log. Why do you say the error is still the same? You didn’t capture the error part here.

     > 1. New live dmesg output attached. I copied that from the console so it’s a bit messy but error is the same as the question post.

  2. This time I saw the kernel panic, but I don’t see any error from the sdcard driver. Could you remove the “quiet” keyword from extlinux.conf to enable more logging on the serial console?

     > I applied debug patch and attached new dmesg.

  3. Also, could you still put the rootfs on eMMC first, mount the sdcard after boot-up, and then write a big chunk of data? We want to see if you get the error in this case.

Hi Wayne,

  1. That was my fault; the log in post #4 is old, you are right, so forget that log. The error I meant was the kernel panic.

  2. I am using dmesg, so quiet won’t affect anything, but I added debug to bootargs. Attached a new log (log1) with debug messages. But again there is no driver debug print when the kernel panic appears.

  3. Attached the error log (log2). The kernel panic occurred again. It seems the same as log1.

Thanks.

log1.log (74.5 KB)
log2.log (75.0 KB)

OK, we will check this issue and update you once we find something.

BTW, could you share the command you are using to write data to the SD card?

I am using:

sudo dd if=/dev/zero of=/media/nvidia/flash/test1.txt bs=64M count=500 conv=fsync

But this is just for convenience. You can also try copying a video or something else.

Waiting for your update, Wayne. Thanks.
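
While a dd like the one above runs, it can help to watch how much data is still queued in the page cache. The sketch below polls the standard Dirty and Writeback counters in /proc/meminfo from a second terminal. The MEMINFO variable and the function name page_cache_backlog are made up for illustration and are not part of any NVIDIA tooling.

```shell
#!/bin/sh
# MEMINFO is a hypothetical override so the function can be pointed at a
# saved copy of /proc/meminfo; by default it reads the live file.
MEMINFO="${MEMINFO:-/proc/meminfo}"

# Print the Dirty and Writeback lines (the kernel reports both in kB).
page_cache_backlog() {
    awk '/^(Dirty|Writeback):/ { printf "%s %s kB\n", $1, $2 }' "$MEMINFO"
}

# Take one sample now, if the file is readable on this system.
if [ -r "$MEMINFO" ]; then
    page_cache_backlog
fi
```

To sample every five seconds during the write, wrap the call in a loop such as `while sleep 5; do page_cache_backlog; done`.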

Hi,

Please use either of the methods below to disable the panic on hung task, then run again.

  1. Disable this config in tegra_defconfig and rebuild the kernel.

CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=1

or

  2. Use procfs to disable it at runtime.
    Write 0 to /proc/sys/kernel/hung_task_panic (as root) before running the test.
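
The second method amounts to writing 0 into that file as root. A minimal sketch follows, assuming a kernel built with hung-task detection so that the files exist. HUNG_DIR and both function names are hypothetical helpers added here for illustration; only the two procfs file names come from the kernel.

```shell
#!/bin/sh
# HUNG_DIR is a hypothetical override that allows a dry run against a
# scratch directory; the real knobs live in /proc/sys/kernel.
HUNG_DIR="${HUNG_DIR:-/proc/sys/kernel}"

# Print the current timeout and panic settings, skipping missing files.
show_hung_task() {
    for knob in hung_task_timeout_secs hung_task_panic; do
        if [ -r "$HUNG_DIR/$knob" ]; then
            printf '%s = %s\n' "$knob" "$(cat "$HUNG_DIR/$knob")"
        fi
    done
}

# Stop the kernel from panicking on a hung task (needs root on a real
# system). The hung-task warning is still printed to dmesg.
disable_hung_task_panic() {
    echo 0 > "$HUNG_DIR/hung_task_panic"
}

show_hung_task
```

On the board itself, `sudo sysctl -w kernel.hung_task_panic=0` does the same thing. The setting is not persistent and returns to the build-time default after a reboot.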

Your “write large file” test does not need to run multi-threaded, right?

Also, one suggestion: please format this card as exFAT instead of ext4.

Please also verify this issue after hot-plugging the SD card.

Hi Wayne,

Is disabling the hung-task panic mechanism safe? There could be genuinely problematic hung tasks, and in that case won’t the whole OS freeze?

Will try that.

Thanks.

Hi,

It is all for debug purposes. As I pointed out in my previous comment, we don’t see any error from the sdcard driver.

Thus, our SD driver expert suggests disabling hang detection and mounting as exFAT first.

I wrote 34 GB to the SD card successfully, and this time some debug prints and different traces showed up. I attached a new log.

Thanks.

log3.log (85.7 KB)

Hi,

Could you remove max-clk-limit = <400000>; from your device tree?

Hi Wayne,
Actually, there is none. Am I missing something?

It is set to 0xc28cb00 (204 MHz).
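
For anyone checking the arithmetic, the hex value from the dts can be converted with plain shell arithmetic. The helper name hz_to_mhz is made up for this sketch.

```shell
#!/bin/sh
# Convert a clock value in Hz (decimal or 0x-prefixed hex) to whole MHz.
hz_to_mhz() {
    echo $(( $1 / 1000000 ))
}

hz_to_mhz 0xc28cb00   # prints 204
```

By comparison, the 400000 in the limit asked about above is 400000 Hz, that is 0.4 MHz.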

Thanks.

Hi,

Could you share the output of

sysctl -a | grep dirty

Also, please confirm the results with the exFAT filesystem too.

Hi,
I successfully wrote large amounts of data without any problem on both filesystems after turning off the panic reset mechanism. Also, the speed is fine.

I attached the output below.

Thanks.
sysctl_out.txt (759 Bytes)

We need to check a couple of things:

Based on one of the logs shared here, I see the card is enumerating in HS mode instead of UHS mode.

Does the custom carrier board support power cycling for SD slot supply on warm boot?
If not, are you sure your build has

  1. the workaround (WAR) to bypass CMD11 in the kernel image (Jetson Nano SD card enters back to high speed mode instead of uhs mode after soft reboot), and

  2. “nvidia,vmmc-always-on” in the dt?

If the issue is still seen, can you check whether the following commands help?
sudo sysctl -w vm.dirty_ratio=50
sudo sysctl -w vm.dirty_background_ratio=5
sudo sysctl -p

Verify that the above changes are reflected in the system using sudo sysctl -a | grep dirty, then write large data to the card.
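
The verification step can also be scripted by reading the two procfs files directly. This is only a sketch: VM_DIR and dirty_ratios_match are hypothetical names introduced here, while the values 50 and 5 are the ones from the commands above.

```shell
#!/bin/sh
# VM_DIR is a hypothetical override for dry runs; the live values are
# exposed under /proc/sys/vm.
VM_DIR="${VM_DIR:-/proc/sys/vm}"

# Succeed only if both ratios match the expected values.
dirty_ratios_match() {
    want_ratio=$1
    want_bg=$2
    [ "$(cat "$VM_DIR/dirty_ratio")" = "$want_ratio" ] &&
    [ "$(cat "$VM_DIR/dirty_background_ratio")" = "$want_bg" ]
}

# Report whether the settings suggested above are in effect.
if [ -r "$VM_DIR/dirty_ratio" ]; then
    if dirty_ratios_match 50 5; then
        echo "dirty ratios applied"
    else
        echo "dirty ratios differ from 50/5"
    fi
fi
```

Running it right before the large write confirms that the test is using the intended settings.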

Ensure that all the write tests are run with hung task timeouts enabled.