Jetson Nano Production Module - SD Card Interface Software Reset Problem

I wrote 34 GB to the SD card successfully, and this time some debug prints and different traces showed up. I have attached a new log.

Thanks.

log3.log (85.7 KB)

Hi,

Could you remove max-clk-limit = <400000>; from your device tree?

Hi Wayne,
Actually, there is no such line. Am I missing something?

It is set to 0xc28cb00 (204 MHz).
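
The hex value quoted above can be cross-checked against the stated frequency with plain shell arithmetic. This is only a minimal sketch; hz_from_hex is a hypothetical helper name, not something from the BSP.

```shell
# Convert a hexadecimal clock value (in Hz) to decimal Hz and whole MHz.
# hz_from_hex is a hypothetical helper used only for this illustration.
hz_from_hex() {
    hz=$(printf '%d' "$1")
    echo "$hz Hz = $((hz / 1000000)) MHz"
}

hz_from_hex 0xc28cb00   # prints "204000000 Hz = 204 MHz"
```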

Thanks.

Hi,

Could you help share the output of

sysctl -a | grep dirty

Also, please confirm the results with exFat filesystem too.

Hi,
I successfully wrote large amounts of data without any problem on both filesystems after turning off the panic reset mechanism. The speed is also fine.

I attached the output below.

Thanks.
sysctl_out.txt (759 Bytes)

We need to check a couple of things:

Based on one of the logs shared here, I see the card is enumerating in HS mode instead of UHS mode.

Does the custom carrier board support power cycling for SD slot supply on warm boot?
If not, are you sure your build has

  1. the WAR (workaround) to bypass CMD11 in the kernel image (Jetson Nano SD card enters back to high speed mode instead of uhs mode after soft reboot - #4 by WayneWWW)

  2. “nvidia,vmmc-always-on” in the dt?

If the issue is still seen, can you check whether the following commands help:
sudo sysctl -w vm.dirty_ratio=50
sudo sysctl -w vm.dirty_background_ratio=5
sudo sysctl -p

Verify that the above changes are reflected in the system using sudo sysctl -a | grep dirty, then write large data to the card.

Ensure that all the write tests are run with hung task timeouts enabled.
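
As a rough sketch of how the writeback and hung-task settings could be checked together before a write test, the helper below reads them straight from /proc/sys. The name show_vm_settings is hypothetical, and the hung_task entries are assumed to exist only if the kernel was built with hung task detection.

```shell
# Print the writeback and hung-task settings relevant to the write tests.
# Reads /proc/sys directly, so it does not need root.
# show_vm_settings is a hypothetical helper name.
show_vm_settings() {
    for key in vm/dirty_ratio vm/dirty_background_ratio \
               kernel/hung_task_timeout_secs kernel/hung_task_panic; do
        if [ -r "/proc/sys/$key" ]; then
            echo "$key = $(cat "/proc/sys/$key")"
        else
            # Entry is missing, e.g. hung task detection not built in
            echo "$key = (not available)"
        fi
    done
}

show_vm_settings
```

If kernel/hung_task_panic reads 0, a hung task is only logged; setting it to 1 (for example with sudo sysctl -w kernel.hung_task_panic=1) makes the kernel panic instead, which appears to be the mode the tests here are meant to run in.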

How can I find that out? We have connected the SD power to 3V3 according to your design guide. I will get the schematics and attach them as soon as possible.

I didn’t understand that. We had that patch in one of our build versions, but it did not solve the issue. What should I do? Could you explain more, please?

I checked. We have that line in the device tree.

After these changes, the card reset again. Nothing changed.

I attached the schematic below.

Thanks.

Hi,

I notice you don’t have sdmmc_vdd_en connected on your load switch. Is that right?

Also, is this card always enumerated as HS card instead of UHS? How about other cards? Are they able to be detected as UHS?

Hi Wayne,

That’s true. Does that cause a problem? We designed it according to your design guide.

I checked it. On cold boot, the SD card is detected as UHS, but after a software reset the same card is detected as HS. If I then remove and replug the card, it is registered as UHS again.

I think that is a problem, but is it the cause of the main problem or a separate sdmmc problem?

When the card is running in UHS, could you check if this error happens?

Also, could you run iozone? Please mount your SD card on /mnt and share the result with us.
During this test, please check that the card is in UHS mode.

Install IOZone3:
$ sudo apt-get update
$ sudo apt-get install iozone3
Sequential read/write command:
$ iozone -ecI -+n -L64 -S32 -s64m -r512k -i0 -i1 -l8 -u8 -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
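
One possible way to confirm the bus mode while the test runs is to look at the "timing spec" line in the mmc debugfs ios file. The sketch below only classifies that text; bus_mode is a hypothetical helper, and the mmc0 host index in the usage comment is an assumption that may differ on your board.

```shell
# Classify the SD bus timing from the text of an mmc debugfs "ios" file.
# bus_mode is a hypothetical helper; it reads the ios text on stdin.
bus_mode() {
    timing=$(grep 'timing spec:' || true)
    case "$timing" in
        *"sd uhs"*)        echo "UHS" ;;
        *"sd high-speed"*) echo "HS" ;;
        *)                 echo "other" ;;
    esac
}

# Example on a live system (mmc0 is an assumed host index; needs root and debugfs):
#   sudo cat /sys/kernel/debug/mmc0/ios | bus_mode
```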

I attached test output and dmesg logs.

test_out.txt (8.8 KB)
dmesg.txt (58.0 KB)

Hi,

The number looks good to us.

We still have some questions to check…

  1. Did you try exFAT with hung task panic enabled? Based on what we have here, the exFAT and ext4 tests were run with the panic reset disabled.
  2. Will iozone still work if the card is in HS mode?
  3. If vmmc-always-on and the patch skipping CMD11 are present in your build and the card is still enumerating in HS mode, this is something we need to debug. That said, it is always recommended to have power cycling support for the SD slot supply. The latest design guides for NX and Nano both show the GPIO for SDMMC_VDD_EN.
  4. This issue still happens with multiple cards, right?

Hi Wayne,

Yes, I tried. For now, every test gives the same result on both filesystems.

The speed decreased, but there was no other problem when using iozone. Output attached.
iozone_when_HSmode.txt (2.1 KB)

I had turned off that patch for a while. Now I have reapplied it and the HS mode issue is solved: the card is registered in UHS mode after both soft and hard resets. But our main issue continues: it still resets. Also, another message shows up and spams the log when I unplug the SD card. The message is:
sdhci-tegra sdhci-tegra.2: Tuning done, restoring the best tap value : 32

Here is the dmesg log after the patch was applied.
dmesg_afterPatch.txt (89.1 KB)

I attach the diff file below. You can check what I have done so far.
diff.txt (6.2 KB)

That’s right.

Hi,

sudo sysctl -w vm.dirty_ratio=10
sudo sysctl -w vm.dirty_background_ratio=5
sudo sysctl -p

I don’t really get what “card reset again” means here. You should apply this configuration, try to write a large amount of data to your card, and see if the kernel panic is still there. Also, in this test, please remember to enable the hung task panic timeout.

I had turned off that patch for a while. Now I have reapplied it and the HS mode issue is solved: the card is registered in UHS mode after both soft and hard resets.

It sounds like you didn’t apply the patch correctly last time.

Also, after applying your sysctl configuration, please also run dd with oflag=direct.
That bypasses the page cache and does a direct transfer.

Your device tree and patches all look good to us.
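
The dd run with oflag=direct mentioned above could look roughly like the sketch below. direct_write is a hypothetical wrapper, and the target path and size in the usage comment are only examples, assuming the card is mounted on /mnt and GNU dd is installed.

```shell
# Write <size-in-MiB> MiB of zeros to a file with O_DIRECT, bypassing the page cache.
# direct_write is a hypothetical helper; the target path is chosen by the caller.
direct_write() {
    if [ "$#" -ne 2 ]; then
        echo "usage: direct_write <target-file> <size-in-MiB>" >&2
        return 2
    fi
    # conv=fsync flushes the file to the card before dd exits
    dd if=/dev/zero of="$1" bs=1M count="$2" oflag=direct conv=fsync
}

# Example: write 1 GiB to the card mounted on /mnt
#   direct_write /mnt/ddtest.bin 1024
```

Because the write goes around the page cache, the vm.dirty_* settings should have little influence on this run, which helps separate a writeback problem from a problem in the SD interface itself.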

Hi Wayne,

Actually, I didn’t understand :). I made those changes and enabled the hung task panic (meaning the OS will reset if a task hangs), then I wrote a big chunk of data and the process hung after 120 seconds. Because of that, the OS triggered a software reset. Isn’t that normal? Please correct me if I am wrong.

After the patch was applied, I tried the same process again. Please check the console output attached below.

I removed the patch when I opened this topic because I thought it wasn’t necessary. Now we have that patch too.

This time I added that argument too. Please check the output.

Thanks,

log.txt (61.2 KB)

Actually, I didn’t understand :). I made those changes and enabled the hung task panic (meaning the OS will reset if a task hangs), then I wrote a big chunk of data and the process hung after 120 seconds. Because of that, the OS triggered a software reset. Isn’t that normal? Please correct me if I am wrong.

Never mind. Actually, I just wasn’t able to get what you meant by “card reset again”. It is just a kernel panic here.