Cannot connect SD card to Jetson TX2 NX, even with working card detect

I understand that the nvidia,vmmc-always-on flag is incorrect if the power is not always on… But if I have a card detect, don’t I want the power to be toggled on and off by the presence of the card? This is how the old model of our TX2 carrier works.

If pulling up that GPIO makes it work, then fine. Just pull the GPIO up.

There is also an alternative way to do that, but you need to provide the GPIO to the regulator.

I think we should try the easy way first to make sure the SD card slot works.

This is the alternative I am talking about. But I think you are not familiar enough with the regulator framework, so let’s try the easy way first.
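A minimal sketch of that regulator-based alternative, assuming the standard Linux `regulator-fixed` and MMC `vmmc-supply` bindings: the load-switch enable line is handed to a fixed regulator, and sdhci@3440000 takes its card supply from that regulator instead of relying on `nvidia,vmmc-always-on`, so the MMC core can switch the supply when it powers the card up or down. The label `vdd_sd_card`, the node name `regulator-sd-card`, the GPIO controller label `gpio_main`, the line number `0`, and the 3.3 V value are all hypothetical placeholders. In a decompiled dts like the ones attached in this thread, labels appear as numeric phandles, so you would have to substitute the real GPIO controller phandle and the line that gpio07 maps to.

```dts
/ {
	/* Hypothetical fixed regulator switched by the SD load-switch enable line. */
	vdd_sd_card: regulator-sd-card {
		compatible = "regulator-fixed";
		regulator-name = "vdd-sd-card";
		regulator-min-microvolt = <3300000>;	/* assumed 3.3 V card supply */
		regulator-max-microvolt = <3300000>;
		/* Placeholder: use the real GPIO controller phandle and the line gpio07 maps to. */
		gpio = <&gpio_main 0 0>;
		enable-active-high;	/* remove this if the load-switch enable is active low */
	};

	sdhci@3440000 {
		/* Card power now comes from the switched regulator above,
		 * so nvidia,vmmc-always-on should not be set on this node. */
		vmmc-supply = <&vdd_sd_card>;
		status = "okay";
	};
};
```

Compared with hard-wiring the enable, this leaves the on/off decision to the kernel, which is roughly the card-presence-driven behavior described for the old carrier.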

Okay, so I should add an external pull-up on gpio07 to keep the power always on, and then see if I can connect?

Yes, do that first.

Okay, while I’m waiting on our designer to make that change can you try to explain the alternative with the gpio please? Thanks.

Okay, so we have the enable pin of the load controller grounded so that the output is always on. Now we are getting a kernel panic and a boot loop on startup. Here is my dts. Any ideas?
sdhci_3440000_test41.dts (239.2 KB)

So I removed the sdhci@3400000 section from the dts due to your suggestion in this post https://forums.developer.nvidia.com/t/xavier-som-cant-boot-normally/119803/10

As far as I can tell, there is no change to the kernel logs at boot. We still get a kernel panic and a boot loop.

It’s worth noting that once in a blue moon the boot succeeds and I can log into the device remotely. When this happens, the SD card actually shows up as mmcblk1.

Here is the dump of the kernel log. Please let me know if you have any more thoughts. Thanks.

kernel_panic_logs.txt (45.9 KB)

Hi,

Please dump the full kernel log. What you attached is the UART log, and by default the UART log does not contain the full kernel log.

The issue we are checking is in the kernel, so dmesg alone should be sufficient. The UART log mostly shows the bootloader output, which is not relevant to our case.

And… if you are not familiar with other NVIDIA products, I would suggest you only refer to Nano/NX and TX2-NX topics…

So I removed the sdhci@3400000 section from the dts due to your suggestion in this post https://forums.developer.nvidia.com/t/xavier-som-cant-boot-normally/119803/10

Referring to the AGX post seems to go off topic… For example, you don’t know what the AGX hardware design is. If you just follow that guidance without actually comparing the AGX and TX2-NX hardware, you may introduce new problems…

The device does not boot. How do I get kernel logs if I can’t get a shell? The UART dump is the only information I have when it is in a boot loop…

And I understand that the AGX does not use the same hardware as the TX2-NX, but at this point I am trying literally anything I can. I see this output in the UART log:


U-Boot 2020.04-g46e4604c78 (Jul 26 2021 - 12:10:58 -0700)

SoC: tegra186
Model: NVIDIA P3636-0001
Board: NVIDIA P3636-0001
DRAM:  3.8 GiB
MMC:   sdhci@3400000: 1, sdhci@3460000: 0
Loading Environment from MMC... *** Warning - bad CRC, using default environment

Isn’t it suspicious that the two MMC controllers listed are sdhci@3460000 (expected) and sdhci@3400000?

What other information can I provide that will be helpful?

Hi,

Flash a dtb that boots fine, open /boot/extlinux/extlinux.conf, and remove the “quiet” keyword. After that, your UART log will contain the full dmesg.

Isn’t it suspicious that the two MMC controllers listed are sdhci@3460000 (expected) and sdhci@3400000?

There is no need to be suspicious of these. You have already seen other users’ forum posts about adding an extra SD slot, and everyone’s change is on sdhci@3440000. Only this controller maps to sdmmc3. Also, there is no other SDMMC pinout on the TX2-NX module.

Just in case you need clear instructions:

  1. Flash your dtb back to one that lets you boot and access extlinux.conf, and remove “quiet”. Reboot and make sure you can see the full dmesg on the UART.

  2. Reflash the last dtb with the sdhci patch, the one that causes the boot loop. Boot up again and share the log.

Here’s the full dmesg from the boot sequence. For some reason, the dmesg doesn’t appear until the second boot attempt.

kernel_panic_logs_full.txt (123.0 KB)

Just to be clear: the thing that causes the boot loop is the hardware change you suggested. Keeping the load switch to the SD card enabled all the time is what triggers this problem.

The board will boot without a problem if there is no SD card plugged in.

One thing I noticed in the dts is that this section seems to point to sdhci@3440000:

	bcmdhd_wlan {
		compatible = "android,bcmdhd_wlan";
		interrupt-parent = <0x23>;
		interrupts = <0x32 0x4>;
		fw_path = "/lib/firmware/brcm/fw_bcmdhd-old-unlocked.bin";
		nv_path = "/lib/firmware/brcm/nvram.txt";
		sdhci-host = <0xc7>;
		pwr-retry-cnt = <0x3>;
		status = "okay";
		linux,phandle = <0x160>;
		phandle = <0x160>;
	};

And then, in the kernel panic message, NetworkManager seems to be implicated somehow:

[   64.189124] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [NetworkManager:4324]
[   64.197208] Kernel panic - not syncing: softlockup: hung tasks
[   64.203033] CPU: 0 PID: 4324 Comm: NetworkManager Tainted: G        W    L  4.9.253-tegra #1

Idk… just thinking out loud…

Oh… that is a great point. I guess I understand what happened here…

In the beginning, I thought your dts was a clean one. I mean, I thought I only needed to care about sdhci@3440000, just as in other users’ cases.

But it looks like you have already added a lot of unnecessary configuration to your dts, and those additions are causing extra problems…

Could you remove all the changes you’ve made so far and focus only on what I said about sdhci@3440000 and the regulator?

I mean the default TX2-NX dts file with no extra settings added, other than the sdhci change and the regulator.

The only changes I have made are in sdhci@3440000…
Here is the default dts that I got from the JetPack directory…

og_jetpack_4.6.dts (239.2 KB)

That section has been there from the start in every version I have worked with.

Weird. Let me check what is going on here.

Are you using rel-32.7.1 or rel-32.6.1?