Jetson Nano production module takes long time to boot when SD card is inserted

Hi,

I am using a Jetson Nano production module with R32.2 kernel source code (as there are some custom drivers implemented) built on the host machine.

The problem is it takes a long time (sometimes 2 minutes 45 seconds +) to boot, when SD card is inserted. The kernel public source did not seem to have the correct SD card support implementation for Nano production modules, and following advice from here: https://devtalk.nvidia.com/default/topic/1062120/jetson-nano/microsd-card-not-detected-on-jetson-nano-production-module/post/5394938/#5394938, the kernel was patched for the required feature.

Since the kernel code was modified owing to an additional driver I am using, I had to change the patch (from the one provided in the forum, as mentioned above) slightly. Please find it below.

diff --git a/hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-p2530-common.dtsi b/hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-p2530-common.dtsi
--- a/hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-p2530-common.dtsi	2019-09-18 13:21:25.000000000 +0200
+++ b/hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-p2530-common.dtsi	2019-09-18 13:26:22.723470000 +0200
@@ -131,7 +131,7 @@
 		uhs-mask = <0x1c>;
 		power-off-rail;
 		nvidia,update-pinctrl-settings;
-		status = "disabled";
+		status = "okay";
 	};
 
 	sdhci@700b0200 {
diff --git a/hardware/nvidia/platform/t210/porg/kernel-dts/porg-plugin-manager/tegra210-porg-plugin-manager.dtsi b/hardware/nvidia/platform/t210/porg/kernel-dts/porg-plugin-manager/tegra210-porg-plugin-manager.dtsi
--- a/hardware/nvidia/platform/t210/porg/kernel-dts/porg-plugin-manager/tegra210-porg-plugin-manager.dtsi	2019-09-18 13:21:25.000000000 +0200
+++ b/hardware/nvidia/platform/t210/porg/kernel-dts/porg-plugin-manager/tegra210-porg-plugin-manager.dtsi	2019-09-18 13:28:08.251948000 +0200
@@ -313,7 +313,8 @@
 			override@1 {
 				target = <&sdhci2>;
 				_overlay_ {
-					vmmc-supply = <&max77620_ldo6>;
+					status = "okay";
+					vqmmc-supply = <&max77620_ldo6>;
 					no-sdio;
 					no-mmc;
 					sd-uhs-sdr104;
diff --git a/hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi b/hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi
--- a/hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi	2019-09-18 13:21:25.000000000 +0200
+++ b/hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi	2019-09-18 13:30:00.344454000 +0200
@@ -250,9 +250,14 @@
 	};
 
 	sdhci@700b0400 {
-		status = "disabled";
+		status = "okay";
 		/delete-property/ keep-power-in-suspend;
 		/delete-property/ non-removable;
+		mmc-ddr-1_8v;
+		mmc-ocr-mask = <3>;
+		uhs-mask = <0x0>;
+		max-clk-limit = <400000>;
+		tap-delay = <3>;
 	};
 
 	sdhci@700b0200 { /* SDMMC2 for Wifi */

I did not yet implement auto-mount-sd-card-on-startup by changing fstab. I plan to have a script (do not have it yet) which before running my startup application would mount the /dev/mmcblkp1 (sd card partition) at $HOME/sd_card location. But at the moment I do this manually. It seems the sd card mounting works, but the device takes a long time to boot when the sd card is kept inserted. Here’s the output of systemd-analyze time:

Startup finished in 11.261s (kernel) + 2.454s (userspace) = 13.715s
multi-user.target reached after 2.423s in userspace

The reboot command is however faster, with booting taking place in 30 seconds! Similar to the case when no SD card is inserted, and irrespective of whether the SD card patch is applied or not.

Could someone shed some light on what might be happening here?

P.S. I still boot from the internal flash memory and plan to use the SD card to store data which my startup application generates on specific conditions.

Hi,

Could you share more detail about how the log changes after you enabled sdcard on your board?
Since production modules don’t have sdcard slot, we cannot know what is the impact.

Also, the “2 mins 45 seconds” boot sounds stuck at userspace but not kerenl. Please confirm.

Hi WayneWWW,

Thanks for your response.

Would you need the kern.log or dmesg? I could not find a significant difference in these logs which could seem to add to the delay.

Sorry, I think the ‘booting’ term was incorrect. As per the output of systemd-analyze time, the multi-user target is reached in around 16 seconds. By ‘2 minutes 45 seconds’, I was trying to say the time needed from power on till it is possible to SSH into the module. The Nano does not show up on the network until then, even after having a static IP assigned. It is stuck somewhere in the userspace, but why this happens only when the SD card is inserted, I have no clue. I can SSH into the Nano right after 30 seconds after power on, but without the SD card being inserted. My host PC and the Nano are on the same network, not connected directly, but via a router, and both via ethernet. The host has a dynamic IP.

We have a custom carrier board with SD card slot. The history of modifications for SD card support in the kernel can be found here: [url]https://devtalk.nvidia.com/default/topic/1062120/jetson-nano/microsd-card-not-detected-on-jetson-nano-production-module/post/5395579/#5395579[/url].

jetson_user,

It would be better sharing the dmesg and the syslog.

syslog could tell us what is system doing after kernel boot.

Also, do you mean the monitor does not get into ubuntu desktop immediately but stuck at the console to configure something? If so, could you take a picture where it get stuck?

Hi WayneWWW,

Thanks for your response, and sorry for the late update. I was out of office.

Please find attached the logs (syslog and dmesg output).

The monitor does not come into picture here. We do not have an HDMI connector on our boards. Besides, we use the l4t_create_default_user.sh script to create a default user before flashing, so that the system configuration including creation of default user via desktop, post flashing, is skipped. This is important, as otherwise, the system would wait on first boot for default user creation, and since no monitor access is possible at our end, this cannot be done - resulting in an infinite wait before ssh.

So basically, I cannot tell what is going on after boot. The systemd-analyze does not report a delay, so I would assume something is happening after user space is entered.
dmesg_output.txt (57.6 KB)
syslog.txt (438 KB)

Hi,

I see a lot of i2c error spew in your kernel and seems cause the system slow.

It looks from your monitor.

Nov 27 09:49:32 tegra-ubuntu gsd-color[7596]: failed to get edid: unable to get EDID for output
Nov 27 09:49:32 tegra-ubuntu gsd-color[7596]: unable to get EDID for xrandr-HDMI-0: unable to get EDID for output
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.453252] tegra-i2c 7000c700.i2c: pio timed out addr: 0x50 tlen:28 rlen:128
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.460423] tegra-i2c 7000c700.i2c: --- register dump for debugging ----
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.468422] tegra-i2c 7000c700.i2c: I2C_CNFG - 0x22c00
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.474205] tegra-i2c 7000c700.i2c: I2C_PACKET_TRANSFER_STATUS - 0x10001
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.481673] tegra-i2c 7000c700.i2c: I2C_FIFO_CONTROL - 0xe0
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.487916] tegra-i2c 7000c700.i2c: I2C_FIFO_STATUS - 0x800040
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.494522] tegra-i2c 7000c700.i2c: I2C_INT_MASK - 0x7d
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.500534] tegra-i2c 7000c700.i2c: I2C_INT_STATUS - 0x0
Nov 27 09:49:32 tegra-ubuntu kernel: [   62.506531] tegra-i2c 7000c700.i2c: i2c transfer timed out addr: 0x50

Hi,

Is it a solution then we switch off desktop, using instructions here:
https://devtalk.nvidia.com/default/topic/1050739/how-to-boot-jetson-nano-in-text-mode-/
?

Because we do not use a monitor at all, we do not even have the HDMI on our custom carrier boards.

Without the HDMI hardware you have to disable the i2c queries through the device tree. I don’t know of the specific changes, but basically your board is timing out trying to query hardware which doesn’t exist.

Yes, if you don’t have a monitor, please disable that i2c controller or the nvdisplay in device tree.

Hi,

Thanks for the inputs. I tried disabling all occurrences of “hdmi-display” in the following files:

  1. hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-ers-hdmi-e2190-1100-a00.dtsi
  2. hardware/nvidia/soc/t210/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi
  3. hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi

I do not know how to disable nvdisplay in device tree, or whether the above is effectively doing that.

But I do not know how this might be the real issue, even though it is reporting timeouts.

I observe the delay to be able to SSH into the Nano only when the SD card is inserted. So with and without the SD card inserted the same device tree is flashed, but the difference in the times in massive. I get a ‘no route to host’ error when I try to log in say before the 2 min 45 seconds measured (though it should have been ‘connection refused’, which is not the case). I checked the ip routes, etc., but again, this issue occurs only with an SD card inserted. If the SD card is not inserted, then with and without our custom patches, it is possible to SSH into the Nano in 30 seconds. There was a patch to enable the SD card (see question for details), which I think might also be causing the issue.

Could you please advise as to the further steps?

P.S. I also switched off the desktop at startup, still no change.

Thanks in advance!

I am sorry that if this is jetson nano, then you should disable tegradc@xxxx. nvdisplay is for tx2/xavier.

Please check if i2c error spew is gone.

Also, I think it is wrong direction to check this issue by using ssh. We should look into the UART log for why ssh cannot launch. The i2c error spew may have chance cause system hang.

Hi,

I disabled tegradc.0 and tegradc.1 references everywhere. The i2c timeout messages are indeed not visible anymore.

But the issue still exists. I understand it is not a slow boot issue, as user space is entered quickly, as I already mentioned. What is slow is the access to Nano after boot.

The boards we use cannot access serial console, access to Nano is possible only over network.

I understand SSH may not be the direction to go, but at the moment it is a very strong indicator towards the behavior differences with and without the SD card inserted. Moreover, our application (the most important entity for us running on the Nano), which runs on startup, is also delayed. Same delay difference (30 seconds without SD card inserted and 2 min 45 seconds with SD card inserted) is observed in the startup of our application once the system is powered on, as with the SSH login case. The application broadcasts an IP soon after it starts running, which obviously does not happen (Please note that whether the application has started is not determined by this broadcast of IP, but with an indicator LED. The broadcast IP is used by other communicating devices, which obviously fail to determine whether our application has started.). Hence, somehow the system is stuck during this delay or ‘wait’ period, due to which neither our application can run, nor the Nano can be accessed, nor does the Nano show up on the network, nor can anything else be done.

Even without our startup application enabled, the issue is observed (i.e. access to Nano over SSH is delayed by a lot with SD card inserted).

I think we might have a look into how the patch (see question) might change things. It is very clear that our system starts up much faster without the SD card inserted, and SD card is essential for our product, so we cannot do without it - and hence the patch also cannot be discarded. We are approaching our deadlines and once the product is shipped it is impossible to flash it in case there is a kernel/device tree fix for our case.

Please let me know in case more information is needed to debug this issue.

Please also find attached the updated syslog, if it might help.
syslog_nano_boot_delay_sd_card.txt (114 KB)

I do not have an answer, but some things from the log should be emphasized:

Dec  3 16:09:47 tegra-ubuntu kernel: [    1.941211] mmc1: host does not support reading read-only switch, assuming write-enable
Dec  3 16:09:47 tegra-ubuntu kernel: [    1.945705] mmc1: new high speed SDXC card at address aaaa
Dec  3 16:09:47 tegra-ubuntu kernel: [    1.946062] mmcblk1: mmc1:aaaa AGGCE 59.5 GiB 
Dec  3 16:09:47 tegra-ubuntu kernel: [    1.946212] mmc0: new HS400 Enhanced strobe MMC card at address 0001
Dec  3 16:09:47 tegra-ubuntu kernel: [    1.949629]  mmcblk1: p1
Dec  3 16:09:47 tegra-ubuntu kernel: [    1.950632] <b>last reset is due to tegra watchdog timeout</b>

…so the filesystem could possibly be damaged if the system had locked up and the journal replay had to remove content.

The kernel version:

Dec  3 16:09:47 tegra-ubuntu kernel: [    0.000000] Linux version 4.9.140 (anurag@ubuntu-16) (gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02) ) #2 SMP PREEMPT Tue Dec 3 13:01:01 CET 2019

…has this kernel ever been recompiled or modified?

Lots of network issues which tend to be either configuration or problems with the DHCP server:

Dec  3 16:09:47 tegra-ubuntu NetworkManager[3897]: <info>  [1575389387.7030] NetworkManager (version 1.10.6) is starting... (for the first time)
Dec  3 16:09:47 tegra-ubuntu NetworkManager[3897]: <info>  [1575389387.7032] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-dns-resolved.conf, 20-connectivity-ubuntu.conf, no-mac-addr-change.conf) (etc: 10-globally-managed-devices.conf, 10-ubuntu-fan.conf, default-wifi-powersave-on.conf)
Dec  3 16:09:47 tegra-ubuntu nv.sh[4004]: /etc/systemd/nv.sh: line 73: warning: command substitution: ignored null byte in input

Plus I see WiFi involved, and in part of the networking:

Dec  3 16:09:48 tegra-ubuntu NetworkManager[3897]: <warn>  [1575389388.0676] Error: failed to open /run/network/ifstate

Where is the SD card mounted?

It is a little bit hard to tell what might be wrong if you don’t have a UART and no monitor. If you have monitor, then we may know where it get stuck before entering gdm.

Actually I notice there is a 2 min gap between these lines

Dec  3 16:09:56 tegra-ubuntu systemd[1]: Started User Manager for UID 1000.
Dec  3 16:13:00 tegra-ubuntu systemd-timesyncd[3438]: Synchronized to time server 83.98.155.30:123 (1.pool.ntp.org).

However, systemd-timesyncd is used for syncing time, so maybe this gap just comes from systemd-timesyncd.

Please try more times and see if you could always observe any time gap larger than 2 min in syslog.

If you cannot, then it may indicate system hangs somewhere and no verbose log in syslog.

Hi,

Thanks for your inputs!

WayneWWW,

Is it possible I could somehow redirect the output meant to go on uart to a file?

Meanwhile I shall check the time gaps like you mentioned.

linuxdev,

Probably the serial logs might help with debugging the watchdog timeout, as I don’t know what might be causing it.
The kernel is modified with some patches, particularly those for sd card support enable and vi2 timestamp capture. The source code is from R32.2 public_sources/kernel_src.

The SD card is mounted on the carrier board. I have no clue about the WiFi related logs you mention, but the Nano is connected via ethernet to our local network and over USB-microUSB during flashing and initial setup. Also, it is not configured so, as no corresponding network interface exists.

By mount location I was actually thinking about where any partitions are mounted.

If you do not use WiFi, then you might consider going in to the GUI tools (not sure where) and marking the WiFi as disabled. If there is some attempt to bring up an unconfigured WiFi, then perhaps this adds to boot delays.

Hi,

The SD card is mounted at /dev/mmcblk1p1. Will check the WiFi disabling and get back. Thanks!

“/dev/mmcblk1p1” is the device itself. What do you see from:

lsblk -f /dev/mmcblk1

Attached image with the output I get. Also the same in text below:

NAME        FSTYPE LABEL UUID                                 MOUNTPOINT
mmcblk1                                                       
└─mmcblk1p1 ext4         d8d9a384-d640-44fc-9472-b097cecdc9de

Am I correct that this partition is not intended to be the root partition? If that is the case, and since there is no mount point, then boot should not be prevented or slowed down. Are you able to succeed with the following:

sudo -s
mount /dev/mmcblk1p1 /mnt
cd /mnt
ls
cd
umount /dev/mmcblk1p1
exit

If this works, then I am tempted to believe that even if SD card presence somehow interacts with boot issues, that the primary cause of the boot issue is something related to i2c or the networking issues from the earlier logs. If this test does not work, then you should monitor “dmesg --follow”, and post what the logs show during the mount and umount.