Jetson Nano production module takes long time before it can be SSHed into

Hi,

I am using a Jetson Nano production module with a kernel built on the host machine from the R32.2 source code (we have some custom drivers implemented).

The problem is that it takes a long time (sometimes 2+ minutes) after power-on before I can SSH into the device from the host. I tried reducing timeouts in the network-related service files, and also set “UseDNS no”, “GSSAPIAuthentication no” and “UsePAM yes” in /etc/ssh/sshd_config on the Nano, along with practically everything I found online about similar issues on other Linux-based platforms such as the RPi, but none of it helped.
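For reference, here is roughly what the sshd_config changes looked like, as a sketch (the option names are standard OpenSSH; the restart command assumes the service is named ssh, as on stock Ubuntu):

# /etc/ssh/sshd_config on the Nano (relevant lines only)
UseDNS no                   # skip reverse DNS lookup of the connecting client
GSSAPIAuthentication no     # skip Kerberos/GSSAPI negotiation
UsePAM yes

# restart the daemon so the changes take effect
sudo systemctl restart ssh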

Any suggestions?

I can always ping the device quickly after a boot. To give an idea, the boot takes around 13 seconds. And the Nano and my host machine are on the same network and communicate over Ethernet.

Some logs from ssh -vvv, while attempting to connect “too soon” after powering on the Nano:

OpenSSH_7.2p2 Ubuntu-4ubuntu2.8, OpenSSL 1.0.2g 1 Mar 2016
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolving “<ip_address_of_nano>” port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to <ip_address_of_nano> [<ip_address_of_nano>] port 22.
ssh: connect to host <ip_address_of_nano> port 22: No route to host

Would an strace of sshd on the Nano while it is running be helpful?

This points directly at a network issue:

ssh: connect to host <ip_address_of_nano> port 22: No route to host

For both the host and the Nano, what do you see from the commands “ifconfig” and “route”? Are you using entirely wired networking?

Hi linuxdev,

The routes and ifconfig outputs were verified. The subnets are indeed the same for both the Nano and the host (Ubuntu 16.04) PC. The Nano is connected to the office network via Ethernet, and so is the PC. The same switches also provide the network for the wireless routers.

Can you post the output of those commands, “ifconfig” and “route”, from both the Jetson and the host PC?

FYI, many networks, especially WiFi, will prevent access from MAC addresses which are not whitelisted. Do be sure to check that any WiFi is not configured to block the Jetson.

Hi,

  1. The ifconfig output for host:
docker0   Link encap:Ethernet  HWaddr 02:42:7e:c9:a1:58  
          inet addr:172.17.0.1  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

enp0s31f6 Link encap:Ethernet  HWaddr 54:bf:64:75:a6:18  
          inet addr:192.168.2.102  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::d801:f176:ba87:1eb5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:36780256 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9441127 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12011817570 (12.0 GB)  TX bytes:8559315398 (8.5 GB)
          Interrupt:16 Memory:ef180000-ef1a0000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:25444 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25444 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2726141 (2.7 MB)  TX bytes:2726141 (2.7 MB)

lxcbr0    Link encap:Ethernet  HWaddr 00:16:3e:00:00:00  
          inet addr:10.0.3.1  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
  2. The route output for host:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.2.1     0.0.0.0         UG    100    0        0 enp0s31f6
10.0.3.0        *               255.255.255.0   U     0      0        0 lxcbr0
link-local      *               255.255.0.0     U     1000   0        0 lxcbr0
172.17.0.0      *               255.255.0.0     U     0      0        0 docker0
192.168.2.0     *               255.255.255.0   U     100    0        0 enp0s31f6
  3. The ifconfig output for Nano:
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.2.187  netmask 255.255.255.0  broadcast 192.168.2.255
        inet6 fe80::36:d228:6577:6d5a  prefixlen 64  scopeid 0x20<link>
        ether 00:04:4b:e5:d8:f8  txqueuelen 1000  (Ethernet)
        RX packets 48959  bytes 3892170 (3.8 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 120  bytes 14885 (14.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 149  base 0xe000  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 163  bytes 14235 (14.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 163  bytes 14235 (14.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

rndis0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether ce:66:f9:59:de:31  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

usb0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether ce:66:f9:59:de:33  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  4. And the route output for Nano:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    100    0        0 eth0
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 eth0
192.168.2.0     0.0.0.0         255.255.255.0   U     100    0        0 eth0

In addition, we have been working with a modified kernel. There is no out-of-the-box SD card support on the production module, so we had to apply the patch below.

diff --git a/hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-p2530-common.dtsi b/hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-p2530-common.dtsi
--- a/hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-p2530-common.dtsi	2019-09-18 13:21:25.000000000 +0200
+++ b/hardware/nvidia/platform/t210/common/kernel-dts/t210-common-platforms/tegra210-p2530-common.dtsi	2019-09-18 13:26:22.723470000 +0200
@@ -131,7 +131,7 @@
 		uhs-mask = <0x1c>;
 		power-off-rail;
 		nvidia,update-pinctrl-settings;
-		status = "disabled";
+		status = "okay";
 	};
 
 	sdhci@700b0200 {
diff --git a/hardware/nvidia/platform/t210/porg/kernel-dts/porg-plugin-manager/tegra210-porg-plugin-manager.dtsi b/hardware/nvidia/platform/t210/porg/kernel-dts/porg-plugin-manager/tegra210-porg-plugin-manager.dtsi
--- a/hardware/nvidia/platform/t210/porg/kernel-dts/porg-plugin-manager/tegra210-porg-plugin-manager.dtsi	2019-09-18 13:21:25.000000000 +0200
+++ b/hardware/nvidia/platform/t210/porg/kernel-dts/porg-plugin-manager/tegra210-porg-plugin-manager.dtsi	2019-09-18 13:28:08.251948000 +0200
@@ -313,7 +313,8 @@
 			override@1 {
 				target = <&sdhci2>;
 				_overlay_ {
-					vmmc-supply = <&max77620_ldo6>;
+					status = "okay";
+					vqmmc-supply = <&max77620_ldo6>;
 					no-sdio;
 					no-mmc;
 					sd-uhs-sdr104;
diff --git a/hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi b/hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi
--- a/hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi	2019-09-18 13:21:25.000000000 +0200
+++ b/hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-porg-p3448-common.dtsi	2019-09-18 13:30:00.344454000 +0200
@@ -250,9 +250,14 @@
 	};
 
 	sdhci@700b0400 {
-		status = "disabled";
+		status = "okay";
 		/delete-property/ keep-power-in-suspend;
 		/delete-property/ non-removable;
+		mmc-ddr-1_8v;
+		mmc-ocr-mask = <3>;
+		uhs-mask = <0x0>;
+		max-clk-limit = <400000>;
+		tap-delay = <3>;
 	};
 
 	sdhci@700b0200 { /* SDMMC2 for Wifi */

It was discussed here: https://devtalk.nvidia.com/default/topic/1062120/jetson-nano/microsd-card-not-detected-on-jetson-nano-production-module/post/5380197/#5380197

The thing is, the long SSH delays went away only after removing this patch. I have no idea how that is possible, but we do need the patch.

Could you please advise regarding this?

Another option: Is it possible that some other device on the network is sharing the same IP address?
That would explain why ping might be fast, but SSH might be slow.
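If you want to rule that out, one way (assuming the host has the iputils arping tool installed; the interface and address are the ones posted earlier in this thread) is duplicate-address detection:

# probe for another machine answering for the Nano's address;
# exit status 0 means no duplicate was detected
sudo arping -D -I enp0s31f6 -c 3 192.168.2.187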

Hi snarky,

No, I checked that. I also did not assign a static address to the Nano; I let the router assign it an address.

Also, now that I have checked again, pinging is not possible before I can SSH either. I don’t know how I observed otherwise before. In fact, the IP address of the Nano does not show up at all in netdiscover for a long time; once it appears, I can SSH and ping. The Nano boots in about 15 seconds according to systemd-analyze, with user space entered within that time. This sometimes stretches to 30 seconds, but that still does not explain the 2-3 minute delay before the device can be SSHed into.
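For what it is worth, a minimal sketch of how the delay can be measured from the host is just a ping loop against the Nano’s address:

# print a timestamp the moment the Nano first answers a ping after power-on
while ! ping -c 1 -W 1 192.168.2.187 > /dev/null 2>&1; do
    sleep 1
done
date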

If you are on the host and run “traceroute 192.168.2.0”, what shows up? What shows up from traceroute to the router (the default route) from “traceroute 192.168.2.1”?

I thought of the other device issue as well, but don’t see any collisions or other errors.

Btw, you can get a lot of logging from ssh via “ssh -v name@somewhere”; use “-vv” or “-vvv” for more verbosity. Do expect a lot of failures, as this is how options are probed even when everything works correctly. What matters is mostly the errors at the end.
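If you want to keep the full verbose output for posting, something like this works (replace some_name with your login on the Nano):

# capture the complete client-side debug log while reproducing the delay
ssh -vvv some_name@192.168.2.187 2>&1 | tee ssh_debug.log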

Hi,

Sorry for my late reply. I was caught up with some other required fixes for the same module.

Please find the outputs below:

  1. traceroute to 192.168.2.0 (192.168.2.0), 30 hops max, 60 byte packets
    connect: Permission denied

  2. traceroute to 192.168.2.1 (192.168.2.1), 30 hops max, 60 byte packets
    1 192.168.2.1 (192.168.2.1) 0.561 ms 0.529 ms 0.532 ms

In the question I posted, the logs were obtained with a -vvv option to SSH, I believe, though I shall verify again.

We are also using TX1s connected to the same network. But I do not face this issue with them.

Traceroute to a broadcast address is expected to fail, but the other traceroute shows a good single-hop response to the non-broadcast address. Networking itself is behaving as expected.

The “ssh -vvv” will show a lot of output, and many of the failures are expected, since it is only testing for features which may not be present (e.g., different key types; if you don’t use private keys, or you use keys of a different type, a failure will be noted).

The most interesting part is whether you can recognize the point in the debug output at which login should actually start working. You can post the whole log (either quote it in a “code” tag, the “</>” in the title bar during thread reply creation, or attach it afterwards by hovering the mouse over the “quote” icon of an existing post and clicking the paper clip icon), but if you get a feel for which part of the log indicates something unusual is going on, emphasize those lines.

Hi linuxdev,

Thank you for the tip. I was still receiving only “No route to host”, and then, once login became possible, a whole bunch of logs. I could post them, but I think I would be digressing, as I believe I have solved the issue another way.

The SSH login was taking time, I thought, because the IP address of the Nano was not appearing in netdiscover, which could mean the network interfaces were not up yet, or something else I do not know of. After logging in, I ran systemd-analyze critical-chain and saw that getty@ttyGS0.service was taking a long time, which in turn meant user space took a long time to initialize during boot. I suspect this was somehow keeping the Nano from showing up on the network, so I disabled and masked the service. We do not use a serial console, only remote login via SSH from the host, so I am not sure of the implications. But at the moment I can SSH within 30 seconds of power-on.
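For anyone hitting the same thing, this is roughly what I did on the Nano (the service name comes from my critical-chain output; check your own output before masking anything):

# see which units dominate the boot critical path
systemd-analyze critical-chain

# stop the slow getty on the USB gadget serial port and keep it from ever starting
sudo systemctl disable getty@ttyGS0.service
sudo systemctl mask getty@ttyGS0.service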

I would also be glad to hear your opinion on this, if possible.

It is unexpected that traceroute works, but not ssh, at least for the “no route to host” part. When you use ping or traceroute, is it correct that you are naming “192.168.2.187” for the address? Are you naming the address to the Nano differently when using ssh? If you can “ping 192.168.2.187”, or “traceroute 192.168.2.187”, then ssh should use the same route for “ssh 192.168.2.187” (or “ssh some_name@192.168.2.187”).

FYI, routing is not part of ssh; it is part of the system networking (unless you have somehow set up a proxy, which isn’t needed for direct routes). Double check that all commands are using the exact address “192.168.2.187”, and not some other named address or different dotted-decimal address.

If the ssh daemon were not running, you would instead see something like “connection refused”. A route is independent of the application using it, and both ping and traceroute would also show an error if the route to that address were bad. It isn’t really possible for ping and traceroute to find a route while ssh does not, provided they use that exact address (and no proxy is in the way).
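As a sanity check, all three can be run back to back against the exact same address (the Nano’s address from your ifconfig output), and they should agree on reachability:

ping -c 3 192.168.2.187
traceroute 192.168.2.187
ssh -v some_name@192.168.2.187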