Intermittent but highly reliable nfsroot failures

= Problem =
Use of nfsroot results in startup failure ~20% of the time. This is highly repeatable given the setup below.

= Setup =
(1) Jetson Nano connected to a switch via ethernet
(2) Ubuntu laptop connect to the same switch via ethernet
(3) Jetson Nano connected to the same Ubuntu laptop via serial terminal on J44 (minicom)

= Details =
In this gist: https://gist.github.com/brianthelion/ec61956bd0426d1d50f05181cce278c6

= Summary =
The laptop provides DHCP and NFS services, and logs on the laptop confirm that the nano gets the right IP and is authenticated to the appropriate NFS share. Serial terminal output on the nano’s boot process agrees with this.

[timestamp] VFS: Mounted root (nfs filesystem) on device 0:17.

From there, we see successful hand off to systemd.

Finally, 80% of the time, we get a login prompt. The other 20% we get

nfs: server 10.42.0.1 not responding, still trying

From there, a cold reboot is required.

This is highly repeatable, and independent of networks, servers, cold/warm reboot, etc. Logs don’t reveal anything obvious.

= Thanks! =
Thoughts and suggestions appreciated.

Running tcpdump on the server (laptop) shows an immediate end to NFS traffic under the failure mode. Is it possible that the NIC is going down?

I’m unable to replicate this bug using an identical configuration on a Raspberry Pi running Ubuntu 18.04.2.