AGX Xavier NFS Boot: IP-Config: Retrying forever (NFS root)

I am attempting to do an NFS boot on AGX Xavier running Jetpack 5.1. I found the following documentation for R35.2.1:

https://docs.nvidia.com/jetson/archives/r35.2.1/DeveloperGuide/text/SD/FlashingSupport.html#flashing-for-nfs-as-root

This section contains the following note:


To create a bootable NFS root file system, you must first:

    Perform the process described in Step 1: Set Up the Root File System <https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/rootfs_custom.html#wwpID0E0JG0HA>

    Perform the process described in Configuring NFS Root on the Linux Host <https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/getting_started.html#wwpID0E0CC0HA>

However, when I visit the URLs mentioned here, I get redirected to R34.1.1’s documentation.

After browsing elsewhere for a guide to NFS boot on AGX Xavier, I found that these links refer to the topics "Setting Up the Root File System" and "Configuring NFS Root on the Linux Host" respectively. However, I did not find any such topics in the R35.2.1 documentation.

I did however find the following (which are both for 32.7.5 release):

https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-3275/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/rootfs_custom.html

https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-3275/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/getting_started.html#wwpID0E0CC0HA

So assuming that the steps mentioned here are still valid, I went through with it and implemented the steps. Here is my full workflow.

  1. Download the NVIDIA-provided sample root file system:
    Tegra-Linux-Sample-Root-Filesystem_<release_type>.tbz2

(Note: I actually used the copy I had already downloaded earlier along with the Jetpack source package files.)
This was the file: “Tegra_Linux_Sample-Root-Filesystem_R35.2.1_aarch64.tbz2”

  2. Extract the file system to a new directory (<nfs_dir> represents the directory where the file system was extracted):
    mkdir <nfs_dir>
    cd <nfs_dir>
    sudo tar -jxpf Tegra_Linux_Sample-Root-Filesystem_R35.2.1_aarch64.tbz2

  3. Set the LDK_ROOTFS_DIR environment variable to point to <nfs_dir> and then apply binaries:
    export LDK_ROOTFS_DIR="<nfs_dir>"
    sudo -E <Linux_for_Tegra>/apply_binaries.sh

  4. Install NFS components on the host machine:
    sudo apt-get install nfs-common nfs-kernel-server

  5. Modify the /etc/exports file on the host so that it contains an entry for my NFS file system. The /etc/exports file contains the following:
    <nfs_dir> *(rw,nohide,insecure,no_subtree_check,async,no_root_squash)

  6. Restart the NFS kernel server:
    sudo /etc/init.d/nfs-kernel-server restart

  7. Set permissions for <nfs_dir>:
    sudo chmod 755 <nfs_dir>
    sudo chown root.root <nfs_dir>

  8. Skip OEM Config via the l4t_create_default_user.sh script (I changed the ‘rfs_dir’ variable in the script to point to <nfs_dir>):
    Linux_for_Tegra/tools/l4t_create_default_user.sh -u <my_username> -p <my_password> -n <my_hostname>

  9. Export the root point:
    sudo exportfs -a

  10. Disable the firewall:
    sudo ufw disable

  11. Connect an Ethernet cable and a USB-C cable between the Xavier and the host PC. Then put the board in recovery mode and enter the following to flash the Xavier:
    sudo ./flash.sh -N <host_ip_address>:<nfs_dir> jetson-agx-xavier-devkit eth0
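
For anyone repeating these steps, the export entry can be sanity-checked on the host before flashing. The following is only a sketch and not part of the NVIDIA procedure: the function name export_ok and the example path /srv/nfs/rootfs are hypothetical, and it only inspects the text of the exports file for the rw and no_root_squash options used above.

```shell
#!/bin/sh
# Hypothetical helper (not from the NVIDIA docs): succeed if the exports
# file $2 (default /etc/exports) lists directory $1 with both the rw and
# no_root_squash options.
export_ok() {
    dir=$1
    file=${2:-/etc/exports}
    # Take the first line whose first field is exactly the directory.
    line=$(awk -v d="$dir" '$1 == d { print; exit }' "$file")
    [ -n "$line" ] || return 1
    # Root must not be squashed, or root-owned files look unowned by root.
    case $line in
        *no_root_squash*) ;;
        *) return 1 ;;
    esac
    # rw must appear as a whole option inside the parentheses.
    case $line in
        *"(rw,"*|*",rw,"*|*",rw)"*|*"(rw)"*) ;;
        *) return 1 ;;
    esac
    return 0
}
```

Usage would look like `export_ok /srv/nfs/rootfs && echo "export entry looks fine"`, with your own <nfs_dir> in place of the example path. What the server is actually exporting can be listed with `sudo exportfs -v`.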

The flash process went fine, and I get the message:


*** The target t186ref has been flashed successfully. ***
Make target nfsroot(<my_host_ip>:<nfs_dir>) exported on the network and reset the board to boot

After rebooting the Xavier, I get the following message repeatedly on terminal:

[ 572.877089] hwmon hwmon3: temp1_input not attached to any thermal zone
[ 576.036648] nvethernet 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[ 576.046751] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 576.066730] Sending DHCP requests ...... timed out!
[ 655.516021] Trying to unregister non-registered hwtime source
[ 655.516040] nvethernet 2490000.ethernet eth0: Link is Down
[ 655.525683] IP-Config: Retrying forever (NFS root)...

The complete UART log is attached.
uart_log.txt (75.0 KB)

What did I do wrong? Are the steps I followed valid for R35.2.1? Did I miss anything?

Regards,
Sana Ur Rehman

It looks like rootfs setup is good, along with flash itself. There may be a router issue though. Your Jetson is not obtaining an IP address, which is what the DHCP request is for. There was no DHCP response (the router’s responsibility).

Does your router have security set up such that it only responds under some circumstances, e.g., maybe it requires adding a MAC address to the router for authorization?

Hi @linuxdev . Thanks for the reply. I have a point-to-point Ethernet connection between my Jetson AGX and the host machine. There is no router/switch involved.

I am not very familiar with networking protocols, so I am curious as to how the Jetson would obtain an IP address? It is currently connected to the host only via a point-to-point ethernet link. I doubt there is anything configured in the host which would allow the Jetson to obtain an IP? What is the standard way for a Jetson to obtain an IP in the case of NFS boot?

Could it also be a case of IPv4 and IPv6? The UART log shows ethernet link is up using IPv6. However, when flashing the Jetson, I used IPv4 address of the host in the flash command. Could the problem be related to this?

Regards,
Sana Ur Rehman

I tried flashing using the link-local IPv6 address of the host. However, this gives me the error:

Error: Invalid nfsroot(<host's IPv6 address>:<nfs_dir>)

So it seems that the flash.sh script doesn't support an IPv6 address, and I am not sure what the problem is. The host is an Ubuntu machine with a static IPv4 address.

It seems that I needed to set up a DHCP server on the host, as described in "Problem of kernel and DTB from TFTP and rootfs from NFS - #52 by WayneWWW". It might be worth mentioning this in the official NVIDIA docs, so that newbies like me have some hint about what to do. It seems like a pretty important step, and I didn't find any mention of it in the NVIDIA docs.
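
For readers with the same point-to-point setup, here is a rough sketch of one way to run a DHCP server on the host. It uses dnsmasq, which is my own choice for illustration and may not be what the linked post uses. The function name write_dhcp_conf, the interface name, and the address range are all hypothetical placeholders; the range has to sit in the same subnet as the host's static IPv4 address on that interface.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal DHCP-only dnsmasq config.
#   $1 = host interface wired to the Jetson (placeholder)
#   $2 = first address to lease, $3 = last address to lease (placeholders)
#   $4 = output config file
write_dhcp_conf() {
    iface=$1
    first=$2
    last=$3
    out=$4
    cat > "$out" <<EOF
# Serve DHCP only on the link to the Jetson
interface=$iface
bind-interfaces
# port=0 disables the DNS service of dnsmasq, leaving only DHCP
port=0
dhcp-range=$first,$last,12h
EOF
}
```

Assuming the host interface is enp3s0 with static address 192.168.10.1, this could be used as `write_dhcp_conf enp3s0 192.168.10.10 192.168.10.50 /tmp/jetson-dhcp.conf`, followed by `sudo dnsmasq --no-daemon --conf-file=/tmp/jetson-dhcp.conf` to run the server in the foreground while the Jetson boots.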

Moving on, I managed to boot using NFS boot. The HDMI display goes blank after booting up, but I can get to the login prompt in the UART terminal. After logging in from the UART terminal, I tried to do some simple operations. However, there seems to be some permissions issue, because I cannot perform even the simplest operations like creating a directory. I always get ‘Permission Denied’. Using sudo gives me the following error:

sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set

Is there any missing step in my initial comment that I needed to perform to get correct permissions for the mounted NFS file system? Any help would be appreciated. Thanks!

Update: I took a fresh start, and set up the NFS boot procedure from scratch. This time it worked flawlessly! So I can only assume I messed up the permissions settings somehow in the last attempt. So problem solved! Thanks for the help @linuxdev . Appreciate it!

A lot of people discover this problem. You figured it out, but I’ll answer anyway so people understand what goes wrong with sudo and NFS.

The issue with sudo is related to the type of filesystem used. NFS complicates it, but for the moment, consider what the NFS itself is being supported by (the partition or device on the NFS side).

It is perhaps easiest to start with an example. sudo is a security tool, and it depends on how its own file is stored (the sudo command itself is a file): as the error message says, the file must be owned by uid 0 (root) and have the setuid bit set. If we were to use a non-Linux filesystem type, e.g., NTFS or VFAT, then that ownership and permission metadata does not exist, and it is guaranteed that having sudo on one of those filesystem types will fail. There is no possibility of success if the sudo file is set wrong, and only Linux filesystem types are capable of setting this correctly. A Linux filesystem type could of course be set wrong, and sudo would fail, but it is possible on a Linux filesystem for this to succeed.

Some illustrations:

  • If you unpack a filesystem for use in flashing or other use, and you do not unpack it as user root (such as via sudo), then those special permissions are lost. sudo would then fail.
  • If you use sudo to unpack a filesystem, but the destination is a VFAT or NTFS filesystem, then you are guaranteed to fail.
  • If you use sudo to unpack or create a filesystem, and it is unpacked to a Linux filesystem type, but if the original is not archived with proper permissions, then you would again find sudo fails.
  • If you have a valid Linux filesystem type on an NFS server, and if this was created correctly such that sudo can work, but if the export squashes root file permissions, then you are guaranteed to fail (with data one would typically squash/translate anything owned by root to an end user; no_root_squash is for keeping root authority, which is considered a security risk).
  • If you import an NFS volume onto another system, and if that import is itself squashed, then sudo will fail.
  • If all is correct on the NFS host PC, and exported correctly, and imported without translating root to an end user ID, then all should function with sudo.
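
A quick way to check the first few cases on the host is to look at the sudo binary inside the extracted rootfs. This is only a sketch: the function name sudo_file_ok is hypothetical, and `stat -c` assumes GNU coreutils (as on an Ubuntu host). It tests exactly the two conditions named in the sudo error message.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if file $1 has the setuid bit set
# and is owned by uid 0, the two conditions the sudo error names.
sudo_file_ok() {
    f=$1
    # -u is true when the file exists and its setuid bit is set.
    [ -u "$f" ] || { echo "$f: setuid bit not set"; return 1; }
    # GNU stat: %u prints the numeric owner uid.
    owner=$(stat -c '%u' "$f") || return 1
    [ "$owner" = 0 ] || { echo "$f: owned by uid $owner, not 0"; return 1; }
}
```

On the host this would be called as `sudo_file_ok <nfs_dir>/usr/bin/sudo`. If it fails there, the rootfs was most likely extracted or copied without root, or onto a filesystem that cannot store this metadata. If it passes on the host but sudo still fails on the Jetson, the export or mount options are the next thing to look at.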