Boot issue on Jetson NX

Hi,
We are using a custom BSP based on Jetson L4T and are facing the issue below after flashing it. The BSP is otherwise working and tested:

1) On first boot, I run the following commands to change the default passwords of both the “nvidia” account and the “root” account:
passwd
sudo passwd root

2) Next, I go into the user account settings GUI and disable auto-login to force the user to type in a password on boot.
3) Lastly, I run the following command to reboot the system:
sudo reboot
When the system reboots, I get a stream of errors in the boot log, then the screen goes dark, and I can’t do anything further using the keyboard or mouse.

The only way I have found to recover from this issue is to connect the module to the network via Ethernet, find its IP, SSH into it, run sudo apt-get update and upgrade, and fix a few held-back dependencies. After that the system finally boots without issues.

The issue seems to affect Nano, Xavier NX, and Xavier AGX. I also tried reflashing the modules, but the issue happens again.

Is the BSP set up to self-minimize after first boot? I ask because I noticed this message when I SSH into it:

“This system has been minimized by removing packages and content that are not required on a system that users do not log into.”

If I run the command “unminimize” to bring the system back to a state where it can be used with a monitor, keyboard, and mouse, the system is then able to boot into the GUI, but the USB ports on the carrier stop working, so I can’t log in.

Hi,

Is this issue reproducible with devkit + original BSP?

Your comment sounds like you are using a customized one so I am not sure.

This sounds like multiple issues. For example, the first one sounds like the GUI desktop fails to launch, but the later one sounds like a USB issue.

For the USB issue, do the USB devices still have power or not?

Also, do you have any error log to share here?

Hi,

This is a custom BSP, but we have tried it on our boards and have not observed any such issue. The issue was reported by our customer, and we are not able to replicate it on our system or on the devkit with the stock NVIDIA BSP. Can you suggest anything? Do you suspect a hardware issue? They say this is happening on Xavier NX, AGX, and Nano.

I have asked them to provide any log files they have and to check the other points you suggested.

Hi,

Ask them to use the UART console to debug.

  1. If the system is up and only the desktop fails to launch, check dmesg and /var/log/Xorg.0.log.

  2. Check whether the USB device still has power. This is easy to check with an optical mouse.
    If even the mouse has no power, check dmesg.

Also, you had better clarify with your customer whether they changed the BSP or rootfs.

If (1) and (2) recover after a reflash, then it is not a hardware issue.
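
For reference, the two log checks above could be gathered in one step from the UART console or an SSH session. This is only a rough sketch: the function name collect_logs and the output directory are made up for illustration, and dmesg may need sudo on some setups.

```shell
#!/bin/sh
# collect_logs (hypothetical helper): copy the kernel log and the Xorg log
# named in this thread into one directory so they can be attached to a post.
collect_logs() {
    out_dir="${1:-./jetson_logs}"   # assumed default location, not from the BSP
    mkdir -p "$out_dir" || return 1

    # Kernel ring buffer; ignore a failure if dmesg is restricted to root.
    dmesg > "$out_dir/dmesg.txt" 2>/dev/null || true

    # Xorg log path as given above; skipped silently if it does not exist.
    if [ -r /var/log/Xorg.0.log ]; then
        cp /var/log/Xorg.0.log "$out_dir/Xorg.0.log"
    fi

    # Print the directory so the caller knows where the files went.
    echo "$out_dir"
}
```

Calling `collect_logs /tmp/logs` on the target would leave dmesg.txt (and Xorg.0.log, if present) under /tmp/logs.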

I remember asking you this before. If you don’t have a clear picture of your customer’s situation, why not let them come to this forum and ask directly?

Hi,
Here is the update from the customer. I have asked them to join the discussion as well.

  1. I opened a brand-new unit for a fresh start. It runs the L4T 32.4.3 based BSP.
    dmsg_log.txt (80.4 KB)

  2. I powered the unit with a MEAN WELL 12V 5A supply and connected an HDMI monitor and a USB keyboard and mouse.

  3. The desktop environment came up as expected, meaning keyboard, mouse, and HDMI all work on first boot.

  4. The output of the command “cat /etc/nv_tegra_release” is:

    # R32 (release), REVISION: 4.3, GCID: 21589087, BOARD: t186ref, EABI: aarch64, DATE: Fri Jun 26 04:34:27 UTC 2020

  5. The output of the command “sudo jetson_clocks --show” is:

    SOC family:tegra194 Machine:Jetson-AGX
    … skipped …
    NV Power Mode: MODE_15W_DESKTOP

  6. Next, I connected an Ethernet cable to the unit to provide it with internet/SSH access, and the unit connected to the network with no problem.

  7. On my host machine, I SSH into the unit using the command “ssh -X nvidia@172.31.13.103”, which works and gives me this output:

    nvidia@172.31.13.103's password:
    Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 4.9.140 aarch64)

     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage

    This system has been minimized by removing packages and content that are
    not required on a system that users do not log into.

    To restore this content, you can run the 'unminimize' command.

    480 packages can be updated
    376 updates are security updates.

    Last login: Tue Mar 22 14:25:56 2022 from 172.31.13.135
    /usr/bin/xauth: file /home/nvidia/.Xauthority does not exist.

  8. I rebooted using “sudo reboot”, which worked normally. Everything came back up and ran normally in the graphical interface.

  9. I changed the nvidia user password to something simple using the command “passwd”, then rebooted, and again everything seemed fine.

  10. I changed the root user password to the same password as the nvidia account using the command “sudo passwd root”, and that also, surprisingly, worked fine this time.

  11. I installed the putty application as a test using the command “sudo apt-get install putty”, rebooted, and that worked fine.

  12. I ran the command “sudo apt-get update”, rebooted, and that also worked fine.

  13. I went into the “User Accounts” settings and turned off “Automatic Login” to force the user to type in the password on boot, which I always do. I then rebooted the module, but the unit failed to boot back into the graphical interface. It seems we have found one issue.

  14. I SSH’d back into the unit, exported the dmesg log to the file “dmsg_log.txt”, and attached it to this email for you to take a look at.

  15. I also exported the Xorg log and attached “Xorg_log.txt” to this email for you to review.

The question now is: why does disabling Automatic Login kill the graphical interface on boot? I have never used “minimized” images of Ubuntu before, and that seems to be the base image used for this BSP, so is it related to that? I need to find a way to disable automatic login but keep the graphical interface alive on boot.

Hi,

Where is the xorg log?

Hello,

I appear to be the customer.

Context:
I’m using a custom carrier and BSP based on L4T R32.4.3, and the image came minimized by default. I have purchased a couple of Nano, NX, and AGX Xavier integrated units, which come pre-flashed with the vendor’s BSP on the vendor’s carriers, and I am experiencing the same issue on all modules: the module fails to boot into a graphical interface after I turn off the “Automatic Login” option in “User Accounts” (in order to force the user to type in a password on graphical boot). I have discovered that SSH and console mode both work when the graphical interface fails.

Here is the xorg log:
Xorg_log.txt (17.9 KB)

Furthermore, I have not modified the BSP or rootfs myself; the units came pre-flashed. But for the sake of sanity, I downloaded their BSP and successfully reflashed one of the modules, and the issue persisted after taking the exact same steps listed above.

I have done this many times before with other carriers and BSP images, even the exact same L4T version, and never seen this issue before, but I never used minimized images before so that might be related, or perhaps something with their custom BSP/carrier is causing issues.

PS. Although it is not directly related to the issue discussed here, in an attempt to fix the graphical issue on one of the modules I tried sudo apt-get update and upgrade. That resulted in the graphical interface showing up on boot, but I can’t use it because the USB ports no longer have power. Also, after upgrading, I can’t seem to reflash that module; it just gets stuck in a flashing retry loop.

Thank you for your time.

Hi,

  1. Your Xorg log looks like a leftover from the last successful Xorg initialization. Could you use systemctl status to check whether the gdm service is still alive?

  2. Your USB issue sounds like it has a separate cause. If you only change the user/root passwords and reproduce the blank-monitor case, without running apt-get update and upgrade, does the USB mouse lose power?

  1. systemctl status gdm.service shows that it’s loaded and actively running. I’m currently logged in via console on tty3, but when I switch to tty1, I get a black screen.

  2. The blank graphical interface issue caused by disabling automatic login does not cause the USB power to cut out. I’m currently using USB to interact with the system via console mode on tty3.

Thanks

OK, then the USB issue is probably due to apt-get upgrade. It indicates your board is not fully compatible with the original BSP.

Please make sure you did not run apt-get upgrade when reproducing this issue.

OK, I’ll deal with that USB issue next with the manufacturer of the carrier. Thanks. I found it odd that their tech support told me not to run the upgrade command on their modules + carriers.

Also, I confirm that I did not run upgrade before getting the graphical interface to fail. I achieved repeatable results on 4 brand-new pre-flashed modules (NX, Nano, 2x AGX) following the exact steps listed above: when I disable automatic login, the graphical interface fails to display on the next boot.

This is what the logs for gdm.service show:

-- Logs begin at Tue 2022-03-22 15:03:56 PDT, end at Wed 2022-03-23 09:21:21 PDT. --
Mar 22 15:04:00 nvidia systemd[1]: Starting GNOME Display Manager...
Mar 22 15:04:00 nvidia systemd[1]: Started GNOME Display Manager.
Mar 22 15:04:00 nvidia gdm-launch-environment][6451]: pam_unix(gdm-launch-environment:session): session opened for user gdm by (uid=0)

Hi,

  1. Can we try manually disabling and re-enabling gdm3 and see whether /var/log/Xorg.0.log gets updated?

  2. Another thought that might not be useful: could you check whether you still have enough disk space? A full disk will also cause gdm3 to fail.

Also, you can compare the gdm systemctl status between the working case and the failing (NG) case.

IIRC, when gdm successfully triggers Xorg, there will be something like gdm-x-session in the log. If that does not appear, then Xorg won’t be up.
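
As a rough illustration of that last check, the boot journal can be searched for the gdm-x-session marker. This is only a sketch: the function name x_session_state is made up here, and it assumes the marker string appears literally in the journal text.

```shell
#!/bin/sh
# x_session_state (hypothetical helper): read log text on stdin and report
# whether any line mentions gdm-x-session.
x_session_state() {
    if grep -q 'gdm-x-session'; then
        echo "found"
    else
        echo "missing"
    fi
}
```

On the target this could be run as `journalctl -b --no-pager | x_session_state`, once in the working case and once in the NG case, to compare the two.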

Xorg log before restarting gdm3 (systemctl restart gdm3.service):
Xorg_log_before_gdm3_restart.txt (24.3 KB)

Xorg log after restarting gdm3:
Xorg_log_after_gdm3_restart.txt (15.9 KB)

Seems that it does get updated after restart?

Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1   28G  5.7G   21G  22% /
none             16G     0   16G   0% /dev
tmpfs            16G  4.0K   16G   1% /dev/shm
tmpfs            16G   21M   16G   1% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
tmpfs           3.2G  8.0K  3.2G   1% /run/user/120
tmpfs           3.2G     0  3.2G   0% /run/user/1000
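
If the disk-space check needs to be repeated across several modules, the root filesystem usage can be pulled out of the df output with a small filter. This is just a sketch; the function name root_use_percent is made up for illustration, and it assumes mount points without spaces.

```shell
#!/bin/sh
# root_use_percent (hypothetical helper): read df output on stdin and print
# the Use% of the filesystem mounted on "/" as a bare number.
root_use_percent() {
    # The mount point is the last field and the usage is the one before it.
    awk '$NF == "/" { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}
```

On the target, `df -P / | root_use_percent` prints the percentage for the root partition; for the table above that would be 22.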

There appears to be a difference in the gdm3.service status/logs, and it references auto login as well.

NG case:
nvidia@nvidia:~$ systemctl status gdm3.service
● gdm.service - GNOME Display Manager
   Loaded: loaded (/lib/systemd/system/gdm.service; static; vendor preset: enabl
   Active: active (running) since Wed 2022-03-23 10:56:24 PDT; 1h 19min ago
  Process: 13733 ExecStartPre=/usr/share/gdm/generate-config (code=exited, statu
 Main PID: 13761 (gdm3)
    Tasks: 3 (limit: 4915)
   CGroup: /system.slice/gdm.service
           └─13761 /usr/sbin/gdm3

Mar 22 15:04:00 nvidia systemd[1]: Starting GNOME Display Manager...
Mar 22 15:04:00 nvidia systemd[1]: Started GNOME Display Manager.
Mar 22 15:04:00 nvidia gdm-launch-environment][6451]: pam_unix(gdm-launch-environment:session): session opened for user gdm by (uid=0)

Working Case:
nvidia@nvidia:~$ systemctl status gdm3.service
● gdm.service - GNOME Display Manager
   Loaded: loaded (/lib/systemd/system/gdm.service; static; vendor preset: enabled)
   Active: active (running) since Sun 2018-01-28 07:58:20 PST; 4 years 1 months ago
  Process: 6386 ExecStartPre=/usr/share/gdm/generate-config (code=exited, status=0/SUCCESS)
 Main PID: 6441 (gdm3)
    Tasks: 3 (limit: 4915)
   CGroup: /system.slice/gdm.service
           └─6441 /usr/sbin/gdm3

Jan 28 07:58:19 nvidia systemd[1]: Starting GNOME Display Manager...
Jan 28 07:58:20 nvidia systemd[1]: Started GNOME Display Manager.
Jan 28 07:58:20 nvidia gdm-autologin][6550]: gkr-pam: no password is available for user
Jan 28 07:58:20 nvidia gdm-autologin][6550]: pam_unix(gdm-autologin:account): account nvidia has password changed in future
Jan 28 07:58:20 nvidia gdm-autologin][6550]: pam_unix(gdm-autologin:session): session opened for user nvidia by (uid=0)
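
Since the working case goes through gdm-autologin and the NG case stops at gdm-launch-environment, it may help to confirm what GDM is actually configured to do in each case. A minimal sketch, assuming the stock Ubuntu location /etc/gdm3/custom.conf and its AutomaticLoginEnable key in the [daemon] section (this BSP may differ); the function name autologin_state is made up for illustration.

```shell
#!/bin/sh
# autologin_state (hypothetical helper): print the value of
# AutomaticLoginEnable from the [daemon] section of a GDM config file,
# lowercased, or "unset" if the key is absent or commented out.
autologin_state() {
    conf="${1:-/etc/gdm3/custom.conf}"   # assumed Ubuntu default path
    awk -F= '
        # Track which [section] we are in.
        /^[ \t]*\[/ { in_daemon = ($0 ~ /^[ \t]*\[daemon\]/); next }
        # Only uncommented keys inside [daemon] count.
        in_daemon && /^[ \t]*AutomaticLoginEnable[ \t]*=/ {
            value = $2
            gsub(/[ \t]/, "", value)
            print tolower(value)
            found = 1
            exit
        }
        END { if (!found) print "unset" }
    ' "$conf"
}
```

Running `autologin_state` before and after toggling “Automatic Login” in the settings GUI would show whether the file changes the way one expects.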

Let’s clarify whether this affects only GNOME or other X applications as well.

Please disable gdm, run xinit directly, and see whether anything shows up on the monitor.
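
Roughly, the sequence would look like the sketch below. The helper names run and bare_x_test are made up, and it uses startx (the usual wrapper around xinit) on the assumption that it is installed. It defaults to a dry run that only prints the commands; set DRY_RUN=0 on the target console to actually execute them.

```shell
#!/bin/sh
# run (hypothetical helper): print the command when DRY_RUN=1 (the default
# here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# bare_x_test (hypothetical helper): stop the display manager, then start a
# plain X session without GDM.
bare_x_test() {
    run sudo systemctl stop gdm.service
    run startx
}
```

If a plain X session comes up on the monitor this way, the display path itself is fine and the problem is more likely in the GDM login flow.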

gdm was stopped using the “systemctl stop gdm.service” command, an X session was started using the “startx” command, and a graphical interface came up.

I was able to control the X session using mouse and keyboard, and confirmed that gdm.service was actually stopped just to be sure.

Hi,

If gdm is the cause of this issue, I would say it does not seem to be related to the platform.

Have you tried reproducing this issue on a Jetson devkit?