JetPack 4.6 on a Xavier NX EMMC Module

I have been trying to migrate our software from JetPack 4.5 to JetPack 4.6, but I am running into issues.

When I load the image onto a developer module (with an SD card), it works fine through either the SDKManager or the flash.sh script. When I load it onto a production Xavier NX (with eMMC), I see strange issues. Flashing through the SDKManager or flash.sh seems to produce a ‘minimized’ system: HDMI does not come up beyond the initial OS setup, the OS setup never asks for network connectivity, logging in over SSH through RNDIS reports a minimized system, and eth0 does not appear. None of this happened with the previous release. When I instead flash an image generated from an SD card Xavier NX (using the -G option of flash.sh), flash.sh halts while loading the image, something that worked on the previous JetPack.

I am not quite sure why this is happening, and aside from staying on JetPack 4.5, what else should I do (rebuild the kernel with other options, etc.)?

Are you saying that you are not using the pure software release from SDK Manager?

“Pure” means installing only with SDK Manager. Nothing from the SD card image is involved.

Let me clarify a bit and start with the basic use case:

I am using a Xavier NX Dev Kit carrier board (P3518 on the board; NVIDIA files label it as P3509) to program a Xavier NX eMMC module (P3668-0001). I use an install of SDKManager 1.6.1.8175 to flash the Xavier NX module with JetPack 4.6 (Rev 1).

Programming through SDKManager successfully programs the Xavier NX, and upon initial boot starts the OS configuration sequence (Username, location, language, etc.) Unfortunately, once I boot it the second time, the splash screen appears, some scrolling text, and then the monitor goes and stays black.

Once in this state, I can only access it over serial through the micro-USB connector on the Dev Kit. This is the initial printout:

Ubuntu 18.04.5 LTS kxm ttyGS0

kxm login: kxm
Password:
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.9.201-tegra aarch64)

To restore this content, you can run the ‘unminimize’ command.

0 updates can be applied immediately.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user “root”), use "sudo <command>".
See “man sudo_root” for details.

kxm@kxm:~$

This seems to indicate that it is in a “minimized” form, and I can’t really get it out of that state, since it can’t identify eth0 or wlan0 to connect to the internet and install the packages.

Dmesg keeps repeating the following:

kxm@kxm:~$ dmesg | tail -n 40
[ 239.858174] tegra_nvdisp_handle_pd_disable: Powergated Head1 pd
[ 239.859453] tegra_nvdisp_handle_pd_disable: Powergated Head0 pd
[ 239.859624] tegradc 15200000.nvdisplay: unblank
[ 239.860671] tegra_nvdisp_handle_pd_enable: Unpowergated Head0 pd
[ 239.861281] tegra_nvdisp_handle_pd_enable: Unpowergated Head1 pd
[ 239.866551] Parent Clock set for DC plld2
[ 239.873785] tegradc 15200000.nvdisplay: hdmi: tmds rate:174500K prod-setting:prod_c_hdmi_111m_223m
[ 239.875525] tegradc 15200000.nvdisplay: hdmi: get YCC quant from EDID.
[ 239.911028] extcon-disp-state external-connection:disp-state: cable 47 state 1
[ 239.911034] Extcon AUX1(HDMI) enable
[ 239.915739] tegradc 15200000.nvdisplay: sync windows ret = 248
[ 240.267096] tegradc 15200000.nvdisplay: blank - powerdown
[ 240.302850] extcon-disp-state external-connection:disp-state: cable 47 state 0
[ 240.302905] Extcon AUX1(HDMI) disable
[ 240.327007] tegra_nvdisp_handle_pd_disable: Powergated Head1 pd
[ 240.328256] tegra_nvdisp_handle_pd_disable: Powergated Head0 pd
[ 240.328702] tegradc 15200000.nvdisplay: unblank
[ 240.329597] tegra_nvdisp_handle_pd_enable: Unpowergated Head0 pd
[ 240.329723] tegra_nvdisp_handle_pd_enable: Unpowergated Head1 pd
[ 240.335206] Parent Clock set for DC plld2
[ 240.344154] tegradc 15200000.nvdisplay: hdmi: tmds rate:174500K prod-setting:prod_c_hdmi_111m_223m
[ 240.345724] tegradc 15200000.nvdisplay: hdmi: get YCC quant from EDID.
[ 240.381205] extcon-disp-state external-connection:disp-state: cable 47 state 1
[ 240.381212] Extcon AUX1(HDMI) enable
[ 240.386719] tegradc 15200000.nvdisplay: sync windows ret = 249
[ 240.750325] tegradc 15200000.nvdisplay: blank - powerdown
[ 240.786224] extcon-disp-state external-connection:disp-state: cable 47 state 0
[ 240.786227] Extcon AUX1(HDMI) disable
[ 240.812060] tegra_nvdisp_handle_pd_disable: Powergated Head1 pd
[ 240.813707] tegra_nvdisp_handle_pd_disable: Powergated Head0 pd
[ 240.813880] tegradc 15200000.nvdisplay: unblank
[ 240.816175] tegra_nvdisp_handle_pd_enable: Unpowergated Head0 pd
[ 240.816358] tegra_nvdisp_handle_pd_enable: Unpowergated Head1 pd
[ 240.822388] Parent Clock set for DC plld2
[ 240.830238] tegradc 15200000.nvdisplay: hdmi: tmds rate:174500K prod-setting:prod_c_hdmi_111m_223m
[ 240.832840] tegradc 15200000.nvdisplay: hdmi: get YCC quant from EDID.
[ 240.867908] extcon-disp-state external-connection:disp-state: cable 47 state 1
[ 240.867914] Extcon AUX1(HDMI) enable
[ 240.873059] tegradc 15200000.nvdisplay: sync windows ret = 249
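The repeating block above is a connect/disconnect loop: the HDMI extcon state goes to 1, the display blanks and powers down, the state drops to 0, and the cycle restarts roughly every half second. As a rough sketch (not an NVIDIA tool), the Python below counts those cycles in saved dmesg text. It assumes the `Extcon AUX1(HDMI) enable/disable` lines look exactly like the ones in this log; the function names are hypothetical.

```python
import re

# Matches lines such as "[ 240.381212] Extcon AUX1(HDMI) enable" from the log above.
HOTPLUG_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+Extcon AUX1\(HDMI\) (enable|disable)$")


def hdmi_extcon_events(dmesg_text):
    """Return (timestamp, state) tuples for each HDMI extcon enable/disable line."""
    events = []
    for line in dmesg_text.splitlines():
        match = HOTPLUG_RE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def count_flaps(events):
    """Count enable -> disable transitions, i.e. how often HDMI came up and dropped again."""
    return sum(
        1
        for (_, prev), (_, cur) in zip(events, events[1:])
        if prev == "enable" and cur == "disable"
    )


def mean_period(events):
    """Average seconds between consecutive 'enable' events, or None if fewer than two."""
    enables = [ts for ts, state in events if state == "enable"]
    if len(enables) < 2:
        return None
    return (enables[-1] - enables[0]) / (len(enables) - 1)
```

On the board, the input text could come from `subprocess.run(["dmesg"], stdout=subprocess.PIPE, universal_newlines=True).stdout`. A steadily growing flap count while the monitor stays black points at the display stack rather than the cable.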

I also captured the initial boot dmesg, before the continuous messages:

bootDmesg (59.7 KB)

The oddest part is that repeating this process on a Xavier NX SD card version (P3668-0000) seems to work fine.

Let me know if you want to know anything more. I think if we can figure out this portion, I can figure out the rest.

Hi,

Ok. Since the remote console is still working, can you try the test cases below and share the corresponding logs?

  1. Please just boot up the board again with the monitor connected. Share the dmesg output and /var/log/Xorg.0.log.

  2. Please boot up the device without the monitor connected. After it has booted, hotplug the monitor and dump the dmesg output and /var/log/Xorg.0.log again.
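The two test cases above can be scripted so that each capture lands in its own file. This is a hypothetical helper, not something from the thread: the tag names are made up, and `dmesg` may need sudo if `kernel.dmesg_restrict` is enabled. It avoids `capture_output`/`text` so it should also run on the Python 3.6 that ships with Ubuntu 18.04.

```python
import shutil
import subprocess
from pathlib import Path


def collect_debug_logs(dest_dir, tag, xorg_log="/var/log/Xorg.0.log", dmesg_cmd=("dmesg",)):
    """Save dmesg output and a copy of the Xorg log, both prefixed with `tag`.

    `tag` is a hypothetical label per test case, e.g. "monitor_at_boot" or
    "hotplug". Returns the list of files written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []

    # stdout=PIPE + universal_newlines works on Python 3.6 (capture_output/text need 3.7+).
    result = subprocess.run(
        list(dmesg_cmd),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    dmesg_path = dest / "{}_dmesg.txt".format(tag)
    dmesg_path.write_text(result.stdout)
    written.append(dmesg_path)

    # Copy the Xorg log only if it exists (it may be absent if X never started).
    xorg_src = Path(xorg_log)
    if xorg_src.is_file():
        xorg_path = dest / "{}_Xorg.0.log".format(tag)
        shutil.copyfile(str(xorg_src), str(xorg_path))
        written.append(xorg_path)
    return written
```

Run it once per test case, e.g. `collect_debug_logs("/tmp/nx_logs", "monitor_at_boot")`, and again with `"hotplug"` after plugging in the monitor.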

The dmesg output with the monitor attached is in the previous post. Here is the rest:

hdmi_dmesg_plug_unplug.txt (6.5 KB)
xorg_hdmi_plug_unplug_capture.txt (7.6 KB)
xorg_hdmi_startup_capture.txt (7.6 KB)

[ 166.539] (II) NVIDIA GLX Module 32.6.1 Release Build (integ_stage_rel) (buildbrain@mobile-u64-5497-d3000) Mon Jul 26 12:15:20 PDT 2021
[ 166.539] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[ 166.539] (EE) NVIDIA(0): Failing initialization of X screen 0
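When going through /var/log/Xorg.0.log, the lines that matter are the ones tagged (EE). A minimal Python sketch that pulls them out, assuming the usual `[ timestamp] (EE) message` layout shown above; the legend line that merely lists `(EE) error` is skipped because it has no timestamp. The names are illustrative only.

```python
import re

# "[ 166.539] (EE) NVIDIA(GPU-0): Failed to initialize ..." -> captures the message part.
XORG_ERROR_RE = re.compile(r"^\[\s*[\d.]+\]\s+\(EE\)\s*(.*)$")


def xorg_errors(log_text):
    """Return the message of every timestamped (EE) line, skipping empty ones."""
    errors = []
    for line in log_text.splitlines():
        match = XORG_ERROR_RE.match(line.rstrip())
        if match and match.group(1):
            errors.append(match.group(1))
    return errors
```

Fed the log above, it would return the two NVIDIA initialization failures and nothing else.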

It looks like the graphics device fails to start for an unknown reason.

Can you share the result of “lsmod” after booting up?

As previously mentioned, the OS is starting in a “minimized” mode, and lsmod returns an empty list.

I may try to clean up my system completely and reinstall the various tools needed for installation. I wonder if something got broken while I was building the kernel from source. It’s just weird to me that every other installation process works (Xavier NX SDCard with JetPack 4.6, Xavier NX EMMC with JetPack 4.5.2, etc.), except Xavier NX EMMC with 4.6.

The GPU driver is a loadable module. If lsmod is empty, your driver is missing.

I guess your other cases just happen to work because their lsmod is not empty.

Check uname -r first. It needs to match a directory under /lib/modules/.
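lsmod is a formatter for /proc/modules, so the same check can be scripted by reading that file. The sketch below assumes the Jetson GPU driver module is named `nvgpu`; treat that name as an assumption and adjust it if your module list differs.

```python
def loaded_modules(proc_modules_text):
    """Return module names from /proc/modules text (the first field on each line)."""
    return [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]


def gpu_driver_loaded(proc_modules_text, module_name="nvgpu"):
    """True if `module_name` (assumed to be the Jetson GPU driver) is loaded."""
    return module_name in loaded_modules(proc_modules_text)
```

Usage: `with open("/proc/modules") as fh: print(gpu_driver_loaded(fh.read()))`. An empty /proc/modules, as in this thread, returns False.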
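The uname/module check can be written out as a short sketch. `os.uname().release` is the same string `uname -r` prints (the login banner in this thread shows 4.9.201-tegra); the helper names are hypothetical and /lib/modules is the standard module root.

```python
import os


def running_release():
    """The same string that `uname -r` prints, e.g. "4.9.201-tegra"."""
    return os.uname().release


def module_trees(modules_root="/lib/modules"):
    """Kernel releases that have a module directory under modules_root."""
    if not os.path.isdir(modules_root):
        return []
    return sorted(
        name for name in os.listdir(modules_root)
        if os.path.isdir(os.path.join(modules_root, name))
    )


def release_has_modules(release, modules_root="/lib/modules"):
    """True if <modules_root>/<release> exists, i.e. there is a module tree for it."""
    return release in module_trees(modules_root)
```

If `release_has_modules(running_release())` is False, `module_trees()` shows which kernel release the root file system was actually built for.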

Actually, your comments are making the situation worse.

For example, the statement below sounds like “you didn’t customize anything”, but it turns out you built the kernel yourself.

Programming through SDKManager successfully programs the Xavier NX, and upon initial boot starts the OS configuration sequence (Username, location, language, etc.) Unfortunately, once I boot it the second time, the splash screen appears, some scrolling text, and then the monitor goes and stays blank

You should have mentioned this from the beginning. That is the really important point we need to know. Honestly, the rest of the info is not that useful.

I apologize for not being clear. Here is the overall order of events that led me here:

  1. Previously, I have been flashing Xavier NX SDCard and EMMC modules with your pre-built JetPack 4.5.2 kernel. No issues.
  2. In order to enable iptables mangling, I built the kernel with it enabled. During this time, I found there was a new release, 4.6, and decided to try that instead of 4.5.2.
  3. I was able to successfully build the kernel with the additions and flash a Xavier NX SDCard module. It booted fine, kernel modules loaded, HDMI worked, and iptables mangling worked.
  4. When trying to flash a Xavier NX EMMC module with the new kernel build of 4.6, the issue we are talking about appeared.
  5. During the course of this conversation, I removed the built kernel completely: all sdkmanager files, all kernel building files, etc. I tried to reflash the Xavier NX EMMC module with NVIDIA’s pre-built kernel through sdkmanager, and the same issue occurred. I also flashed and tested a Xavier NX SDCard module the same way, and the issue did not appear.
  6. I have now built JetPack 4.5.2 with iptables mangling, and it works correctly on the Xavier NX EMMC. This means the only combination that doesn't work is the Xavier NX EMMC with JetPack 4.6, whether the kernel is built by me or is NVIDIA’s pre-built kernel.
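For the iptables mangling change in step 2, one way to confirm what the running kernel was actually built with is to read its embedded config. This sketch assumes the kernel exposes /proc/config.gz (CONFIG_IKCONFIG_PROC) and that the option of interest is CONFIG_IP_NF_MANGLE (the IPv4 mangle table); both are assumptions to check against your own build, and the helper names are hypothetical.

```python
import gzip


def read_kernel_config(path="/proc/config.gz"):
    """Read a kernel .config, transparently handling the gzip'd /proc/config.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return fh.read()


def config_option(config_text, name):
    """Return the value of `name` ("y", "m", ...) or None if it is unset or absent."""
    prefix = name + "="
    for line in config_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None  # covers "# CONFIG_FOO is not set" as well as missing options
```

Usage: `config_option(read_kernel_config(), "CONFIG_IP_NF_MANGLE")` returns `"y"` or `"m"` on a kernel built with mangling, and `None` otherwise. Comparing this on each board quickly shows whether the running kernel is the one you built.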

I see the same behaviour with my Xavier AGX (32GB) DevKit and JetPack 4.6.1. I am using ./flash.sh instead of SDKManager, though.

@sean.wagoner @dkreutz

Honestly, I don’t really need any story about what happened to another platform. Please just treat NX eMMC and SD card as separate issues. If you want to know why the SD card works, we can deal with that later.

Just a few simple questions here.

  1. Does the “pure” SDK Manager software reproduce this issue or not? “Pure” means you just let SDK Manager download the default software, flash it, and boot from your eMMC. Nothing gets changed, not even the kernel.

  2. If this issue only happens after you changed the kernel, can you tell me what your “uname -r” result is and what your /lib/modules/ looks like?

Also, since there are two different users here, I would suggest that @dkreutz make sure you two are talking about the same issue and go through the debug steps I shared in previous comments.

And just another topic here. Not sure if related.

Wayne,

To answer your questions:

  1. Yes, this issue still seems to appear when I let SDKManager do the complete install.
  2. N/A

I may have figured out how to fix it, but I have no clue why it was happening:
My devkit had an NVMe attached to it. I removed it, and now it’s working correctly.

Hi @sean.wagoner,

Ok, I understand what is going on here.

In JetPack 4.6, we added NVMe boot support to CBoot. As a result, the bootloader loads from the NVMe drive first if one is connected.

If this NVMe drive holds an older JetPack release, the bootloader loads the kernel from it and then uses the root device configured by flash.sh. Since you said you flashed the module with pure SDK Manager, the default root device is the eMMC.

This means your kernel came from the NVMe drive while your file system is on the eMMC. The kernel therefore did not match the kernel modules, which left lsmod empty.
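The explanation above suggests a quick heuristic for anyone hitting the same symptom. This Python sketch assumes the root file system shows up as a `root=/dev/mmcblk` parameter in /proc/cmdline (a `PARTUUID=` root would need a different test) and that NVMe drives appear as `nvme*` entries under /sys/block. It cannot prove where the kernel image was loaded from; it only flags the combination described here. All function names are hypothetical.

```python
import os


def nvme_namespaces(sys_block="/sys/block"):
    """NVMe namespaces the kernel sees, e.g. ["nvme0n1"]."""
    if not os.path.isdir(sys_block):
        return []
    return sorted(name for name in os.listdir(sys_block) if name.startswith("nvme"))


def root_device(cmdline_text):
    """Value of the root= parameter from /proc/cmdline text, or None."""
    root = None
    for token in cmdline_text.split():
        if token.startswith("root="):
            root = token[len("root="):]  # if repeated, the last occurrence wins
    return root


def suspect_nvme_kernel_mismatch(cmdline_text, nvme_devices, modules_present):
    """Heuristic: NVMe attached, rootfs on eMMC, and no module tree for the running kernel."""
    root = root_device(cmdline_text) or ""
    return bool(nvme_devices) and root.startswith("/dev/mmcblk") and not modules_present
```

Usage on the board: `suspect_nvme_kernel_mismatch(open("/proc/cmdline").read(), nvme_namespaces(), os.path.isdir("/lib/modules/" + os.uname().release))`. A True result suggests removing the NVMe drive (as was done here) or reflashing it with the same release.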

Hope this clarifies what is going on.

Yup, that does, and that’s exactly what was happening.