TX1: Building L4T kernel on device - failed to start NVPMODEL service

Hi,

I’m trying to build the L4T kernel for the Jetson TX1. I use L4T 32.5.1 as a base, flashed with JetPack 4.5.1.
Since building the kernel takes up a fair amount of space, I am using an SSD as the boot disk (I think it still uses the eMMC to boot but uses the SSD as the root filesystem, or something similar).

I have followed two guides. First, the JetsonHacks material: GitHub - jetsonhacks/jetson-linux-build: Tools to build the Linux kernel and modules on board Jetson Developer Kits, and the accompanying write-up: Build Kernel and Modules – NVIDIA Jetson TX1 - JetsonHacks. Second, the kernel customization page in the L4T docs: https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/kernel_custom.html# (where I tried to work out how to do this on the Jetson instead of cross-compiling).

Despite trying a couple of different approaches, I always end up with the same issue: the device doesn’t boot when I’m using the compiled image, and the boot log gives the same message each time:

[FAILED]  Failed to start nvpmodel.service. 
See 'systemctl status nvpmodel.service' for details

which, when run over the serial console, prints out the following:


$ systemctl status nvpmodel.service
nvpmodel.service - nvpmodel service
   Loaded: loaded (/etc/systemd/system/nvpmodel.service; enabled; vendor preset:
   Active: failed (Result: exit-code) since Sat 2021-10-23 17:09:35 CEST; 18s ag
  Process: 4859 ExecStart=/usr/sbin/nvpmodel -f /etc/nvpmodel.conf (code=exited,
 Main PID: 4859 (code=exited, status=255)

okt. 23 17:09:35 oskar nvpmodel[4859]: NVPM ERROR: Error opening /sys/devices/gp
okt. 23 17:09:35 oskar nvpmodel[4859]: NVPM ERROR: failed to read PARAM GPU: ARG
okt. 23 17:09:35 oskar nvpmodel[4859]: NVPM ERROR: Error opening /sys/devices/gp
okt. 23 17:09:35 oskar nvpmodel[4859]: NVPM ERROR: failed to write PARAM GPU_POW
okt. 23 17:09:35 oskar nvpmodel[4859]: NVPM ERROR: failed to set power mode!
okt. 23 17:09:35 oskar nvpmodel[4859]: NVPM ERROR: optMask is 2, no request for 
okt. 23 17:09:35 oskar systemd[1]: Starting nvpmodel service...
okt. 23 17:09:35 oskar systemd[1]: nvpmodel.service: Main process exited, code=e
okt. 23 17:09:35 oskar systemd[1]: nvpmodel.service: Failed with result 'exit-co
okt. 23 17:09:35 oskar systemd[1]: Failed to start nvpmodel service.

Via the UART console, dmesg spits out the following related to the screen:

[   77.774899] Extcon AUX1(HDMI) enable
[   77.787674] tegradc tegradc.1: sync windows ret = 247
[   78.090679] tegradc tegradc.1: blank - powerdown
[   78.137675] extcon-disp-state extcon:disp-state: cable 47 state 0
[   78.137677] Extcon AUX1(HDMI) disable
[   78.157952] tegradc tegradc.1: unblank
[   78.214961] tegradc tegradc.1: nominal-pclk:148500000 parent:148500000 div:1.0 pclk:148500000 147015000~161865000
[   78.215027] tegradc tegradc.1: hdmi: tmds rate:148500K prod-setting:prod_c_hdmi_75m_150m
[   78.216005] tegradc tegradc.1: hdmi: get YCC quant from EDID.
[   78.254551] extcon-disp-state extcon:disp-state: cable 47 state 1

Looking closer into systemctl, I can see that the following services failed as well:

systemctl list-units --failed
  UNIT                         LOAD   ACTIVE SUB    DESCRIPTION                
● nvpmodel.service             loaded failed failed nvpmodel service           
● nvzramconfig.service         loaded failed failed ZRAM configuration         
● systemd-modules-load.service loaded failed failed Load Kernel Modules

This happens with both the default and a modified config.

However, when I use the UART console to select which image to boot from, I can log in as usual and see that I’m booted into the correct image there (uname -r is showing the expected label).

I haven’t been able to find any reference to someone else experiencing a similar issue.

So my questions are:

  1. Any ideas why the nvpmodel service cannot be started? As far as I know it’s some NVIDIA power management thing, but I have not been able to figure out how to compile support for it. In the config, it looks like a feature built into the kernel, and not a module.
  2. Is this build flow deprecated? Is only cross-compilation supported?

Thanks,
Oskar

Native compile works well if you have enough disk space. Much of the error I see could be from missing kernel features, which in turn tends to come down to kernel configuration. Tell me about configuration:

  • How did you configure initially for a compatible configuration?
  • Did you install both kernel Image file and modules, or just some part of that?
  • When you did install a kernel and/or Image file, what method did you use?
  • Do you still have an original working kernel you can boot to? For example, via an extra extlinux.conf boot entry selected with the serial console cable during boot (see the sketch after this list).
  • Are you certain the kernel source version you are using is from the same release? For release see “head -n 1 /etc/nv_tegra_release”.
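
For reference, such an extra boot entry is just an added stanza in “/boot/extlinux/extlinux.conf”. What follows is only a hedged sketch (the label, kernel path, and root device are example values and depend on your setup); keep the original entry untouched so there is always a known working fallback:

LABEL test
      MENU LABEL test kernel
      LINUX /boot/Image-test
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait

During boot, the serial console will offer a menu where you can pick this entry by number.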

FYI, when I build natively (or even cross-compile) I always use empty directories in a temp location for all output. If you’ve built directly within the kernel source, then you’ll want to “sudo make mrproper” to remove all customization from the original source, and from then on keep all output in those temp locations. Those temp locations can be owned by your regular user, and the source (since it won’t be modified after “sudo make mrproper”) could be readable by all, but have write permission removed for everyone except root (or sudo).
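
As a minimal sketch of that cleanup (assuming the source is at the location used in the recipe below):

# Remove all generated files and configuration from the source tree itself:
cd /usr/src/sources/kernel/kernel-4.9
sudo make mrproper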

What you see below is a “short recipe” for native compile using some temporary output locations, but I think configuration is more important than these steps, and if you have a copy of a working stock kernel’s “/proc/config.gz”, then odds of success go up. Here is an example native build recipe:

# --- Setting Up: -------------------------------------------------------
# DO NOT BUILD AS ROOT/SUDO!!! You might need to install source code as root/sudo.
mkdir -p "${HOME}/build/kernel"
mkdir -p "${HOME}/build/modules"
mkdir -p "${HOME}/build/firmware"

export TOP="/usr/src/sources/kernel/kernel-4.9"
export TEGRA_KERNEL_OUT="${HOME}/build/kernel"
export TEGRA_MODULES_OUT="${HOME}/build/modules"
export TEGRA_FIRMWARE_OUT="${HOME}/build/firmware"
export TEGRA_BUILD="${HOME}/build"

# --- Notes: ------------------------------------------------------------
# It is assumed kernel source is at "/usr/src/sources/kernel/kernel-4.9".
# Check how many CPU cores you have, e.g., via "htop".
# If you are missing cores, then experiment with "sudo nvpmodel -m 0", "-m 1", and "-m 2".
# The hints below use "-j 6" on the assumption of 6 cores; adjust to your core count.
# -----------------------------------------------------------------------

# Compile commands start in $TOP, thus:
cd $TOP

# Do not forget to provide a starting configuration. Probably a copy of "/proc/config.gz"
# placed in $TEGRA_KERNEL_OUT as ".config", and then perhaps edited via:
make O=$TEGRA_KERNEL_OUT nconfig
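
# As a hedged example, one way to seed that starting configuration from the
# running stock kernel (assumes the stock kernel provides /proc/config.gz):
zcat /proc/config.gz > $TEGRA_KERNEL_OUT/.config
# ...then use nconfig (above) to adjust CONFIG_LOCALVERSION, e.g., to "-tegra".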

# If building the kernel Image:
make -j 6 O=$TEGRA_KERNEL_OUT Image

# If you did not build Image, but are building modules:
make -j 6 O=$TEGRA_KERNEL_OUT modules_prepare

# To build modules:
make -j 6 O=$TEGRA_KERNEL_OUT modules

# To build device tree content:
make -j 6 O=$TEGRA_KERNEL_OUT dtbs

# To put modules in "$TEGRA_MODULES_OUT":
make -j 6 O=$TEGRA_KERNEL_OUT INSTALL_MOD_PATH=$TEGRA_MODULES_OUT

# To put firmware and device trees in "$TEGRA_FIRMWARE_OUT":
make -j 6 O=$TEGRA_KERNEL_OUT INSTALL_FW_PATH=$TEGRA_FIRMWARE_OUT

Please note that it is very useful to actually build a kernel which is a 100% exact match of the original, and install that just to see if the install steps were correct. Many people mistakenly think that flashing is the only way to install a new kernel, but in most cases (except when security fuses are burned) file copies are easier, and using a new entry in extlinux.conf is better than simply overwriting your known original working kernel.
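
As a hedged sketch of that file-copy style of install (the "-test" suffix is just an example name; adjust paths to your setup):

# Copy the new kernel next to (not over) the original:
sudo cp $TEGRA_KERNEL_OUT/arch/arm64/boot/Image /boot/Image-test
# Then add a new extlinux.conf entry whose LINUX line names /boot/Image-test,
# leaving the original entry in place as a fallback.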

Also, I configure with nconfig because it has a convenient symbol search feature, but you can use any config editor. You can directly edit “CONFIG_LOCALVERSION” in the .config file, but only because it has no dependencies. Configuration editors are needed for most features as a way to guarantee that not only will your changes go in, but also that dependencies which change because of your change will be correct.

FYI, to see the actual list of devices backing your filesystems, you might run:
df -H -T -t ext4
(assumes all filesystems you work with on a real disk are type ext4)

Thanks for a detailed answer!

  • How did you configure initially for a compatible configuration?
    – I simply used the default config (i.e., make ARCH=arm64 O=$TEGRA_KERNEL_OUT tegra_defconfig). I didn’t change any configs, just to get the flow up and running; I will be making some mods later on (enabling KVM).
  • Did you install both kernel Image file and modules, or just some part of that?
    – This part I’m a bit unsure about; I could have messed it up when trying to build using the NVIDIA docs. I will retry using your pointers and report back.
  • When you did install a kernel and/or Image file, what method did you use?
    – Two different methods: using the JetsonHacks scripts blindly (my fault for not reading the scripts), and just plain copying. I tried to follow the NVIDIA docs, but those seemed to mostly cover cross-compilation, if I’m not mistaken. Is that what you mean? I did compile everything into a temporary directory, but I might have gotten the installation part wrong.
  • Do you still have an original working kernel you can boot to? For example, via an extra extlinux.conf boot entry selected with serial console cable during boot.
    – I do. I have two working copies of the original image: One that boots into the eMMC and one that boots into the SSD as the default filesystem. And then I am playing around with the compiled kernel in a third boot option that uses the SSD as storage. I choose which image to boot from using the serial console.
  • Are you certain the kernel source version you are using is from the same release? For release see “head -n 1 /etc/nv_tegra_release”.
    – Yes. According to /etc/nv_tegra_release it’s R32 rev 5.1, which also corresponds to the documentation for the JetPack version used when flashing the device.

Diffing the config from /proc/config.gz against the config file I have used during both build methods reveals that only the local version string (CONFIG_LOCALVERSION) differs, so that seems promising.
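
The comparison was along these lines (paths are just examples):

zcat /proc/config.gz > /tmp/running_config
diff /tmp/running_config $TEGRA_KERNEL_OUT/.config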

So I think I should be able to at least build the kernel and modules with fairly high confidence, but I just have a couple of questions:

  • When installing the Image, that has to be done on the eMMC, since I boot from that. But can the modules be installed on just the SSD, or do they need to be installed on the eMMC as well?
  • When it comes to installing the kernel, I understand installing the image, but I don’t think I got a grip on the rest of it, and haven’t really found any documentation related to installing the modules etc. Any pointers there?

Was this performed on a native build?

If so, then this is perhaps the reason for failure. You should never name ARCH=arm64 on a native build. Doing so causes changes (I’m not sure if it should or should not cause changes, but it does). When building natively, can you try again with a clean build which leaves out ARCH in all build commands?

It sounds like mostly what you are doing is correct, except that explicitly naming ARCH when natively compiling is likely to cause a boot failure.

Do beware that if CONFIG_LOCALVERSION differs, then modules will not be found. You would then have to build and install all modules as well. Unless you have a reason to change CONFIG_LOCALVERSION (and thus change “uname -r”) this should be made to match. If indeed you really want a new “uname -r” in order to search for modules at a new location, then instead of leaving it blank you should create a new name, e.g., “-tegra_test” as an example.
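
To illustrate the connection (the version numbers here are just an example; your kernel may differ):

# The kernel searches for modules under a directory named after "uname -r":
#   uname -r     ->  4.9.201-tegra
#   module path  ->  /lib/modules/4.9.201-tegra/
# CONFIG_LOCALVERSION supplies the "-tegra" suffix of that name.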


About what follows…

Beware that most of the example which follows is unnecessary if every driver is integrated directly into the kernel Image file and does not exist in the form of a module. However, there are reasons why modules exist, so I would not normally consider making every feature an Image-integrated non-module to be practical.

The explanation below is very long, but this is something worth documenting for other people looking for help on kernel module issues. Sorry, this all surrounds module loading, but there are so many questions related to this in the forums that it is worth posting. This probably goes far beyond answering your particular question, and will also likely still leave you with questions. Keep reading at your own risk of exhaustion and boredom! This is designed not as a recipe for using new rootfs media, but is instead the kind of detail which can be used to debug or engineer new rootfs media (concentrating on module load requirements…official docs give recipes for alternate rootfs media types).


For your case it is “simplest” to install modules to the eMMC and simultaneously to the SSD (then your kernel won’t care whether any given module loads from SSD or eMMC…it would load the same exact content from either medium). Below I’ll talk about different filesystem types simply because it is a good way to demonstrate module loading issues. You’re not using an odd filesystem type, but it is easier to show why and how module loading (a simple topic) is greatly complicated when split into two different pieces of hardware (e.g., eMMC versus SSD, but it could just as easily be ext4 versus the XFS filesystem type).

Understanding module loading is fairly simple, but the different ways people use to mount the rootfs complicate module loading (an SSD as rootfs is one of those cases). If you don’t have an initial ramdisk (initrd, a kind of special “utility adapter”), then at the moment the kernel loads some modules might also try to load from whatever is mounted at that instant in time (“some” because modules load dynamically upon some dependency being detected). The subtle thing about that is that not all modules will always load and some modules might load at a later time after your new partition is mounted and overriding the original module directory. If both “old” (the eMMC module directory) and “new” (the SSD) have the same content, then you won’t notice a difference. Replacing a module with an exact duplicate at another location is seamless. If you know which modules load immediately, and put those on eMMC, and then place only the later loading modules on SSD, then this too works…but then you must know which loads when.

So a question arises: When would a module load prior to the new SSD filesystem being mounted if the SSD mounts quite early? The answer is that sometimes modules are needed for the kernel to access part of the hardware, including some partitions which might have a different filesystem type. One Linux filesystem type is XFS, so imagine your eMMC is formatted for XFS…then the kernel Image file itself would fail to load because the bootloader cannot understand XFS (the bootloader does not have an XFS driver; it only has ext4 and initrd support). In what follows I’m assuming the XFS driver is in the form of a module, and not built into the actual Image file for the Linux kernel. OK, so make the eMMC ext4 for the “/boot” kernel Image location so the bootloader can read it (the bootloader understands ext4), and thus “/boot” can be used to read the kernel and place it in RAM at the right location. The “/boot” content using ext4 is pretty much mandatory unless you use an initrd, but I will mention that later. This is a limitation of the bootloader, and not a limitation of the Linux kernel.

The bootloader itself has its own filesystem drivers, and these are required if and only if the bootloader is reading from a formatted partition, e.g., ext4. ext4 is the default and you are guaranteed the bootloader can read this filesystem type, and thus can read “/boot/Image” on an ext4 filesystem. This is how the kernel is read unless the kernel is in a partition. If the kernel is read from a partition, then it is binary data with no underlying filesystem, and thus reading partitions by the bootloader only needs a driver for the controller, e.g., a SATA or eMMC driver (the eMMC controller driver exists within the bootloader, and some other external media drivers also exist, e.g., SATA over USB). You’re using an ext4 filesystem though, so you can ignore loading from a partition (someone who has burned security fuses must load through a signed partition, so there is a case when “/boot” load options go away).

The bootloader never reads modules. This is performed by the kernel which is currently running. The kernel was placed in RAM and execution transfers to the kernel (the bootloader has as its one goal to overwrite itself and die by bringing the kernel to life). If the code for reading the filesystem is integrated into the kernel Image, then the kernel never has an issue with that filesystem type. If the code for any hardware access or for understanding a filesystem type is in the form of a module, then the module must be loaded prior to accessing the hardware or filesystem type. If the filesystem driver needed to read the module directory’s filesystem type is not itself already loaded, then the module providing that filesystem support cannot be read. This becomes a “catch 22” or “chicken and the egg” dilemma. The module can’t be loaded because the filesystem type is not understood, and the ability to understand that filesystem type is in the module which cannot be loaded.

You could substitute “drivers for your SSD” and “drivers for eMMC” in the above. There still remains the central point of the timing of when drivers are needed versus when they are available (i.e., there isn’t much difference between filesystem drivers and hardware drivers when it comes to “not working because they don’t yet exist”). I’ll continue though with the filesystem type as the example.

Not all drivers are needed for boot. For example, suppose you have an audio driver for some nice 7.1 surround sound audio card. This has nothing to do with the chain of boot to Linux kernel load to ability to load modules. Such a module could exist on just the SSD if you desire.

As an example, if you were to integrate a non-module format sound driver into the Linux kernel, then the kernel size would grow significantly, but you’d never have to worry about having the driver available…one could play sounds and audio even before the rest of the operating system completes loading (all modules could be missing and the integrated driver would still work).

If you happen to be building custom audio appliances and you are selling and servicing these units yourself, then integrating directly (versus module format) would be a good tradeoff since you’d always want that content anyway (there would be no “unnecessary” bloat). Well, until there is a patch. Then you’d have to replace the entire kernel and not just a module (there is a tradeoff if your module will evolve over time…then the module and initrd start looking more attractive versus integration). If such a driver is being made available in Linux in general, but only 1% of the users have this hardware, then 99% would have a bloated kernel for no reason. There is a similar issue in that much embedded system hardware is custom to that particular system. Should the driver be made available by the mainstream Linux distribution, then almost certainly you’d only want this built as a module to make the driver optional (the Realtek ethernet drivers are an example…they’re common, but not needed by many people). ext4 is typically not in module format because an extraordinary percentage of Linux systems load it or an initrd at the start. ext4 is more or less the gold standard of initial filesystem types on any generic Linux install. Other filesystem types tend to be installed as a module.

Some features cannot be a module. This is usually for invasive content, e.g., virtual memory swap is such a case which cannot be in module format. Some features only exist as a module, and don’t have an integration option (which could be for any number of reasons, e.g., not having a GPL license, being experimental, or the author simply not having taken the time to write compatibility in integrated format).

Here’s something which might seem to be a complication, but is possibly a simplification in your case: the initrd. The initrd is the initial ramdisk, and is just a compressed cpio archive which is unpacked into RAM with a very simple “tree” structure which can be treated as a filesystem. The initrd is very streamlined and lacks many of the abilities of a real filesystem, but it has no trouble being read as files in directories. When the initrd is loaded instead of the actual ext4 hardware device, the entire filesystem is in RAM instead of on disk; this leads to one very important observation: the kernel Image is still the file which is read from “/boot” (if security fuses are not burned and extlinux.conf says to use the Image file in “/boot”), but all of the content surrounding the kernel (including init and kernel modules) is on the initrd filesystem. There is only one kernel Image, but other content might exist in more than one place. If ext4 or XFS filesystems are not part of the kernel Image, then those filesystems will still work if those modules are in the initrd module location. The kernel won’t care that the filesystem is in RAM and not on a disk. Putting the absolute minimum dependencies for modules in the initrd does the job quite well. Your case might not require an initrd, but if something triggers the need to load a module in order to use the SSD, then the initrd having the module is the simplest solution.
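
As a hedged aside, you can examine what an initrd carries. This sketch assumes a gzip-compressed cpio archive at “/boot/initrd” (the usual arrangement on L4T):

# Unpack a copy of the initrd and look at the modules it provides:
mkdir /tmp/initrd_unpack && cd /tmp/initrd_unpack
zcat /boot/initrd | cpio -id
ls lib/modules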

Once those modules are loaded from the initrd (if using one), the init process will eventually tell the kernel to load any module-format drivers needed for any kind of disk (e.g., SSD or NVMe if not integrated into the Image). It won’t matter if the driver is for SATA over USB, or SSD on an m.2 slot, or any other crazy scheme (e.g., iSCSI over gigabit, which would require both ethernet and iSCSI modules to be loaded into the initrd prior to mounting the filesystem). Should the drivers be available in module format in the initrd, the kernel is guaranteed to be able to find and load them before the need for that hardware or filesystem type occurs. The kernel simply mounts the new rootfs of either the eMMC or SSD on some temporary mount point (the initrd modules tell the kernel how to do this), and then performs a pivot_root type operation to transfer the concept of “root of the filesystem” to this new mount point (the temporary mount point is renamed as “/”). That new device becomes the rootfs, and the life of the initrd goes away (RAM holding the initrd is released). Like a bootloader, the initrd has as its only goal to eventually overwrite itself with some other filesystem. Then your SSD becomes the rootfs and the initrd deallocates. This is your adapter between modules required to boot and modules which are not yet available. It adapts module requirements between two points in time during boot (the timing of module availability is altered).

Do note that the old eMMC “/boot” content still exists since it is on a disk. Whether or not you see that content depends on what gets mounted after the pivot_root. If the “/etc/fstab” of the SSD says to mount some part of the eMMC on the new “/boot” mount point (owned by the SSD after pivot_root), then “/boot” remains with the content of the eMMC (fstab told mount to place the eMMC version there and to hide any SSD version of “/boot”). On the other hand, if you just pivot_root and never mount the eMMC on “/boot”, then the “/boot” content is entirely from the SSD. Should the “extlinux.conf” of the two partitions differ, then it would have been read from the eMMC version, and changes to “extlinux.conf” on the SSD will have no effect on boot (there is an exception which I’ll mention later, but this is the “simple” case). This is why you can have two extlinux.conf files and edits might not do what you think. It is common practice on a PC to have “/boot” and “/” (the rootfs) on separate partitions, but this is not used (so far as I know) on Jetsons.

Note that sometimes the device tree is in “/boot”, and that device tree content is used for each driver at the moment the driver loads, as if the tree were an argument passed to the driver. However, if a device tree is read into RAM (as reflected in “/proc/device-tree”), consider that what is loaded might differ from the tree in “/boot” if boot loaded one such tree and “/boot” was later switched between that of the eMMC and the SSD. The “/proc/device-tree” content is the definitive source for knowing what was actually loaded, not the one on the filesystem.
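
As an example of checking this (a sketch; it assumes the dtc tool is installed, e.g., from the device-tree-compiler package):

# Convert the device tree actually in RAM back into source form for inspection:
dtc -I fs -O dts -o /tmp/loaded.dts /proc/device-tree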

When a Jetson is flashed it has non-rootfs/non-filesystem content which is more or less a pointer to finding “extlinux.conf” (this pointer is early boot stage binary data, and is not part of the rootfs). There are some macros which might also be associated whereby more than one extlinux.conf location is searched, and one location has a higher priority than other locations (the U-Boot console is useful for examining those macros for models using U-Boot and not just CBoot). I’m not saying the below flash command works for all hardware, or even this hardware, but it is an illustration that the “priority” extlinux.conf location can be changed via flash parameters:

sudo ./flash.sh jetson-tx1 mmcblk0p1
sudo ./flash.sh jetson-tx1 sda1

The above concept demonstrates how a priority device might be determined at the moment of flash. What it doesn’t make obvious is that the boot content itself (prior to Linux loading, excluding extlinux.conf) will come from eMMC binary content (or on SD card models of Jetson this comes from the QSPI memory on the module storing this “pointer” and macro content). The extlinux.conf itself might name another device, e.g., one might load the eMMC version of extlinux.conf, and that extlinux.conf may have an entry pointing at sda1 and hand off to a new extlinux.conf, or the pointer might entirely skip the eMMC extlinux.conf and directly load the sda1 version of extlinux.conf. It just depends on how flash was set up and the order in which boot media are accessed, plus any twist the extlinux.conf entry adds. Thus there may be one extlinux.conf determining the final boot content, or two such extlinux.conf files, and the one you edit or look at might be right or wrong depending on which part of boot you are speaking of. If editing an extlinux.conf has no effect, then perhaps you are editing the wrong one. Perhaps one has a different module requirement than the others. If this occurs, then the module load media location is probably also in question.

Can you believe all of that is just to say “it all depends as to whether modules need to be on the SSD or the eMMC or both”? It isn’t a recipe, but it should help the patient reader find out what is wrong with their alternate boot media boot process. Don’t forget that the official docs offer recipes for alternate boot media.

I think I made several errors, one of them being changing CONFIG_LOCALVERSION to a string other than “-tegra”, which caused the device to not find the modules.
So I redid everything, got stuck, read up a bit, and I finally got it to work with my test CONFIG_LOCALVERSION value.
A big thanks. I feel that I understand the process by which this is done a lot better now, and your explanation of the modules needing to be installed to both the eMMC and SSD was needed. I simply installed the modules on both disks, but had you not explained it I probably would have assumed that installing on just the SSD was fine and become more confused.

I needed to add the modules_install and firmware_install targets to actually install the modules and firmware into the temporary directories:

# To put modules in "$TEGRA_MODULES_OUT":
make -j 4 O=$TEGRA_KERNEL_OUT INSTALL_MOD_PATH=$TEGRA_MODULES_OUT modules_install

# To put firmware and device trees in "$TEGRA_FIRMWARE_OUT":
make -j 4 O=$TEGRA_KERNEL_OUT INSTALL_FW_PATH=$TEGRA_FIRMWARE_OUT firmware_install

After copying the modules (to /lib/modules/ on both the SSD and the eMMC) and the Image, I was able to boot my build (verified via uname -r), using the default setup (given by /proc/config.gz). The firmware step didn’t seem to do anything new; it just installed the same stuff that was already there, albeit a subset of what was present in the system firmware directory.
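
For anyone following along, the module copy was along these lines (a sketch; "4.9.201-tegra_test" stands in for whatever the new "uname -r" reports, and the SSD mount point is an example):

# Copy the installed module tree to the eMMC rootfs:
sudo cp -r $TEGRA_MODULES_OUT/lib/modules/4.9.201-tegra_test /lib/modules/
# ...and the same tree to the SSD rootfs (mounted here at /mnt/ssd as an example):
sudo cp -r $TEGRA_MODULES_OUT/lib/modules/4.9.201-tegra_test /mnt/ssd/lib/modules/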

Huge thanks linuxdev, not least for the help but also for the explanation regarding module loading! initrd is something that I’ll read more about; it seems interesting! I have definitely edited the “wrong” extlinux.conf file a few times before understanding that it wasn’t being read from the SSD :)

Btw, even the hyphen has to be in CONFIG_LOCALVERSION. Example:
CONFIG_LOCALVERSION="-tegra"

Once people figure out the need for CONFIG_LOCALVERSION, it isn’t unusual for them to forget that pesky hyphen.
