Failed to start Load Kernel Modules after using custom Image

Attempting to use a custom boot Image on the Xavier NX, I get the errors:

Failed to start Load Kernel Modules
Failed to start nvpmodel service

I don’t know what this could be caused by, so would appreciate any pointers to what the problem might be.

My Image is a slightly modified version of the original, in which I changed the uvcvideo driver in an attempt to get a second USB camera working. I followed the instructions in this tutorial to create the custom image. Since I am cross-compiling, I first boot successfully from a newly flashed SD card, then replace Image and the dtb folder in /boot with the Image and dtb files produced by the compilation on a different Linux machine. After rebooting, the system shows the above errors and does not boot completely. I have tried the process several times to no avail.

A typical reason for failing to load modules is that the modules do not exist where the kernel expects to find them. In the tutorial you linked, look closely at this line:
export LOCALVERSION=-tegra

Note that the “uname -r” command response uses the kernel’s base version and appends LOCALVERSION to it to get the final answer. For example, if the kernel is “4.9.140”, and at the moment of compile LOCALVERSION was “-tegra”, then “uname -r” will be “4.9.140-tegra”. Modules are searched for here:
/lib/modules/$(uname -r)/kernel/

If the modules are not there, then module load will fail.
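As a concrete sketch of how that path is assembled (the version strings below are example values for illustration, not read from a live system):

```shell
# How the module search path is derived (example values, not a live system):
base_version="4.9.140"       # kernel's base version from its Makefile
localversion="-tegra"        # CONFIG_LOCALVERSION baked into the Image
kernel_release="${base_version}${localversion}"   # what "uname -r" reports
echo "/lib/modules/${kernel_release}/kernel"

# On a running Jetson you would verify the directory actually exists:
# ls "/lib/modules/$(uname -r)/kernel" >/dev/null || echo "modules missing"
```

If the Image was built without “-tegra” in its version, “uname -r” drops the suffix and the path no longer matches the directory the modules were installed to.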

It is also possible for some kernel source configurations to append a “+” to “uname -r” in some cases. Perhaps a boot log would indicate whether the “uname -r” is correct or not. Can you provide a serial console boot log? See (Nano and NX serial console information should be the same):
https://www.jetsonhacks.com/2019/04/19/jetson-nano-serial-console/

I actually tried both export LOCALVERSION=-tegra and export LOCALVERSION="-tegra" with the same result. My understanding is the two are the same.

I don’t have a USB-TTL cable handy unfortunately. It will take at least a few days to get here.

We really need a way to verify what the output of “uname -r” is. As the kernel loads it would probably show this somewhere, but it would be scrolling by and difficult to see. I worry that either the kernel build has added a “+” to the “uname -r”, or else that the kernel of the initial ramdisk is somehow involved and missing something. Hopefully the cable will become available soon since this makes life so much easier when asking such questions.

I see, I’ll get working on getting the cable here as soon as I can.

I can record the screen (in slow motion if need be), but is there a particular line I should be looking for? Attached is a photo of the first error, but I see no mention of uname -r.

If “uname” is used during boot, then it won’t actually say it is from that command. What it will be is a kernel version being noted. Typically something like “4.9.140-tegra” would show up right as the kernel loads and begins executing. So at the point when the bootloader transitions to the Linux kernel, do you see anything similar to “4.9.140”? This might be complicated by video switching modes and perhaps blanking for a fraction of a second right when valuable information is printed, but when you get the serial UART cable this should not be a problem.
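Once you do have the serial output captured to a file, the release string is easy to pull out, since the kernel prints a “Linux version …” banner as it starts. A sketch (the file name and the banner line below are stand-ins for a real captured log):

```shell
# Stand-in for a captured serial console log; a real log's banner looks similar:
printf 'Linux version 4.9.140-tegra (buildbrain@mobile-u64) (gcc 7.3.1)\n' > boot.log

# Extract the release string from the banner line:
grep -o 'Linux version [^ ]*' boot.log
```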

I can’t spot anything of the sort in the output. I will get back on this as soon as I have that cable.

Cable in hand. I hooked up a laptop to the NX, and attempted to boot it up. As far as I can tell it succeeded within the minicom terminal, although the monitor hooked up to the NX is still stuck as it was before.

One thing I immediately tried upon logging in is echo $(uname -r), which returned 4.9.140. Does this mean my LOCALVERSION was not appended during the build process? I have a feeling something is wrong with my build procedure, as I have to manually specify CROSS_COMPILE=$TOOLCHAIN_PREFIX during the make step (my CROSS_COMPILE is properly set to the correct Linaro directory, so I don’t know why it isn’t being used implicitly), otherwise it gives me errors indicative of the incorrect toolchain being used. Perhaps I need to manually specify LOCALVERSION as well, if that is possible?

Yes, this means your Image file of the kernel did not have the correct “CONFIG_LOCALVERSION” at the time it was built. This results in the kernel looking for modules in the wrong location.

Your build procedure can be perfect, but if the kernel was told the wrong place to look for modules (and that is basically what not setting CONFIG_LOCALVERSION is when reusing modules), then there is no possibility of success. You must set this manually, or it will be missing.

When people cross compile, they use more options to avoid the build system being confused as to whether it is building for local hardware or other hardware. CONFIG_LOCALVERSION is not about the local build environment; it is instead about what the kernel Image file looks at at runtime. Those other options are about the local build environment, and although it might seem like they are related, they are quite independent topics. It would not have mattered if you had built the kernel locally on the Jetson instead of cross compiling with all of those options; there would have been the same failure to find kernel modules.

Always manually specify LOCALVERSION. You can edit the “.config” file:
CONFIG_LOCALVERSION="-tegra"
…or you can edit it from “make menuconfig” or “make nconfig”…or you can set “LOCALVERSION="-tegra"” in the environment prior to the build (some environment variables are inherited by the build if those variables were not specifically set in the .config).
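A minimal sketch of the .config edit, done here on a sample file so it runs anywhere; on a real build you would operate on the .config in your kernel source (or output) directory:

```shell
# Create a stand-in .config showing the empty value seen in the failing build:
printf 'CONFIG_LOCALVERSION=""\n' > sample.config

# Force the value the Jetson's stock kernel uses:
sed -i 's/^CONFIG_LOCALVERSION=.*/CONFIG_LOCALVERSION="-tegra"/' sample.config
grep '^CONFIG_LOCALVERSION=' sample.config

# Alternatively, pass it on the make command line at build time, e.g.
# (TOOLCHAIN_PREFIX as in your own setup):
# make ARCH=arm64 CROSS_COMPILE=$TOOLCHAIN_PREFIX LOCALVERSION="-tegra" Image
```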

Thank you, editing .config did the trick.

I do want to say that as I mentioned, I did set the LOCALVERSION variable prior to running make tegra_defconfig. Still, the CONFIG_LOCALVERSION in the .config file was set to an empty string. Not sure whether it’s something I did wrong or a bug, but that’s what happened.