Install a kernel on Jetson AGX

Hello, I was recently tasked with installing a Linux kernel on the NVIDIA Jetson AGX. I followed the guide here: Running a mainline linux kernel on the NVIDIA Jetson Xavier AGX.

I flashed the device using sdkmanager and compiled the kernel on my host machine (Ubuntu). Then I copied the full Linux folder to the AGX and ran the install commands, just like the guide:

sudo make modules_install
sudo make install

Here is the ‘make install’ output:

make_install.txt (1.4 KB)

Then I edited extlinux.conf, replacing these two lines with:

LINUX /boot/vmlinuz-5.15.0-rc2+
INITRD /boot/initrd.img-5.15.0-rc2+

and added a line (which I don’t think makes any sense):

FDT /boot/tegra194-p2888-0001-p2822-0000.dtb

Then I rebooted the AGX. The boot failed, and I got this boot log:

bootlog_install_kernel.txt (38.6 KB)

I want to know what happened. What should I do to compile a kernel and boot it on the AGX? And is there any way to edit extlinux.conf to recover the machine instead of flashing it with sdkmanager again? Thanks!

On a desktop PC you would normally use the vmlinuz file, but on a Jetson you want to use the Image file as the kernel. A lot can go wrong with a mainline kernel, so I wouldn’t be surprised if this turns into a long project. If you still want to try, the steps are below. Note that you might need to reflash if an attempt fails to boot, unless you manually install the new Image under an alternate name and give it its own entry in “/boot/extlinux/extlinux.conf”:

# Set "CONFIG_LOCALVERSION" in the kernel configuration, e.g., to "-mainline"
# (a config editor such as "make nconfig" can do this).
make Image
# I am going to pretend the base release is "5.4.0". Adjust for your case.
# To install, find "Image" in your compiled kernel, and copy it to:
#   /boot/Image-5.4.0-mainline
# Now add a new entry in "extlinux.conf" without deleting the old entry...
# you'll be able to recover without flashing if you only select this entry
# via serial console and have the old kernel still in place. A contrived
# extlinux.conf additional entry might be something like this:
LABEL mainline
      MENU LABEL mainline 5.4.0
      LINUX /boot/Image-5.4.0-mainline
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4

The above, if it boots, will result in a “uname -r” of “5.4.0-mainline”. Kernel modules would be searched for at “/lib/modules/$(uname -r)/kernel”. Thus your kernel modules would need to occupy subdirectories of “/lib/modules/5.4.0-mainline/kernel”.
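As a rough sketch of the manual steps above, the following shell functions append the extra entry to a copy of extlinux.conf and print the module directory that matches a given release string. The function names and the scratch file are illustrative and not part of any NVIDIA tooling; the entry text is the contrived example from above, and the root device may differ on your system.

```shell
# Sketch only. The function names and the scratch file are illustrative.

# Append a "mainline" boot entry to the extlinux.conf given as $1, for the
# kernel release given as $2. Existing entries are left untouched.
add_mainline_entry() {
    # Do nothing if an entry with this label is already present.
    if grep -q '^LABEL mainline$' "$1" 2>/dev/null; then
        return 0
    fi
    cat >> "$1" <<EOF

LABEL mainline
      MENU LABEL mainline $2
      LINUX /boot/Image-$2
      APPEND \${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4
EOF
}

# Print the directory the booted kernel would search for modules, for release $1.
modules_dir_for() {
    printf '/lib/modules/%s/kernel\n' "$1"
}

# Try it on a scratch file rather than the real /boot/extlinux/extlinux.conf.
scratch_conf="$(mktemp)"
add_mainline_entry "$scratch_conf" "5.4.0-mainline"
cat "$scratch_conf"
modules_dir_for "5.4.0-mainline"
```

Working on a scratch copy first makes it easy to inspect the result before touching the real file, and appending (instead of editing the existing entry) keeps the original kernel selectable from the serial console.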

For a typical kernel compile I recommend not using the commands which directly install anything, and instead putting your work in a temporary area. Those install commands will pretty much do the wrong thing since they expect a desktop PC, and this isn’t a PC. Keeping in mind the above edits, here is a more generic outline of compiling, written as an example for compiling natively on the Jetson (big warning: start with enough free disk space, or mount something like a thumb drive on the temporary output point):

# --- Setting Up: -------------------------------------------------------
# DO NOT BUILD AS ROOT/SUDO!!! You might need to install source code as root/sudo.
mkdir -p "${HOME}/build/kernel"
mkdir -p "${HOME}/build/modules"
mkdir -p "${HOME}/build/firmware"

export TOP="/usr/src/sources/kernel/kernel-4.9"
export TEGRA_KERNEL_OUT="${HOME}/build/kernel"
export TEGRA_MODULES_OUT="${HOME}/build/modules"
export TEGRA_FIRMWARE_OUT="${HOME}/build/firmware"
export TEGRA_BUILD="${HOME}/build"

# --- Notes: ------------------------------------------------------------
# It is assumed kernel source is at "/usr/src/sources/kernel/kernel-4.9".
# Check if you have 6 CPU cores, e.g., via "htop".
# If you are missing cores, then experiment with "sudo nvpmodel -m 0, -m 1, and -m 2".
# Perhaps use "htop" to see core counts.
# Using "-j 6" in hints below because of assumption of 6 cores.
# -----------------------------------------------------------------------

# Compile commands start in $TOP, thus:
cd $TOP

# Do not forget to provide a starting configuration. Probably copy of "/proc/config.gz",
# to $TEGRA_KERNEL_OUT, but also perhaps via:
make O=$TEGRA_KERNEL_OUT nconfig

# If building the kernel Image:
make -j 6 O=$TEGRA_KERNEL_OUT Image

# If you did not build Image, but are building modules:
make -j 6 O=$TEGRA_KERNEL_OUT modules_prepare

# To build modules:
make -j 6 O=$TEGRA_KERNEL_OUT modules

# To build device tree content:
make -j 6 O=$TEGRA_KERNEL_OUT dtbs

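# To put modules in "$TEGRA_MODULES_OUT":
make -j 6 O=$TEGRA_KERNEL_OUT INSTALL_MOD_PATH=$TEGRA_MODULES_OUT modules_install

# To put firmware and device trees in "$TEGRA_FIRMWARE_OUT" (the "firmware_install"
# target exists in older kernels such as the 4.9 series assumed here; newer
# mainline kernels no longer provide it):
make -j 6 O=$TEGRA_KERNEL_OUT INSTALL_FW_PATH=$TEGRA_FIRMWARE_OUT firmware_install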

The “-j 6” just says to use 6 CPU cores. Systems with fewer cores would use a smaller number. The point of that generic procedure is to keep kernel source separate from output and temporary files, and to allow you to manually install content instead of using tools which don’t understand the Jetson’s boot sequence. Feel free to ask questions if you decide to experiment with this, e.g., what to copy where from the temporary output locations.
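For the “what to copy where” part, here is a minimal sketch, assuming an arm64 build (where the kernel lands in “arch/arm64/boot/Image” under the output directory) and assuming modules have already been installed into the temporary modules area. The function name “stage_kernel” and the destination prefix argument are hypothetical; pointing the prefix at a scratch directory gives a dry run.

```shell
# Sketch only: stage_kernel and its argument order are illustrative.
#   $1 = kernel output dir   (e.g. "$TEGRA_KERNEL_OUT")
#   $2 = modules output dir  (e.g. "$TEGRA_MODULES_OUT")
#   $3 = kernel release      (what "uname -r" will report)
#   $4 = destination prefix  (a scratch dir for a dry run; "" for the real root)
stage_kernel() {
    mkdir -p "$4/boot" "$4/lib/modules" || return 1
    # The arm64 build leaves the uncompressed kernel here:
    cp "$1/arch/arm64/boot/Image" "$4/boot/Image-$3" || return 1
    # Module installs with INSTALL_MOD_PATH create lib/modules/<release>:
    cp -a "$2/lib/modules/$3" "$4/lib/modules/" || return 1
}

# Example dry run using the variables from the procedure above. The release
# string is recorded by the build in include/config/kernel.release:
#   release="$(cat "${TEGRA_KERNEL_OUT}/include/config/kernel.release")"
#   stage_kernel "$TEGRA_KERNEL_OUT" "$TEGRA_MODULES_OUT" "$release" "$(mktemp -d)"
```

Copying under a new name such as “Image-<release>” leaves the stock “/boot/Image” in place, which is what makes recovery through the old extlinux.conf entry possible.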

Hello linuxdev, thanks for your reply.
I have successfully(?) booted the 5.15-rc2+ kernel on the AGX, but I can’t reach the desktop. This output loops on the screen:


[ok] created slice user slice of gdm
[ok] Started Session c** of user gdm
Starting User Manager for UID 120.
[ok] Started Session c** of user gdm
Started User Manager for UID 120.
Stopping User Manager for UID 120.
[ok] removed slice user slice of gdm

But if I use the J501 debug console to get into the system, systemctl status gdm shows it as running.

Then I replaced the dtb file (the original one came from the 5.15 kernel, tegra194-p2972-0000.dts; I forget where I downloaded the replacement). This time I can reach the desktop, but none of the USB ports work.

Unfortunately, I am not familiar with device tree files. I can see that the second one lacks the relevant USB port definitions, but I don’t know how to modify it.

Do you have any idea why the desktop environment can’t start with the kernel’s dtb file? Thanks!

the kernel dtb file:
tegra194-p2972-0000.dtb (55.2 KB)

the dmesg:
dmesg_usb (35.5 KB)

the dtb file (reaches the Ubuntu login page, but USB does not work):
tegra194-p2972-0000-hdmi.dtb (34.3 KB)

the dmesg:
dmesg_hdmi (32.6 KB)

The GPU driver is in binary format and is loaded into the X server as a module. Unless you are using the older X server for which the GPU driver was intended, the direct hardware access driver cannot succeed. Similarly, unless the new kernel is set up to work with that older X server, support for the X server will also fail. How much have you updated? Does it include changing the X server version?

I don’t know what changes would be needed for a 5.x kernel to support the older X server, but it should work if those changes are in place along with the correct older binary API X server.

NOTE: You could include “/var/log/Xorg.0.log”. My guess is there’d be a message about either not being able to load the NVIDIA GPU driver, or else something in the environment causing the X server to directly fail to load. Indirectly, if device tree content has to change to deal with a newer kernel, then this would also cause a failure, but the path to fixing it would be a mere device tree edit.
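A quick way to pull the interesting lines out of that log is to filter on the “(EE)” and “(WW)” markers the X server uses for errors and warnings. This is only a sketch; the function name “xorg_problems” is made up here, and the pattern assumes the usual timestamped line format.

```shell
# Sketch only: xorg_problems is an illustrative name.
# Print timestamped error (EE) and warning (WW) lines from an Xorg log.
# The log path is $1, defaulting to /var/log/Xorg.0.log.
xorg_problems() {
    # Anchoring on the "[ timestamp]" prefix skips the marker legend
    # printed near the top of the log.
    grep -E '^\[[^]]*\] \((EE|WW)\)' "${1:-/var/log/Xorg.0.log}"
}

# Example: xorg_problems /var/log/Xorg.0.log
```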

Hi linuxdev, thanks for the reply.
I checked “/var/log/Xorg.0.log” and, as you said, loading the NVIDIA driver failed. I also noticed that journalctl -b reports “Failed to load module nvgpu” and “Failed to start nvpmodule service”. Does that matter?
Here are the X server version and the log messages. Any further advice? Thanks!

xserver_log (7.7 KB)
xserver_ver (2.6 KB)

If the NVIDIA GPU driver fails to load, then you could only work with software rendering (and no CUDA). In the list of packages the Nouveau driver is such a “software only” driver, but is incompatible with having this on the same system which is loading the NVIDIA driver. There are some Nouveau packages in the wild which are not display drivers, but are instead utilities, and those are ok. However, if you look closely at your package list and search for “Nouveau”, then you’ll see a video driver, which shouldn’t be there.
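To search a package list for the Nouveau display driver specifically, something like the sketch below could be used. It assumes the Ubuntu package naming, where the X video driver is “xserver-xorg-video-nouveau”; the function name and the sample package names fed to it are only for illustration.

```shell
# Sketch only: nouveau_video_packages is an illustrative name.
# Read package names (one per line) on stdin and print only those which
# are the Nouveau X video driver, ignoring other Nouveau-related packages.
nouveau_video_packages() {
    grep -i '^xserver-xorg-video-nouveau'
}

# Illustrative input. On the Jetson the list would come from:
#   dpkg-query -W -f='${Package}\n' | nouveau_video_packages
printf '%s\n' libdrm-nouveau2 xserver-xorg-video-nouveau xserver-xorg-core \
    | nouveau_video_packages
```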

On the other hand, I think your server probably has too new of an ABI to load the older NVIDIA GPU driver. If you can live without CUDA and without hardware accelerated rendering, then you can probably get the Nouveau driver to work with some effort.

I don’t know which version of ABI the NVIDIA driver loads into for your case, but basically when the kernel and X server support the non-mainline-kernel X server code, then it will work. I have not checked version numbers for your X server, nor what may have changed with this mainline kernel release to change compatibility, but that is the starting point to find out what is needed.

Note that the ABI is the “application BINARY interface”. This is basically similar to saying that if you were to compile a loadable library with one compiler using some standard for the dynamically loadable code interface, then the code which actually loads that loadable library would need to follow that same standard. If you look in your Xorg log file this is the part which tells you about the server’s ABI versions:

[    16.237] (II) Module ABI versions:
[    16.237]    X.Org ANSI C Emulation: 0.4
[    16.237]    X.Org Video Driver: 23.0
[    16.237]    X.Org XInput driver : 24.1
[    16.237]    X.Org Server Extension : 10.0

Note that several modules load, and if one of them is wrong, then only that subset of code will fail. For example, if the XInput ABI mismatches, then keyboard and mouse will fail. In this case you are only concerned with the video driver. I suggest you check an unmodified system which uses the default Jetson kernel and which correctly works in the GUI, copy down the working Video ABI version, and then come back to the one running on the mainline kernel to see if they match.
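The comparison described above can be scripted. This is a minimal sketch, assuming you have saved one Xorg log from the working stock system and one from the mainline-kernel system; the function names and file names are hypothetical.

```shell
# Sketch only: video_abi and compare_video_abi are illustrative names.

# Print the "X.Org Video Driver" ABI version recorded in the Xorg log $1.
video_abi() {
    sed -n 's/.*X\.Org Video Driver: *\([0-9][0-9.]*\).*/\1/p' "$1" | head -n 1
}

# Compare the video ABI of a stock-system log ($1) and a mainline-system log ($2).
# Returns 0 on a match, 1 otherwise.
compare_video_abi() {
    abi_stock="$(video_abi "$1")"
    abi_mainline="$(video_abi "$2")"
    if [ -n "$abi_stock" ] && [ "$abi_stock" = "$abi_mainline" ]; then
        echo "video ABI matches: $abi_stock"
    else
        echo "video ABI differs: stock='$abi_stock' mainline='$abi_mainline'"
        return 1
    fi
}

# Example: compare_video_abi Xorg.0.log.stock Xorg.0.log.mainline
```

A match points toward the kernel side (a missing feature or interface), while a mismatch points toward the X server version itself.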

If the ABIs do not match, then it means the server will probably need to be reverted since you cannot recompile the binary-only GPU driver (instead you’d be required to use the matching X server). If the ABIs do match, then I would say that there is probably a system call from X server to kernel causing a failure. In this latter case a fix might be as simple as enabling a certain kernel feature, but it could also be that the mainline kernel is expecting a newer generation of X server (I don’t know, which is why I mention how to examine versions).

It is probably worth noting that NVIDIA has said their roadmap will likely include a 5.x series kernel and Ubuntu 20.04 with their first quarter 2022 release. This is many months away as a minimum, but if you really must have a 5.x kernel and are not able to use a compatible Xorg X11 server, then it is possible you’ll need to wait that long.