Failed to restart display after loading self-compiled Image and dtb

After compiling the driver and DTS for my camera and loading them, the display fails to start.

My steps were as follows:

  1. Update the Image
    cp Image /boot/

  2. Update the dtb
    (1) Copy
    cp tegra234-p3701-0000-p3737-0000.dtb /boot
    (2) Modify /boot/extlinux/extlinux.conf, adding the line
    FDT /boot/tegra234-p3701-0000-p3737-0000.dtb
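To double-check step 2, something like the following could confirm that every FDT line in extlinux.conf points at a file that actually exists. This is only a sketch: the function names list_fdt_entries and check_fdt_entries are made up for illustration, and it assumes each entry sits on its own line in the form "FDT <path>".

```shell
# Print the device-tree path of every FDT line in an extlinux.conf-style file.
# $1: path to the config file (on the Jetson: /boot/extlinux/extlinux.conf)
list_fdt_entries() {
    awk '$1 == "FDT" { print $2 }' "$1"
}

# Report, for each FDT entry, whether the referenced .dtb file exists.
# $1: path to the config file
# $2: optional prefix prepended to each path (leave empty on the target itself)
check_fdt_entries() {
    list_fdt_entries "$1" | while read -r dtb; do
        if [ -f "${2:-}$dtb" ]; then
            echo "ok: $dtb"
        else
            echo "missing: $dtb"
        fi
    done
}

# Typical use on the Jetson:
#   check_fdt_entries /boot/extlinux/extlinux.conf
```

A "missing" line would mean the bootloader is being pointed at a dtb that was never copied into place.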

During kernel compile, was the option “CONFIG_LOCALVERSION” set to “-tegra”? If not, then the kernel won’t be able to find its modules. The output of “uname -r” is the base kernel version plus the CONFIG_LOCALVERSION suffix. Modules are searched for at:
/lib/modules/$(uname -r)/kernel

If you connect via ssh or serial console or non-GUI console, can you see the output of “uname -r”?
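As a rough sketch of that check, the helper below reports whether a module tree exists for a given kernel release. The function name check_module_tree and its optional arguments are assumptions added for illustration; with no arguments it looks at the running kernel under /lib/modules.

```shell
# Report whether a module tree exists for a given kernel release.
# $1: kernel release string (defaults to the running kernel's `uname -r`)
# $2: directory holding the per-release module trees (defaults to /lib/modules)
check_module_tree() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    if [ -d "$root/$release/kernel" ]; then
        echo "found: $root/$release/kernel"
    else
        echo "missing: $root/$release/kernel"
        return 1
    fi
}

# Typical use on the Jetson itself:
#   check_module_tree
```

If this prints "missing", the release string of the new Image does not match any installed module directory, which is what a wrong CONFIG_LOCALVERSION would cause.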

My compilation process is as follows:

(1) Set the cross-compile toolchain path

export CROSS_COMPILE_AARCH64_PATH=/home/wen/workspace/nvidia/orin-agx/gcc

(2)cd Linux_for_Tegra/source

source nvbuild.sh

The kernel compiles OK!

Is this ok?

I have not tried that build script, so I don’t know for sure, but it might set CONFIG_LOCALVERSION. Do you have any access to the command line with that kernel, and if so, what does it say for the output of “uname -r”? Additionally, what do you see from “ls /lib/modules/$(uname -r)/kernel”?

After updating the Image:
“ls /lib/modules/$(uname -r)/kernel”

The problem here is that the Orin system is R34.0.1,

but the kernel I compiled is R34.1.1.

Is that a problem?

Looks like there is no issue with CONFIG_LOCALVERSION. I can’t say for certain whether using the R34.1.1 source instead of the R34.0.1 source is an issue, but it is possible. After the point where the display should have started but failed, can you provide a copy of the following:

  • Output of “dmesg”. Example to create a log file of this:
    dmesg 2>&1 | tee log_dmesg.txt
  • Log file “/var/log/Xorg.0.log”.

If you have ssh access you should be able to get a copy of that to a different host PC. The logs will likely provide information on whether a module failed to load due to a kernel difference, or for some other reason.

After updating to the R34.1.1 Image:

(1) dmesg
log_dmesg.txt (73.0 KB)

(2)lsmod

same issue as Orin Desktop version issue

This is basically just repeating that there is a “version” issue, though I couldn’t say exactly where. A subset of the dmesg log:

[    9.984872] nvgpu: Unknown symbol nvlink_register_link (err -2)
[    9.985145] nvgpu: Unknown symbol nvlink_unregister_device (err -2)
[    9.985379] nvgpu: Unknown symbol nvlink_unregister_link (err -2)
[    9.986353] nvgpu: Unknown symbol nvlink_enumerate (err -2)
[    9.986695] systemd-journald[297]: Received client request to flush runtime journal.
[    9.991819] nvgpu: Unknown symbol nvlink_transition_intranode_conn_off_to_safe (err -2)
[    9.991943] nvgpu: Unknown symbol nvlink_register_device (err -2)
[   10.014015] nvgpu: Unknown symbol nvlink_train_intranode_conn_safe_to_hs (err -2)
[   10.298572] nvgpu: Unknown symbol nvlink_shutdown (err -2)
[   10.298770] nvgpu: Unknown symbol nvlink_register_link (err -2)
[   10.299079] nvgpu: Unknown symbol nvlink_unregister_device (err -2)
[   10.299353] nvgpu: Unknown symbol nvlink_unregister_link (err -2)
[   10.299803] nvgpu: Unknown symbol nvlink_enumerate (err -2)
[   10.299971] nvgpu: Unknown symbol nvlink_transition_intranode_conn_off_to_safe (err -2)
[   10.300329] nvgpu: Unknown symbol nvlink_register_device (err -2)
[   10.300557] nvgpu: Unknown symbol nvlink_train_intranode_conn_safe_to_hs (err -2)

There were lots of symbol errors, not just those above, but those were for the GPU. A “symbol” is a fingerprint to a function to call, typically something like a combination of function name and arguments. When you configure a kernel and build it you are essentially selecting a series of symbols (think of each CONFIG_ item of a configuration as a bookmark into a group of symbols). An “unknown” symbol is one which is missing.
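To see which modules are affected without reading the whole log, a small filter over the saved dmesg output may help. This is only a sketch: the function names count_unknown_symbols and missing_symbols_for are made up, and the patterns assume the "module: Unknown symbol name (err -2)" line format shown in the excerpt above.

```shell
# Count "Unknown symbol" lines per module in a saved dmesg log.
# $1: path to the log (e.g. the log_dmesg.txt created with tee)
# Output: one "<count> <module>" line per module, sorted by module name.
count_unknown_symbols() {
    sed -n 's/^.*\] \([^:]*\): Unknown symbol .*$/\1/p' "$1" \
        | sort | uniq -c | awk '{ print $1, $2 }'
}

# List the distinct missing symbol names reported for one module.
# $1: path to the log, $2: module name (e.g. nvgpu)
missing_symbols_for() {
    sed -n "s/^.*\] $2: Unknown symbol \([^ ]*\).*/\1/p" "$1" | sort -u
}

# Typical use:
#   count_unknown_symbols log_dmesg.txt
#   missing_symbols_for log_dmesg.txt nvgpu
```

The symbol names from the second function (the nvlink_* ones here) hint at which provider module or built-in feature is absent from the new kernel.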

“Missing” symbols can be fulfilled either as being compiled into the kernel Image, or else being provided as a module. Keep in mind that if a module is missing or in some way incompatible with loading in that kernel Image, then this is also a missing symbol set even if you built the module. There is a good chance that something is wrong with your self-compiled kernel’s configuration, or the installation of modules (which could in turn be an issue of either invalid reuse of modules, or else new modules missing, or the modules being present but the kernel not knowing about them and in need of sudo depmod -a).
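One way to tell "module not installed" apart from "module installed but not indexed" is to look the module up in modules.dep, the index that depmod writes. The sketch below assumes the usual modules.dep layout of "path/to/module.ko: dependencies" per line; the function name module_indexed is made up for illustration. Pass the file name of the module (for example nvidia-modeset), which can differ from the underscore form that lsmod prints.

```shell
# Succeed if a module file name appears as an entry in a modules.dep index.
# $1: module file name without the .ko suffix (e.g. nvgpu)
# $2: path to the index (on the target: /lib/modules/$(uname -r)/modules.dep)
module_indexed() {
    grep -Eq "(^|/)$1\.ko[^:[:space:]]*:" "$2"
}

# Typical use on the Jetson:
#   if module_indexed nvgpu "/lib/modules/$(uname -r)/modules.dep"; then
#       echo "nvgpu is indexed"
#   else
#       echo "nvgpu is not indexed; check the install, then run: sudo depmod -a"
#   fi
```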

Do I need to recompile nvidia.ko and nvidia-modeset.ko?

Linux_for_Tegra/source/kernel/kernel-5.10

make ARCH=arm64 modules
make ARCH=arm64 modules_prepare

After these succeed,
nvidia.ko and nvidia-modeset.ko are still not found!

Hi,

Please use the source code from the tarball first. It looks like source_sync does not sync that code.

These modules are installed when you run the apply_binaries.sh script.
They are located inside this package:
./kernel/nvidia-l4t-display-kernel_5.10.65-tegra-34.1.1-20220516211757_arm64.deb

I am not sure whether you want to check the code or not. If you don’t want to build the code yourself, then you can use @dimaz's method too.
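If it helps to confirm which modules that package ships before installing it, the listing from "dpkg-deb -c" can be filtered down to the .ko files. This is a sketch: the helper name list_kernel_modules is made up, and it assumes the tar-style listing that dpkg-deb -c prints, with the file path as the last field.

```shell
# Read a `dpkg-deb -c` listing on stdin and print only the kernel module paths.
list_kernel_modules() {
    awk '$NF ~ /\.ko$/ { print $NF }'
}

# Typical use, run from the directory that contains the kernel/ folder:
#   dpkg-deb -c ./kernel/nvidia-l4t-display-kernel_5.10.65-tegra-34.1.1-20220516211757_arm64.deb \
#       | list_kernel_modules
```

The directory shown in those paths also tells you which kernel release string the packaged modules expect.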

./kernel/nvidia-l4t-display-kernel_5.10.65-tegra-34.1.1-20220516211757_arm64.deb

After I installed this package, the display is still abnormal.
lsmod shows that nvidia is loaded, but nvidia-modeset still is not.

[image]

Still an error!!!

[image]

I don’t think my problem is the same as @Dimaz's.

  1. After I install R34.1.1 using SDK Manager, the display is OK.

  2. The display fails after I compile my own Image and replace the original with it.

JiaZW, what is the difference between Nvidia’s Image and yours?

I have ported the camera driver into the kernel, so there is a small difference from the official R34.1.1 kernel.

Try to compile Nvidia’s sources without your changes…
