Can't boot vendor production Jetson with custom compiled Kernel

There is a lot which might go wrong due to configuration. Normally one would start with the make target tegra_defconfig and then make edits (which must include setting CONFIG_LOCALVERSION correctly for the situation). It matters what you did for configuration, and in particular whether you changed a feature integrated into the kernel Image file (an “=y” configuration) versus a module feature (“=m”). It also matters what CONFIG_LOCALVERSION was set to.
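A minimal sketch of that starting point, assuming a build with a separate output directory (the paths and the “-custom” suffix below are hypothetical placeholders; a cross build from a PC would also need CROSS_COMPILE set). The scripts/config helper ships in the kernel source tree. config_state is a small hypothetical helper for checking whether a given feature ended up as “=y”, “=m”, or not set.

```shell
#!/bin/sh
# Sketch only: paths and the LOCALVERSION suffix are placeholders you supply.

# Generate .config from tegra_defconfig in a separate output directory,
# then set CONFIG_LOCALVERSION before making any other edits.
#   $1 = kernel source directory (absolute path)
#   $2 = build output directory (absolute path, used as O=)
#   $3 = LOCALVERSION suffix, e.g. "-custom"
prepare_from_defconfig() {
    mkdir -p "$2" || return 1
    make -C "$1" ARCH=arm64 O="$2" tegra_defconfig || return 1
    "$1/scripts/config" --file "$2/.config" --set-str LOCALVERSION "$3"
}

# Report how an option is set in a .config file. Prints "y", "m",
# a string value (with its quotes), "not set", or "absent".
#   $1 = path to .config, $2 = option name without the CONFIG_ prefix
config_state() {
    if grep -q "^# CONFIG_$2 is not set" "$1"; then
        echo "not set"
        return 0
    fi
    line=$(grep "^CONFIG_$2=" "$1")
    if [ -n "$line" ]; then
        echo "${line#CONFIG_$2=}"
    else
        echo "absent"
    fi
}
```

For example, after “prepare_from_defconfig /path/to/source /path/to/build -custom”, running “config_state /path/to/build/.config LOCALVERSION” should print the quoted suffix, and the same helper run against the original and new configs shows whether a feature moved between “=y” and “=m”.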

Along those lines, with the original kernel in place, what do you see from “uname -r”? Also, with that original kernel booted, you should save a copy of the “/proc/config.gz” file, since this is an exact copy (other than CONFIG_LOCALVERSION) of the configuration of the running kernel.
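A small sketch for recording both answers while the original kernel is still booted. The function name and the destination directory are hypothetical; the saved file name simply embeds the “uname -r” output so the copy cannot be confused with one taken later under a different kernel.

```shell
#!/bin/sh
# Save the running kernel's config and its release string.
#   $1 = directory to save into (hypothetical, e.g. "$HOME/original-kernel")
#   $2 = config source; defaults to /proc/config.gz
save_running_config() {
    dest=$1 src=${2:-/proc/config.gz}
    rel=$(uname -r)
    mkdir -p "$dest" || return 1
    # Keep the compressed copy as-is; it can be gunzipped later.
    cp "$src" "$dest/config-$rel.gz" || return 1
    echo "$rel" > "$dest/uname-r.txt"
    echo "saved $dest/config-$rel.gz"
}
```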

Note that if you run gunzip on a copy of “/proc/config.gz” and rename the result to “.config”, then this file (when placed at the right location, which is the top of the kernel source tree, or the output directory if you build with “O=”) could be used as the configuration after editing CONFIG_LOCALVERSION. Then, and only then, would you make any other edits using a configuration editor, e.g., menuconfig or nconfig.
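Roughly, that sequence could look like the following. This is a sketch: install_saved_config is a hypothetical helper, it uses GNU sed to change CONFIG_LOCALVERSION (the kernel's own scripts/config would work as well), and the build directory is whatever you pass as “O=”.

```shell
#!/bin/sh
# Turn a saved config.gz into .config and set CONFIG_LOCALVERSION first.
#   $1 = saved config.gz, $2 = build output directory, $3 = LOCALVERSION suffix
install_saved_config() {
    mkdir -p "$2" || return 1
    gunzip -c "$1" > "$2/.config" || return 1
    if grep -q '^CONFIG_LOCALVERSION=' "$2/.config"; then
        # Replace the existing value in place (GNU sed -i).
        # A suffix containing "|" or "&" would need escaping here.
        sed -i "s|^CONFIG_LOCALVERSION=.*|CONFIG_LOCALVERSION=\"$3\"|" "$2/.config"
    else
        # No existing line: append one.
        printf 'CONFIG_LOCALVERSION="%s"\n' "$3" >> "$2/.config"
    fi
}
```

Only after that would you run the configuration editor against the same output directory, for example “make -C /path/to/source ARCH=arm64 O=/path/to/build nconfig” (paths are placeholders).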

Incidentally, we don’t know if the manufacturer changed just the configuration or also changed the source code. It is probably ok to use the NVIDIA source code, but if you can find out whether only the configuration is custom, or whether the source code itself was edited, it might help.

Regarding device trees, these are technically not part of the kernel. They were in fact designed to keep the kernel more generic, and to avoid the billions of drivers that would be needed if each hardware chipset needed its own driver for every possible setup. The device tree could be considered as data passed to the driver as an argument (or environment) as the driver loads. This data provides things like the addresses of devices, which drivers are compatible, and anything else possibly useful to the driver about non-plug-n-play devices. This means that each device tree fragment goes with a particular driver, and a driver ignores any node it does not specifically care about.
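To see that node-to-driver relationship on a running system, sysfs can help: platform devices created from the device tree have an “of_node” link back to their node (including its “compatible” property), and a “driver” link once a driver has bound. The read-only sketch below, with a hypothetical function name, prints one line per such device: device name, bound driver, and compatible strings.

```shell
#!/bin/sh
# List device-tree-backed platform devices, their bound driver, and the
# node's compatible strings.
#   $1 = platform device directory (defaults to /sys/bus/platform/devices)
list_dt_bindings() {
    base=${1:-/sys/bus/platform/devices}
    for dev in "$base"/*; do
        # Skip devices that were not instantiated from a device tree node.
        [ -e "$dev/of_node/compatible" ] || continue
        # "compatible" is a NUL-separated list; show it space-separated.
        compat=$(tr '\0' ' ' < "$dev/of_node/compatible" | sed 's/ *$//')
        if [ -L "$dev/driver" ]; then
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(no driver bound)"
        fi
        printf '%s\t%s\t%s\n' "$(basename "$dev")" "$drv" "$compat"
    done
}
```

A node whose line shows “(no driver bound)” is one that no loaded driver has claimed, which can be a hint when a custom kernel lacks a driver the vendor tree expects.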

If you add a new driver, and the hardware related to it is not plug-n-play, then you likely need a device tree fragment to set up that hardware for that driver. If you edit the source code of a particular driver, and that edit requires changing the device tree, then you’d edit that fragment. There are probably a lot of cases where the device tree would not change, though. For example, if the edit fixes a bug, then the use of the arguments being passed likely won’t change; but if the edit adds an optional feature, then perhaps the device tree fragment would say whether or not to use that feature.

If you have the system booted with the original kernel, then you can use dtc (you might need to “sudo apt-get install device-tree-compiler”) to extract the running device tree and turn it into source code:
dtc -I fs -O dts -o extracted.dts /proc/device-tree

A full serial console boot log can be better, since it will tell you which tree is loaded from where. On rare occasions, though, boot stages might edit that content before passing it to the Linux kernel. Thus, most of the time the extracted and loaded trees are the same (perhaps after an overlay), but it is possible for boot stage edits to slightly alter a device tree before the kernel sees it (and “/proc/device-tree” is what the kernel sees).
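One way to check whether a boot stage altered the tree is to decompile the .dtb file you believe is loaded and diff it against “/proc/device-tree”. A sketch follows, with a hypothetical function name; the .dtb path is yours to supply (for example from the FDT entry in extlinux.conf, or from the serial console log). The “-s” option makes dtc sort nodes and properties so that ordering alone does not produce differences. Some differences, such as in the “/chosen” node where the kernel command line is passed, are expected even when nothing else changed.

```shell
#!/bin/sh
# Compare a .dtb file with the live tree (or with another .dtb).
#   $1 = .dtb file believed to be loaded
#   $2 = tree to compare against; defaults to /proc/device-tree
# Returns 0 if identical after sorting, 1 if different, 2 on dtc error.
compare_trees() {
    dtb=$1
    live=${2:-/proc/device-tree}
    work=$(mktemp -d)
    # -q silences warnings, -s sorts nodes and properties.
    if ! dtc -q -s -I dtb -O dts -o "$work/file.dts" "$dtb"; then
        rm -rf "$work"; return 2
    fi
    if [ -d "$live" ]; then
        in_type=fs      # a directory such as /proc/device-tree
    else
        in_type=dtb     # another compiled tree
    fi
    if ! dtc -q -s -I "$in_type" -O dts -o "$work/live.dts" "$live"; then
        rm -rf "$work"; return 2
    fi
    if diff -u "$work/file.dts" "$work/live.dts"; then
        echo "trees match"
        status=0
    else
        status=1
    fi
    rm -rf "$work"
    return $status
}
```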

If the original tree is still being used after a change of the Image file, then the question is whether the drivers were edited such that the original tree is no longer valid. Perhaps a better question, though, is whether that content from the manufacturer is required for boot.

Also, kernel modules are searched for at “/lib/modules/$(uname -r)/kernel”. The output of “uname -r” depends on the base kernel version plus the setting of “CONFIG_LOCALVERSION” at the time of kernel compile. So if you only change modules, this is irrelevant to finding modules, although it could be relevant to loading modules compiled with a different CONFIG_LOCALVERSION. Conversely, if Image is changed, and CONFIG_LOCALVERSION also changes (which you’d want for a change to an “=y” feature), then all modules must be rebuilt against the new kernel configuration and installed to the new location.

Hence the earlier question about the output of “uname -r”: we need to know whether the base kernel version and/or CONFIG_LOCALVERSION is the same or different. We also need to know whether you changed just the Image, a module, or both. Finally, we need to know how you installed the kernel, since it can be in a partition, in “/boot”, or both. If both, then the one named in extlinux.conf takes precedence; the exception is that if security fuses are burned, then only the signed partition version is allowed. The same applies to the device tree, which can be located in “/boot” or in a signed partition.
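A quick sketch for checking two of those points: whether a module directory exists for a given “uname -r” string, and which kernel and device tree extlinux.conf actually names. The helper names are hypothetical, and the default path “/boot/extlinux/extlinux.conf” is the usual L4T location but is an assumption for a vendor image. Neither check says anything about the signed-partition case.

```shell
#!/bin/sh
# Check that modules exist for a kernel release.
#   $1 = release string (defaults to the running kernel's "uname -r")
#   $2 = modules root (defaults to /lib/modules)
check_modules() {
    rel=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    if [ -d "$root/$rel/kernel" ]; then
        echo "modules present for $rel"
    else
        echo "missing: $root/$rel/kernel"
        return 1
    fi
}

# Print the active (uncommented) boot entries from extlinux.conf.
#   $1 = extlinux.conf path (defaults to /boot/extlinux/extlinux.conf)
extlinux_entries() {
    grep -E '^[[:space:]]*(DEFAULT|LABEL|LINUX|FDT|INITRD)[[:space:]]' \
        "${1:-/boot/extlinux/extlinux.conf}"
}
```

To see which release a specific module was built for, “modinfo -F vermagic” on the .ko file prints its version magic string, which should begin with the same value “uname -r” reports for the kernel that is meant to load it.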

You might find these of interest (most are about device tree; the last URL is about kernel config):