Jetson AGX: porting Xenomai

I plan to port Xenomai’s dual-kernel core to the AGX platform. The current NVIDIA kernel version is 4.9.201 (corresponding to JetPack 4.5.1). Since there is no matching ipipe patch, I chose a patch for a nearby version, ipipe-xxx-4.9.51, applied it with a mix of automatic merging and manual modifications, fixed several compilation problems, and can now build the Image normally.
However, it does not boot. I captured the boot log from the serial console but could not find an obvious fatal error. Any guidance would be much appreciated. Thank you very much.
agx_start.log (27.5 KB)

I don’t know about this particular patch, but are you able to first (as a test) create a duplicate of the default kernel, install that, and have it work? Basically this would consist of matching the existing system’s config and CONFIG_LOCALVERSION, then checking if this boots correctly. You would also want to use the compiler which comes with the particular L4T release since some of the newer compilers will have issues.

A good place to start for making an exact match of the config of a running system (assuming you are using an unmodified Jetson) is to copy “/proc/config.gz” to your build area, gunzip it, edit CONFIG_LOCALVERSION (probably making it CONFIG_LOCALVERSION="-tegra"), and then attempt the build. If that kernel boots, then make your modifications.

Thank you. I did this test before porting Xenomai: I used the L4T 32.5.1 source and the NVIDIA-recommended compiler gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu to build the kernel, then flashed it with the flash.sh script. In that case it boots normally.

Sorry, I don’t fully understand the purpose of this step. What is the file “/proc/config.gz” for, and what kind of error is this test meant to rule out?

When a kernel is built it has a configuration. Unless two kernels have the same configuration they cannot be compared. If you add a feature to a kernel (i.e., if you enable a config item), I would expect a near 100% failure unless the rest of the kernel started with the working kernel’s configuration.

During a build, assuming you pass “O=$TEGRA_KERNEL_OUT” for temporary output files, the configuration is read from the file named “.config” in that directory. If you don’t use “O=$TEGRA_KERNEL_OUT”, it is read from the top directory of the kernel source. Without a correct “.config” in that location, the kernel build is expected to have a near 100% failure rate.

Running “make” with the target “tegra_defconfig” creates the file “.config” with a default “safe” config.
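As a minimal sketch of the rule above, assuming standard kbuild behaviour: with O= the .config lives in the output directory, otherwise at the top of the source tree. The helper names and paths are hypothetical; ARCH, CROSS_COMPILE, O and the tegra_defconfig target are ordinary kbuild inputs, and the command is only assembled here, not executed.

```python
from pathlib import Path
from typing import List, Optional


def config_path(src_dir: str, out_dir: Optional[str] = None) -> Path:
    """Where kbuild reads .config: the O= directory if given, else the source top."""
    return Path(out_dir if out_dir else src_dir) / ".config"


def defconfig_cmd(src_dir: str, out_dir: Optional[str], cross_compile: str) -> List[str]:
    """Assemble (but do not run) a 'make tegra_defconfig' command for an arm64 cross build."""
    cmd = ["make", "-C", src_dir, "ARCH=arm64", f"CROSS_COMPILE={cross_compile}"]
    if out_dir:
        cmd.append(f"O={out_dir}")  # keeps build output and .config out of the source tree
    cmd.append("tegra_defconfig")
    return cmd


if __name__ == "__main__":
    src = "/home/user/l4t/kernel/kernel-4.9"  # hypothetical source location
    out = "/home/user/l4t/build"              # hypothetical $TEGRA_KERNEL_OUT
    print(config_path(src, out))
    print(" ".join(defconfig_cmd(src, out, "aarch64-linux-gnu-")))
```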

On a running system, the file “/proc/config.gz” is an exact match for the config used when that running kernel was built (with one exception). The content of “/proc/config.gz” is where you should usually start your configuration. It is not a real file; it is the kernel presenting its configuration as a file. You can copy it to another location, gunzip the copy, and rename the result to “.config”. Place this file in the “O=$TEGRA_KERNEL_OUT” location, and other than one edit, it is a guaranteed exact duplicate of the running kernel's configuration. That is where you should start before adding new features. tegra_defconfig works too, but it isn’t guaranteed to run exactly like the current kernel.

When I say “almost” an exact match, there is one item you must edit before config.gz truly matches the running kernel: the CONFIG_LOCALVERSION setting, which should read CONFIG_LOCALVERSION="-tegra".
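The copy/gunzip/edit steps can be sketched in Python, assuming the config.gz has been copied to the build host (on the Jetson itself it is /proc/config.gz). The function name and the "-tegra" default are illustrative assumptions, not part of any NVIDIA tooling.

```python
import gzip
import os
import re
from pathlib import Path


def make_config(config_gz: str, out_dir: str, localversion: str = "-tegra") -> Path:
    """Decompress config.gz and write <out_dir>/.config with CONFIG_LOCALVERSION set."""
    with gzip.open(config_gz, "rt") as f:
        text = f.read()
    line = f'CONFIG_LOCALVERSION="{localversion}"'
    # Matches only CONFIG_LOCALVERSION=..., not CONFIG_LOCALVERSION_AUTO.
    pattern = re.compile(r"^CONFIG_LOCALVERSION=.*$", re.MULTILINE)
    if pattern.search(text):
        text = pattern.sub(lambda _m: line, text)  # replace the existing setting
    else:
        text += ("" if text.endswith("\n") else "\n") + line + "\n"  # append if absent
    os.makedirs(out_dir, exist_ok=True)
    dest = Path(out_dir) / ".config"
    dest.write_text(text)
    return dest
```

For example, make_config("config.gz", "/path/to/TEGRA_KERNEL_OUT") leaves a .config that differs from the running kernel's only in the local version string.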

If you run the command “uname -r”, you will get a version string such as “4.9.201-tegra”. The tail of this, “-tegra”, comes from CONFIG_LOCALVERSION. If it were not set, the above sample would just be “4.9.201”. If the value were instead “-test”, “uname -r” would report “4.9.201-test”.
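A simplified illustration of how that string is composed, assuming CONFIG_LOCALVERSION_AUTO is off and no other suffix source (localversion* files, a LOCALVERSION make variable) is in play. The helpers are hypothetical and read the value from .config text.

```python
import re


def localversion_from_config(config_text: str) -> str:
    """Extract the CONFIG_LOCALVERSION string from .config text ("" if absent)."""
    m = re.search(r'^CONFIG_LOCALVERSION="(.*)"$', config_text, re.MULTILINE)
    return m.group(1) if m else ""


def expected_release(base_version: str, config_text: str) -> str:
    """Simplified 'uname -r': base version plus CONFIG_LOCALVERSION only."""
    return base_version + localversion_from_config(config_text)


print(expected_release("4.9.201", 'CONFIG_LOCALVERSION="-tegra"\n'))  # 4.9.201-tegra
print(expected_release("4.9.201", 'CONFIG_LOCALVERSION=""\n'))        # 4.9.201
print(expected_release("4.9.201", 'CONFIG_LOCALVERSION="-test"\n'))   # 4.9.201-test
```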

The reason “uname -r” is important is that this is how the kernel finds modules. The kernel will look for modules at “/lib/modules/$(uname -r)/kernel”. If “uname -r” changes, then you must build 100% of the modules again and place them in the new location. If the base kernel remains constant and all you did was to add a feature, then you don’t need to build or install modules if and only if “CONFIG_LOCALVERSION” is constant (and thus “uname -r” is constant).
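To make the dependency concrete, here is a small sketch of the module search path as a function of the release string. The helper name is hypothetical; platform.release() returns the same string as "uname -r" on Linux.

```python
import platform
from pathlib import PurePosixPath


def module_dir(release: str) -> PurePosixPath:
    """Directory searched for modules belonging to kernel release <release>."""
    return PurePosixPath("/lib/modules") / release / "kernel"


if __name__ == "__main__":
    print(module_dir(platform.release()))  # path for the kernel running right now
    # A different CONFIG_LOCALVERSION gives a different release, hence a different directory:
    print(module_dir("4.9.201-tegra"))     # /lib/modules/4.9.201-tegra/kernel
    print(module_dir("4.9.201-test"))      # /lib/modules/4.9.201-test/kernel
```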

The goal is to start with a kernel which is an exact match to the working kernel. Only then would you add a feature. If the matching kernel works, but the modified kernel fails, it is reasonable to conclude that the individual feature is what failed. If the matching kernel fails, then something else is in the way, e.g., the compiler version is wrong or something was picked up in the environment during the build which causes a failure. In no case can two kernels be compared without at least the initial config matching.
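One way to check "the initial config matches" is to diff two .config files option by option. This is a hypothetical sketch that treats a missing option the same as "is not set"; the kernel tree's own scripts/diffconfig does a similar job. The sample inputs are made up for illustration.

```python
import re
from typing import Dict, Tuple

_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text: str) -> Dict[str, str]:
    """Map each CONFIG_ symbol to its raw value; 'is not set' lines map to 'n'."""
    opts: Dict[str, str] = {}
    for line in text.splitlines():
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(a: str, b: str) -> Dict[str, Tuple[str, str]]:
    """Return {symbol: (value_in_a, value_in_b)} for every symbol that differs."""
    pa, pb = parse_config(a), parse_config(b)
    return {k: (pa.get(k, "n"), pb.get(k, "n"))
            for k in sorted(set(pa) | set(pb))
            if pa.get(k, "n") != pb.get(k, "n")}


if __name__ == "__main__":
    stock = "CONFIG_PREEMPT=y\n# CONFIG_IPIPE is not set\n"  # hypothetical stock config
    patched = "CONFIG_PREEMPT=y\nCONFIG_IPIPE=y\n"           # hypothetical patched config
    print(diff_configs(stock, patched))  # {'CONFIG_IPIPE': ('n', 'y')}
```

An empty result between the stock kernel's config and your starting .config (apart from CONFIG_LOCALVERSION) is the "exact match" baseline described above.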

What do you see from “uname -r” on the original working kernel? What do you see for “uname -r” on the new kernel? If there is a difference, are all base initial configurations installed? Is the module search location available and containing all configured modules?
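As a hypothetical helper for the last two questions, this sketch reports whether a module directory exists for a given release and how many .ko files it contains. It assumes the usual /lib/modules layout; run it on the Jetson under each kernel.

```python
import os
import platform


def count_modules(release: str, root: str = "/lib/modules") -> int:
    """Count .ko files under <root>/<release>/kernel; return -1 if that directory is missing."""
    kdir = os.path.join(root, release, "kernel")
    if not os.path.isdir(kdir):
        return -1
    return sum(1 for _dir, _subdirs, files in os.walk(kdir)
               for name in files if name.endswith(".ko"))


if __name__ == "__main__":
    rel = platform.release()  # same string as "uname -r"
    n = count_modules(rel)
    if n < 0:
        print(f"{rel}: no module directory; modules were not installed for this release")
    else:
        print(f"{rel}: {n} .ko files found")
```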

Thank you for your answers. I think I understand: the main point is to confirm that an Image built in my current environment can be flashed to the AGX and boot normally.
I did a similar test before. After building and re-flashing, the build time reported by uname on the running kernel matched the time of my new build, so the unmodified kernel can be flashed and boots normally. Only the modified kernel fails to boot. Of course, I am also trying the method you described.
Could you take a look at the serial boot log I attached and point out what is preventing a normal boot?
Are there any existing examples of porting Xenomai to the Jetson platform that I could refer to? The main obstacle is that the Jetson kernel version is old, so the related interfaces have to be modified manually.

I don’t know enough about what that patch changed, and I do see a lot of i2c errors, but I have no way to estimate what the i2c failures lead to. It is quite possible that various drivers or parts of the kernel were changed such that the device tree is no longer applied correctly, which in turn is one way i2c might fail (i2c is not plug-n-play, thus device tree is how it must be set up…but perhaps the failures I see are unrelated to boot requirements and are just coincidental).

I see no HDMI or EDID messages. If no HDMI is connected, then this is not an issue, but if it is, then consider that EDID is read via the i2c protocol, and its absence could be an extension of the earlier i2c failure messages. Knowing whether HDMI is connected (without an adapter) might help in interpreting the i2c failures.

I then see an attempt to run the kernel, and it says no SD card. Is there an SD card? If so, then probably device tree failed to point to the SD card, and this too would be an extension of the idea that perhaps device tree is not functioning correctly due to patch changes. I can only guess.

There is then some other kernel load content as the “backup plan” after the SD card fails, and this seems to succeed. This also mentions some success at the initrd getting device tree information. Within the initrd there seems to be success at loading a kernel and passing command line arguments to it. However, there are some issues with not finding various plugin-manager nodes, followed by a kernel panic.
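When reading a long serial log like agx_start.log, a quick scan for these symptoms helps locate the relevant lines. This is a hypothetical sketch; the patterns are assumptions based on the symptoms discussed here, not the exact messages in the log, so adjust them to the real text.

```python
import os
import re
import sys
from typing import Dict, List

# Hypothetical patterns for the symptoms discussed in this thread.
PATTERNS = {
    "i2c": re.compile(r"i2c", re.IGNORECASE),
    "hdmi/edid": re.compile(r"hdmi|edid", re.IGNORECASE),
    "plugin-manager": re.compile(r"plugin-manager", re.IGNORECASE),
    "panic": re.compile(r"Kernel panic", re.IGNORECASE),
}


def scan_log(lines: List[str]) -> Dict[str, List[int]]:
    """Return the 1-based line numbers matching each pattern."""
    hits: Dict[str, List[int]] = {name: [] for name in PATTERNS}
    for no, line in enumerate(lines, start=1):
        for name, pat in PATTERNS.items():
            if pat.search(line):
                hits[name].append(no)
    return hits


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "agx_start.log"
    if os.path.exists(path):
        with open(path, errors="replace") as f:
            for name, nums in scan_log(f.read().splitlines()).items():
                print(f"{name}: {len(nums)} line(s), first at {nums[:5]}")
```

A count of zero for "hdmi/edid" with HDMI connected would support the idea that EDID reads are failing along with the other i2c traffic.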

About all I can do is guess that device tree is not working correctly, and possibly due to changes to the kernel, though it could be due to device tree changes or location and not due to how the kernel uses the tree. My “feeling” is that the kernel is not correctly using the tree, but you could double check your device tree and see if you think there might be an error in that, or in the installation of the tree.