Adding a sched_class makes the Image unbootable

Hi guys,
I added a scheduler to the 5.10.104 kernel. The main modifications are as follows:
include/asm-generic/vmlinux.lds.h:

#define SCHED_DATA				\
	STRUCT_ALIGN();				\
	__begin_sched_classes = .;		\
	*(__idle_sched_class)			\
	*(__fair_sched_class)			\
	*(__rt_sched_class)			\
	*(__dl_sched_class)			\
	*(__new_sched_class)			\
	*(__stop_sched_class)			\
	__end_sched_classes = .;

Then I added new.c under kernel/sched/ and defined a new sched_class:

const struct sched_class new_sched_class
	__section("__new_sched_class") = {
        ........
};

But after I recompile the Image and boot it, startup gets stuck at the following point:
EFI stub: Exiting boot services and installing virtual address map…
I found that as long as I place the newly defined struct sched_class variable in the “__tt_sched_class” section through “__section”, the UEFI startup gets stuck at the point above.
Do you have any ideas?
Thanks in advance.

Best Regards
weihua.li@linearx.io
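
One thing that may be worth checking, stated here as an assumption to verify against kernel/sched/core.c in the 5.10 tree: sched_init() there appears to contain BUG_ON() checks that the stock classes sit directly next to each other in memory (for example, that &dl_sched_class + 1 equals &stop_sched_class on SMP builds). If so, a class placed between dl and stop would trip that check very early, before any console output. The user-space sketch below only models the idea; struct model_class, the two tables and both helper functions are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct sched_class; the real structure holds function pointers. */
struct model_class {
	const char *name;
};

/* Stock SCHED_DATA order, lowest priority first. */
static const struct model_class stock_layout[] = {
	{ "idle" }, { "fair" }, { "rt" }, { "dl" }, { "stop" },
};

/* Order from the modified SCHED_DATA in the post: "new" sits between dl and stop. */
static const struct model_class patched_layout[] = {
	{ "idle" }, { "fair" }, { "rt" }, { "dl" }, { "new" }, { "stop" },
};

/* Look a class up by name; returns NULL when it is not in the table. */
static const struct model_class *find_class(const struct model_class *tbl,
					     size_t n, const char *name)
{
	size_t i;

	assert(tbl != NULL && name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}

/*
 * Return 1 when idle, fair, rt, dl and stop sit back to back in that order,
 * which is the same "&a + 1 == &b" relation an adjacency check would test.
 */
static int stock_classes_adjacent(const struct model_class *tbl, size_t n)
{
	static const char *const order[] = { "idle", "fair", "rt", "dl", "stop" };
	size_t i;

	for (i = 0; i + 1 < sizeof(order) / sizeof(order[0]); i++) {
		const struct model_class *a = find_class(tbl, n, order[i]);
		const struct model_class *b = find_class(tbl, n, order[i + 1]);

		if (a == NULL || b == NULL || a + 1 != b)
			return 0;
	}
	return 1;
}
```

In this model, stock_classes_adjacent(stock_layout, 5) holds while stock_classes_adjacent(patched_layout, 6) does not, because the element after "dl" is now "new". If the real tree has an equivalent check, it would have to be updated together with the linker script.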

Hi,

Can you please provide the full booting log?
How did you make the new kernel image take effect? Did you specify it in /boot/extlinux/extlinux.conf?
Have you tried the same modification on an x86 machine?

Hi,
I just replaced /boot/Image and restarted.
The boot log is attached.
I’ll try it in an x86 VM and tell you the result.
Thanks!

bootlog.txt (11.5 KB)

Hi, Dave,
I have tried recompiling the kernel in an x86 VM, but the modified kernel caused the VM to hang.
Going back to Xavier: in the entire startup process, which stage depends on the layout of SCHED_DATA or RO_DATA?
For example, do UEFI and ATF depend on the layout of SCHED_DATA or RO_DATA?
After modifying the layout of SCHED_DATA or RO_DATA, is it necessary to modify UEFI and/or ATF as well?

Cross compiling (on a Linux PC) does not require a VM (it just uses very easily installed cross tools). If you have enough disk space, then native compile directly on the AGX Xavier is also a good choice (e.g., building on a USB3 external SATA drive or a large thumb drive is an option). Are you interested in either of those options (cross compile on Linux PC or native compile on the Xavier)?

Hi,
I have tried native compiling on the Xavier; the result is the same.

Best Regards.

Did you start by matching the existing kernel config, or else with the tegra_defconfig target? Even if you go minimal, you should start with either the existing “/proc/config.gz” (gunzip and rename as .config), or else the tegra_defconfig. Then use a make target such as nconfig or menuconfig to remove what you don’t want (and don’t forget to add CONFIG_LOCALVERSION so modules can be found). If those basics are not met, then boot is likely to fail.

Hi,
Thanks for your reply.
I started from tegra_defconfig and added “CONFIG_LOCALVERSION” when compiling.
Best Regards.

Do you mean that they may now occupy more space in kernel memory and overwrite UEFI?
I’m not that familiar with the kernel source so I cannot be sure, but if it does not work even on x86 machines, then clearly your code is buggy.

Hi Dave,
Thanks.
I’m not sure if it is necessary to recompile UEFI or ATF after modifying the layout of the RO_DATA section of the Linux kernel. I need advice from someone familiar with UEFI or ATF.

You may want to consider rebuilding UEFI with debug prints enabled to get more information during boot.

Hi Dave,

I have tried to recompile UEFI to produce more log output, but the new “uefi_Jetson.bin” cannot boot.
Please check the attached boot log.
Thanks.
TXT.txt (13.0 KB)

It looks like it triggered an ARM exception, which I don’t think is related to NVIDIA’s design.
However, before trying to solve the issue, what is the purpose of making such changes?

Did you define every needed function here?
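
To illustrate why that question matters: as far as I know, the core scheduler calls several sched_class hooks without first checking them for NULL, so a class that leaves one unset can crash as soon as it is used. The sketch below is a simplified user-space model under that assumption. struct model_hooks, its reduced signatures and count_missing_hooks() are hypothetical; only the member names are borrowed from struct sched_class.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a few sched_class hooks; the signatures are not the kernel's. */
struct model_hooks {
	void (*enqueue_task)(int task_id);
	void (*dequeue_task)(int task_id);
	int  (*pick_next_task)(void);
	void (*put_prev_task)(int task_id);
	void (*set_next_task)(int task_id);
};

/* Placeholder implementations used to fill the tables below. */
static void noop_task(int task_id) { (void)task_id; }
static int pick_none(void) { return -1; }

/* Every modelled hook is provided. */
static const struct model_hooks complete_hooks = {
	.enqueue_task   = noop_task,
	.dequeue_task   = noop_task,
	.pick_next_task = pick_none,
	.put_prev_task  = noop_task,
	.set_next_task  = noop_task,
};

/* Only two hooks are provided; the remaining members default to NULL. */
static const struct model_hooks partial_hooks = {
	.enqueue_task = noop_task,
	.dequeue_task = noop_task,
};

/* Count the modelled hooks that were left as NULL. */
static int count_missing_hooks(const struct model_hooks *h)
{
	int missing = 0;

	assert(h != NULL);
	missing += (h->enqueue_task == NULL);
	missing += (h->dequeue_task == NULL);
	missing += (h->pick_next_task == NULL);
	missing += (h->put_prev_task == NULL);
	missing += (h->set_next_task == NULL);
	return missing;
}
```

Here count_missing_hooks(&partial_hooks) reports three unset hooks, while the complete table reports none. A similar manual audit of the new class against struct sched_class in kernel/sched/sched.h would show whether any callback the core relies on is still missing.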

Hi Dave,
Thanks for your reply.
The purpose of these changes is to run a simple experiment: adding a new scheduler class. The background is work on a deterministic scheduler in the kernel. As you can see, even before getting to that, simply adding a scheduling class is already problematic.

Back to the exception triggered during UEFI startup.
I am using JetPack 5.1.2; please help check whether there is any problem with the build steps:
cd nvidia-uefi
edkrepo checkout r35.4.1
edk2-nvidia/Platform/NVIDIA/Jetson/build.sh

Thanks.

I mean that if it does not work even on x86 platforms, then it does not look like something we should handle.

Okay, this does not seem to be an NVIDIA problem; it is probably a general problem with the Linux kernel on the ARM platform. My current plan is to get more startup information from the debug build of UEFI. My newly compiled UEFI now boots; the earlier failure to boot was actually caused by the letter case of the file name. Next I will analyze the UEFI output. If there are any NVIDIA-related problems that I cannot solve, please continue to help.
Thanks.
