[SOLVED] L4T 24.1 kernel build fails with segmentation fault

Hi,

I am facing a peculiar issue while trying to build the kernel source provided with the l4t 24.1 release.

The default tegra21_defconfig works fine. But when I try to rebuild the kernel after enabling CONFIG_MODVERSIONS, the kernel image build always fails with a segmentation fault at drivers/video/tegra/host/t210/t210.o and t120.o

Enabling this option in previous versions of kernel did not create any issue.

I have tried cross compiling from both 14.04 and 16.04 PCs. Also tried different tool chains with no success. The error message is always the same. Does anybody have any idea how to fix this?

Is the only change addition of CONFIG_MODVERSIONS versus tegra21_defconfig? Which compilers are you using? For the particular build, can you run this with only a single CPU core and show the log of the failure (meaning “-j” option is not used or is “-j1”, e.g., just “make modules”)? Also, in what order are your build commands, e.g., were you building different parts of the kernel in a particular order?

Hi linuxdev,
To answer your questions.

  1. Yes CONFIG_MODVERSIONS is the only change versus tegra21_defconfig.

  2. These are the toolchains I have tried with:

On the TX1 board with an arm64 bit rootfs

gcc (Ubuntu/Linaro 4.8.2-19ubuntu1) 4.8.2
arm-linux-gnueabihf-gcc (Ubuntu/Linaro 4.8.4-2ubuntu1~14.04.1) 4.8.4

On a 14.04 64 bit PC:

CROSS_COMPILE = aarch64-linux-gnu-gcc (Ubuntu/Linaro 4.8.2-13ubuntu1) 4.8.2 20140110 (prerelease) [ibm/gcc-4_8-branch merged from gcc-4_8-branch, revision 205847]
CROSS32CC = arm-linux-gnueabihf-gcc (Ubuntu/Linaro 4.8.4-2ubuntu1~14.04.1) 4.8.4

CROSS_COMPILE = aarch64-linux-gnu-gcc (Linaro GCC 5.3-2016.02) 5.3.1 20160113
CROSS32CC = arm-linux-gnueabihf-gcc (Linaro GCC 5.3-2016.02) 5.3.1 20160113

The toolchains built from source code using jetson-tx1-toolchain-build.tbz2

On a 16.04 64 bit PC:

CROSS_COMPILE = aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609
CROSS32CC = arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609

  3. Log messages from the TX1 board (it’s the same error on all development PCs as well):
ubuntu@tegra-ubuntu:~/Downloads/kernel$ make O=$TEGRA_KERNEL_OUT Image -j1
  Using /home/ubuntu/Downloads/kernel as source for kernel
  GEN     /home/ubuntu/Downloads/kern_out/Makefile
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
make[2]: `include/generated/mach-types.h' is up to date.
  CALL    /home/ubuntu/Downloads/kernel/scripts/checksyscalls.sh
  CC      scripts/mod/devicetable-offsets.s
  GEN     scripts/mod/devicetable-offsets.h
  HOSTCC  scripts/mod/file2alias.o
  HOSTLD  scripts/mod/modpost
  CHK     include/generated/compile.h
  CHK     kernel/config_data.h
  CC      drivers/video/tegra/host/t124/t124.o
Segmentation fault
make[6]: *** [drivers/video/tegra/host/t124/t124.o] Error 139
make[5]: *** [drivers/video/tegra/host/t124] Error 2
make[4]: *** [drivers/video/tegra/host] Error 2
make[3]: *** [drivers/video/tegra] Error 2
make[2]: *** [drivers/video] Error 2
make[1]: *** [drivers] Error 2
make: *** [sub-make] Error 2
  4. The order of the build commands is as follows (after exporting the correct environment variables).
    mkdir $TEGRA_KERNEL_OUT
    make O=$TEGRA_KERNEL_OUT tegra21_defconfig
    make O=$TEGRA_KERNEL_OUT menuconfig (to enable modversions. no other change)
    make O=$TEGRA_KERNEL_OUT Image

One more thing: the kernel build runs successfully on all development systems with the default defconfig, i.e. with MODVERSIONS disabled.
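
To double-check that MODVERSIONS really is the only difference from tegra21_defconfig, the two .config files can be compared directly. The following is a minimal sketch, not part of the original posts; `config_diff` is a hypothetical helper name, and the saved copy of the defconfig output (`.config.defconfig`) is an assumption about how you keep the baseline around.

```shell
# Print only the CONFIG_ lines that differ between two kernel .config files.
# Lines starting with "<" come from the first file, ">" from the second.
config_diff() {
    diff "$1" "$2" | grep -E '^[<>] (# )?CONFIG_'
}

# Example usage (paths are assumptions; save a copy of .config right after
# running tegra21_defconfig, then compare it after menuconfig):
#   cp "$TEGRA_KERNEL_OUT/.config" "$TEGRA_KERNEL_OUT/.config.defconfig"
#   make O="$TEGRA_KERNEL_OUT" menuconfig
#   config_diff "$TEGRA_KERNEL_OUT/.config.defconfig" "$TEGRA_KERNEL_OUT/.config"
```

If anything other than the MODVERSIONS line shows up, the configuration change pulled in more than expected.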

I’m unsure if the native JTX1 build would work due to foreign architecture being something difficult to get installed correctly. The other v4.8 cross compiles should work (even if the native compile would fail I would not expect it to fail SIGSEGV). There have been times in the past where compiles actually detected bad RAM, but this wouldn’t be the case over multiple machines.

There were some source code edits in 3.10.96 which were needed…if these are not in place they need to be added, but probably would not result in the SIGSEGV…but just to be thorough here is a list of those edits:

# In the top level Makefile KBUILD_CFLAGS_KERNEL needs to have added "-fomit-frame-pointer":
KBUILD_CFLAGS_KERNEL := -fomit-frame-pointer
# File "drivers/platform/tegra/tegra21_clocks.c" line 1065 possibly needs extra parenthesis:
c->state = ((!is_lp_cluster()) == (c->u.cpu.mode == MODE_G)) ? ON : OFF;
# File "drivers/base/Kconfig" may be missing a trailing quote on line 234. If the trailing quote is missing it must be added back in for config to correctly work.

In particular, that last note on Kconfig could cause different failures depending on config. Are those edits all in place, especially the Kconfig edit for missing trailing quote mark? If not, then this might explain the odd behavior across compile platforms and compilers for just specific configurations.

I have already applied all of the fixes mentioned above, and I am fairly certain that they are not the reason for this error. I have also built successfully on the TX1 natively, so I’m sure there is no error on that side either. I get this segmentation fault only when I enable MODVERSIONS. Maybe this is related to the kernel source code; I’m not sure.

These are some messages I noted while building, applying defconfig, running menuconfig, etc. They appear even with MODVERSIONS disabled. Could this have any relation?

warning: (PM_SLEEP_SMP) selects HOTPLUG_CPU which has unmet direct dependencies (SMP && HOTPLUG)
warning: (ARCH_TEGRA_21x_SOC && ARCH_TEGRA_12x_SOC) selects SKIP_LATE_PASR_SETUP which has unmet direct dependencies (STAGING && PASR)
warning: (PM_SLEEP_SMP) selects HOTPLUG_CPU which has unmet direct dependencies (SMP && HOTPLUG)
warning: (ARCH_TEGRA_21x_SOC && ARCH_TEGRA_12x_SOC) selects SKIP_LATE_PASR_SETUP which has unmet direct dependencies (STAGING && PASR)

I would disable TEGRA_GRHOST and rebuild, but it is vital for the TX1 kernel and its subsystems; I guess it enables the main GPU. It would be helpful if someone from NVIDIA could look into this issue. Any insight would be useful.

I can reproduce and confirm this error. For those who may wish to reproduce this:
I used the crosstool-ng-4.8.2 toolchain provided within the R24.1 documentation “baggage” directory. The config is tegra21_defconfig, followed by setting CONFIG_MODVERSIONS to yes, then “make zImage”.

Looks like “drivers/video/tegra/host/t124/t124.o” fails. Log:

CC      drivers/video/tegra/host/t124/t124.o
/bin/sh: line 1:  8713 Done(2)                 /usr/local/aarch64-unknown-linux-gnu/crosstool-ng-4.8.2/bin/aarch64-unknown-linux-gnu-gcc -E -D__GENKSYMS__ -Wp,-MD,drivers/video/tegra/host/t124/.t124.o.d -nostdinc -isystem /usr/local/crosstool-ng/4.8.2/toolchain-build-aarch64/install/lib/gcc/aarch64-unknown-linux-gnu/4.8.2/include -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/arch/arm64/include -Iarch/arm64/include/generated -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/include -Iinclude -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/arch/arm64/include/uapi -Iarch/arm64/include/generated/uapi -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/include/uapi -Iinclude/generated/uapi -include /home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/include/linux/kconfig.h -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/video/tegra/host/t124 -Idrivers/video/tegra/host/t124 -D__KERNEL__ -mlittle-endian -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/arch/arm/mach-tegra/include -Iarch/arm/mach-tegra/include -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -std=gnu89 -O1 -fconserve-stack -mgeneral-regs-only -fno-pic -fno-reorder-blocks -fno-ipa-cp-clone -fno-partial-inlining -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -g -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -Werror -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/platform/tegra/include -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/arch/arm/mach-tegra/include -Iarch/arm/mach-tegra/include -Werror 
-I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/video/tegra/host -Idrivers/video/tegra/host -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/media/platform/tegra/vi -Idrivers/media/platform/tegra/vi -I/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/video/tegra/camera -Idrivers/video/tegra/camera -Wno-multichar -Werror -fomit-frame-pointer -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(t124)" -D"KBUILD_MODNAME=KBUILD_STR(nvhost_t124)" /home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/video/tegra/host/t124/t124.c
      8714 Segmentation fault      (core dumped) | scripts/genksyms/genksyms -r /dev/null > drivers/video/tegra/host/t124/.tmp_t124.ver
/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/scripts/Makefile.build:308: recipe for target 'drivers/video/tegra/host/t124/t124.o' failed
make[6]: *** [drivers/video/tegra/host/t124/t124.o] Error 139
/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/scripts/Makefile.build:455: recipe for target 'drivers/video/tegra/host/t124' failed
make[5]: *** [drivers/video/tegra/host/t124] Error 2
/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/scripts/Makefile.build:455: recipe for target 'drivers/video/tegra/host' failed
make[4]: *** [drivers/video/tegra/host] Error 2
/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/scripts/Makefile.build:455: recipe for target 'drivers/video/tegra' failed
make[3]: *** [drivers/video/tegra] Error 2
/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/scripts/Makefile.build:455: recipe for target 'drivers/video' failed
make[2]: *** [drivers/video] Error 2
/home/build/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/Makefile:808: recipe for target 'drivers' failed
make[1]: *** [drivers] Error 2
Makefile:130: recipe for target 'sub-make' failed
make: *** [sub-make] Error 2

Even if the code is invalid the compiler should not segfault. I don’t know if this is the result of bad code the compiler is not correctly handling, or if this is correct code in a corner case which the compiler can’t handle. However, if I also compile under Linaro 5.2 this error still comes up as you had mentioned earlier. An uncommon corner case could have lived in the compiler for a long time, but going from a 4.8 series compiler to a 5.2 series is fairly far removed. Perhaps something related to module versioning has not correctly been ported to arm64 yet…unless the feature was both used and reported, nobody would have known it needs fixing.
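
As a side note on reading the logs: make's “Error 139” is the exit status of the failed recipe, and a status above 128 conventionally means the process was killed by signal (status - 128), here 11, which is SIGSEGV. A minimal sketch of that decoding follows; `describe_status` is a hypothetical helper name, not something from the kernel tree.

```shell
# Decode an exit status the way "Error 139" in the build log should be read.
# Statuses above 128 mean "terminated by signal (status - 128)".
describe_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        # kill -l accepts an exit status and prints the matching signal name.
        echo "killed by signal $(kill -l "$code")"
    else
        echo "exited with status $code"
    fi
}

describe_status 139   # the failing t124.o recipe
describe_status 2     # the parent make levels just propagate a plain error
```

The status alone says that something in the recipe crashed; it does not say which program in the recipe did.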

Asking for workarounds instead of fixes is less than ideal, but is CONFIG_MODVERSIONS something you require?

Thanks for confirming it linuxdev. Should be able to get the attention of more developers now.

That seems highly unlikely, because in the L4T_R23.2 release I enabled CONFIG_MODVERSIONS and built the kernel successfully. The kernel build is the same in both releases, i.e. a 64-bit kernel with support for 32-bit userspace.

Maybe someone from NVIDIA can comment about this?

Because the output of this compile stage is piped to “scripts/genksyms/genksyms” (and genksyms is itself built during the kernel build), it seems the seg fault comes from genksyms and not from the compiler:

/usr/local/aarch64-unknown-linux-gnu/crosstool-ng-4.8.2/bin/aarch64-unknown-linux-gnu-gcc -E -D__GENKSYMS__ -Wp,-MD,drivers/video/tegra/host/t124/.t124.o.d -nostdinc -isystem /usr/local/crosstool-ng/4.8.2/toolchain-build-aarch64/install/lib/gcc/aarch64-unknown-linux-gnu/4.8.2/include -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/arch/arm64/include -Iarch/arm64/include/generated -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/include -Iinclude -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/arch/arm64/include/uapi -Iarch/arm64/include/generated/uapi -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/include/uapi -Iinclude/generated/uapi -include /home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/include/linux/kconfig.h -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/video/tegra/host/t124 -Idrivers/video/tegra/host/t124 -D__KERNEL__ -mlittle-endian -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/arch/arm/mach-tegra/include -Iarch/arm/mach-tegra/include -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -std=gnu89 -O1 -fconserve-stack -mgeneral-regs-only -fno-pic -fno-reorder-blocks -fno-ipa-cp-clone -fno-partial-inlining -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -g -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/video/tegra/host -Idrivers/video/tegra/host -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/media/platform/tegra/vi -Idrivers/media/platform/tegra/vi -I/home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/video/tegra/camera 
-Idrivers/video/tegra/camera -Wno-multichar -Werror -fomit-frame-pointer -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(t124)" -D"KBUILD_MODNAME=KBUILD_STR(nvhost_t124)" /home/dan/Documents/embedded/L4T/R24.1/src/kernel/3.10.96-fix/drivers/video/tegra/host/t124/t124.c

Note that in particular there is this pipe:

9506 Segmentation fault      (core dumped) | scripts/genksyms/genksyms -r /dev/null > drivers/video/tegra/host/t124/.tmp_t124.ver

Either the data in the pipe is wrong and genksyms can’t handle it, or genksyms has a bug. The compiler itself is not the issue (which is why this happens with different compiler versions…the genksyms pipe is failing). It looks like t124.o was never generated, so perhaps genksyms failed with no data at all going in (you can’t pipe from a missing file)…in which case the bug is in the t124.c build. Since this is a t210 (which probably inherits some code and architecture from t124), I wonder whether it was a mistake for t124 to be compiled at all…if so, then cutting this from the configuration should solve the problem. If t124 does need to be compiled, then the reason it fails needs to be solved.
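
One way to see which side of such a pipe is crashing is to run the two stages separately, with the first stage's output saved to a file. The sketch below is only an illustration with stand-in commands: `split_pipe` is a hypothetical helper, and in the real build the producer would be the long `gcc -E -D__GENKSYMS__ ...` command and the consumer would be `scripts/genksyms/genksyms -r /dev/null`.

```shell
# Run "producer | consumer" as two separate steps so each exit status is visible.
split_pipe() {
    producer=$1
    consumer=$2
    tmp=$(mktemp)
    sh -c "$producer" > "$tmp"
    echo "producer status: $?"
    sh -c "$consumer" < "$tmp" > /dev/null
    echo "consumer status: $?"
    rm -f "$tmp"
}

# Stand-in commands (assumptions for the demo): the consumer kills itself
# with SIGSEGV, mimicking a crash in the second stage of the pipe.
split_pipe 'echo "int x;"' 'kill -SEGV $$'
```

A producer status of 0 together with a consumer status of 139 would point at the second stage rather than the compiler.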

Can anyone confirm if a JTX1 compile should be building “drivers/video/tegra/host/t124/t124.c”?

Additional note: I can verify t124.c is compiled if CONFIG_MODVERSIONS is not set in an otherwise identical configuration. Whether or not it was meant to build on JTX1, and why build fails with CONFIG_MODVERSIONS in a case of “it should build”, I don’t know.

You may find this useful: https://gist.github.com/chutsu/9bb6abe6f61924c88521adec859c7006

I just wanted to bump this thread since it is still unclear as to why t124.c fails under CONFIG_MODVERSIONS.

It’s a bug in genksyms: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/scripts/genksyms?id=1c722503fa81888c936a8d1a5052daec859f1a7c

The duplicate function pointer type triggering the bug appears to be the definition of ‘callback’ in drivers/video/tegra/host/isp/isp.h. Renaming that type to something else (and making the corresponding changes in isp.c) appears to help.
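
To locate where a function pointer type with a given name is defined, a recursive grep over the source tree is usually enough. This is a rough sketch, not something from the original posts: `find_fnptr_typedef` is a hypothetical helper, and the pattern only catches single-line typedefs of the form `typedef ... (*name)(...)`.

```shell
# List every single-line function-pointer typedef with the given name
# under a directory, with file name and line number.
find_fnptr_typedef() {
    name=$1
    dir=$2
    grep -rnE "typedef[^;]*\(\*[[:space:]]*${name}[[:space:]]*\)" "$dir"
}

# Example usage from the top of the kernel source tree:
#   find_fnptr_typedef callback drivers/
#   find_fnptr_typedef callback include/
```

More than one hit for the same name among headers that end up in the same translation unit would be a candidate for the duplicate that genksyms trips over.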

Thanks for pointing that out masdisox. I renamed the ‘callback’ to ‘callback_hdl’ in isp.h and isp.c and the build goes smoothly now without any error. Thank you linuxdev for your persistent help. Now I am able to build the kernel and all modules with CONFIG_MODVERSIONS enabled.

This patch solves the issue. Tested as working on a Jetson TX1.

Index: drivers/video/tegra/host/isp/isp.c
===================================================================
--- drivers/video/tegra/host/isp/isp.c	
+++ drivers/video/tegra/host/isp/isp.c	
@@ -216,7 +216,7 @@
 	writel(data, tegra_isp->base + offset);
 }
 
-int tegra_isp_register_mfi_cb(callback cb, void *cb_arg)
+int tegra_isp_register_mfi_cb(callback_hdl cb, void *cb_arg)
 {
 	if (mfi_callback || mfi_callback_arg) {
 		pr_err("cb already registered\n");
Index: drivers/video/tegra/host/isp/isp.h
===================================================================
--- drivers/video/tegra/host/isp/isp.h	
+++ drivers/video/tegra/host/isp/isp.h	
@@ -21,7 +21,7 @@
 
 #include "camera_priv_defs.h"
 
-typedef void (*callback)(void *);
+typedef void (*callback_hdl)(void *);
 
 struct tegra_isp_mfi {
 	struct work_struct work;
@@ -48,10 +48,10 @@
 void nvhost_isp_queue_isr_work(struct isp *tegra_isp);
 
 #ifdef CONFIG_TEGRA_GRHOST_ISP
-int tegra_isp_register_mfi_cb(callback cb, void *cb_arg);
+int tegra_isp_register_mfi_cb(callback_hdl cb, void *cb_arg);
 int tegra_isp_unregister_mfi_cb(void);
 #else
-static inline int tegra_isp_register_mfi_cb(callback cb, void *cb_arg)
+static inline int tegra_isp_register_mfi_cb(callback_hdl cb, void *cb_arg)
 {
 	return -ENOSYS;
 }

Just FYI, I tested the patch from the above URL at kernel.org, restated here again:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/scripts/genksyms?id=1c722503fa81888c936a8d1a5052daec859f1a7c

With the kernel.org fix I still get the genksyms segfault after adding CONFIG_MODVERSIONS, so the isp.h patch above (thanks to @dilipkumar25) is probably the way to go (it may be that a different config will show this genksyms flaw, this fixes only the one place to satisfy genksyms rather than fixing genksyms to not have the issue). Don’t forget to edit the isp.c version of tegra_isp_register_mfi_cb to match the change in isp.h.

If you use the kernel.org fix, you have to rebuild with REGENERATE_PARSERS=1 (you’ll need bison, flex, and gperf installed for this) and then check the *_shipped copies of the files in scripts/genksyms. It took me a little while to track that down; it’s pretty obscure.

I tried that as well, but still got the segfault. That’s why I went ahead with the patch.

Rebuilding with REGENERATE_PARSERS=1 might help, as madisox pointed out. I haven’t tried that. Also, is that a one-time process, or do you have to do it every time you rebuild the kernel?

If you commit the generated files in scripts/genksyms (keywords.hash.c_shipped, lex.lex.c_shipped, parse.tab.c_shipped, parse.tab.h_shipped) after they are regenerated, you won’t have to regenerate them again.
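
Put together, the regenerate-once-then-commit flow described above might look like the sketch below. It is only an illustration under several assumptions: `regen_genksyms` is a hypothetical helper name, the kernel source is a git checkout, it is run from the top of the source tree with TEGRA_KERNEL_OUT exported, the kernel.org genksyms fix has already been applied, and bison, flex, and gperf are installed.

```shell
# Hypothetical helper: rebuild once with the parsers regenerated, then stage
# the regenerated *_shipped files so later builds can reuse them as-is.
regen_genksyms() {
    make O="$TEGRA_KERNEL_OUT" REGENERATE_PARSERS=1 Image || return 1
    git add scripts/genksyms/keywords.hash.c_shipped \
            scripts/genksyms/lex.lex.c_shipped \
            scripts/genksyms/parse.tab.c_shipped \
            scripts/genksyms/parse.tab.h_shipped
}

# Example usage (then commit the staged files yourself):
#   regen_genksyms && git commit -m "genksyms: regenerate shipped parser files"
```

After that commit, ordinary builds without REGENERATE_PARSERS=1 should pick up the regenerated files.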