Failed to start X server

Hi All,
I find that gdm3 very often fails to start the X server on its first attempt at system startup. The strange thing is that if I use the official release kernel image, there is no problem. I am sure I didn’t modify any file in the kernel source code.
I checked Xorg.0.log and found that the error happens in Nvidia’s driver. Because I don’t have the driver’s source code, could you please take a look at this issue? The Nano module I use is Jetson Nano P3448-0020.
Xorg.0.log (7.57 KB)
dmesg.log (57.9 KB)

Hi FrankPCP,

Could you share your “uname -r” and “lsmod”?

Hi WayneWWW,
Here you are.
Thanks.

pcp@pcp-desktop:~$ uname -r
4.9.140+
pcp@pcp-desktop:~$ lsmod
Module                  Size  Used by
bnep                   18822  2
fuse                  111883  5
zram                   29313  4
nvgpu                1717727  33
bluedroid_pm           16059  0
ip_tables              21475  0
x_tables               38016  1 ip_tables
pcp@pcp-desktop:~$

Hi FrankPCP,

I will help check. Could you share your setup? Is it devkit or your custom board?
The only thing you changed was rebuilding the kernel driver and modules, right?

Hi WayneWWW,

  1. It is our custom board. Jetson Nano P3448-0020 + customized carrier board.
  2. Yes, I only rebuilt the kernel image and modules. The toolchain is gcc version 7.3.1 20180425 linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701.
  3. My host pc is Ubuntu 16.04.6 LTS.
  4. The issue is that gdm3 fails to start the X server and then restarts it. So you see the login page even if Ubuntu is set to autologin mode.

Thanks.

Hi WayneWWW,
Any updates?

Thanks

Could you share your steps for building kernel image/modules?

Also, could you paste your kernel image and modules here as a tarball?

Hi WayneWWW,
We found how to resolve this issue.

Step 1. Run export LOCALVERSION=-tegra before building the kernel.
Step 2. Rebuild the kernel.
Step 3. After installing the kernel modules, run ./apply_binaries.sh again.

Thanks for your help.

I am not sure if this error is really directly related to adding the LOCALVERSION. Even if you don’t have “-tegra” suffix, your “lsmod” still has the module loaded.

Anyway, glad that you resolved this issue.

Hi WayneWWW,

You are right. The issue is still there.

I found that the file “Linux_for_Tegra/kernel/kernel_supplements.tbz2” is extracted and overwrites all the kernel modules in the directory “lib/modules/4.9.140-tegra” when I execute apply_binaries.sh. In other words, all the kernel modules become Nvidia’s prebuilt modules.
I finally figured out which kernel module causes this issue. If nvgpu.ko is built by us, the issue happens.
Is there any difference in nvgpu.ko between Nvidia’s prebuilt one and ours?
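
One quick way to see the two builds side by side is to list every copy of the module under lib/modules together with its size. This is only a rough sketch: the function name list_module_copies is made up for illustration, and it assumes the modules sit under <rootfs>/lib/modules/<release>/.

```shell
# List every copy of a kernel module under <rootfs>/lib/modules,
# one per line, as "<size in bytes> <path>".
# list_module_copies is a hypothetical helper, not an L4T tool.
list_module_copies() {
    rootfs=$1   # root of the filesystem to inspect, e.g. / on the target
    name=$2     # module name without the .ko suffix, e.g. nvgpu

    find "$rootfs/lib/modules" -type f -name "$name.ko" | sort |
    while IFS= read -r path; do
        size=$(wc -c < "$path")
        # $((size)) drops any padding that some wc implementations add
        printf '%s %s\n' "$((size))" "$path"
    done
}
```

On the target, "list_module_copies / nvgpu" would print one line per kernel release directory that contains an nvgpu.ko, so a large size gap between the self-built and the prebuilt copy is easy to spot.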

Hi Frank,

If you are using the same release (e.g. rel-32.1 source for the rel-32.1 release) and didn’t modify anything, then the module should be the same. However, the Nvidia prebuilt modules have been stripped with the command below, so they are much smaller than yours.

$ <tool_chain_path>/aarch64-linux-gnu-strip --strip-unneeded <path-of-kernel-module.ko>
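
To apply the same command to every module in a tree rather than one file at a time, a minimal sketch could look like the following. The function name strip_modules and the STRIP variable are assumptions for illustration, not part of the L4T tooling; point STRIP at the strip binary from your own cross toolchain.

```shell
# Run "strip --strip-unneeded" on every *.ko file below a directory.
# strip_modules and STRIP are hypothetical names used for this sketch.
strip_modules() {
    dir=$1
    # Falls back to a strip binary found on PATH if STRIP is not set.
    strip_bin=${STRIP:-aarch64-linux-gnu-strip}

    if [ ! -d "$dir" ]; then
        echo "strip_modules: not a directory: $dir" >&2
        return 1
    fi

    # strip rewrites each file in place; "+" batches many files per call.
    find "$dir" -type f -name '*.ko' -exec "$strip_bin" --strip-unneeded {} +
}
```

It could be called as "STRIP=<tool_chain_path>/aarch64-linux-gnu-strip strip_modules <path-of-modules-directory>". Because the files are modified in place, keeping a copy of the unstripped modules first is a reasonable precaution.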

Did you change anything in the nvgpu driver or install any other add-ons in the rootfs?

Hi WayneWWW,
I stripped the nvgpu.ko file that we built, and the issue is resolved.
I think this is the root cause. Thanks for your help!