Failed to initialize the NVIDIA graphics device

My Orin Nano board information is below:
Model: NVIDIA Orin Nano Developer Kit - Jetpack 5.1.3 [L4T 35.5.0]
NV Power Mode[2]: 7W_CPU
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:

  • 699-level Part Number: 699-13767-0004-300 N.2
  • P-Number: p3767-0004
  • Module: NVIDIA Jetson Orin Nano (4GB ram)
  • SoC: tegra23x
  • CUDA Arch BIN: 8.7
  • Codename: P3768

Platform:

  • Machine: aarch64
  • System: Linux
  • Distribution: Ubuntu 20.04 focal
  • Release: 5.10.192-tegra
  • Python: 3.8.10

jtop:

  • Version: 4.2.8
  • Service: Active

Libraries:

  • CUDA: 11.4.315
  • cuDNN: 8.6.0.166
  • TensorRT: 8.5.2.2
  • VPI: 2.4.8
  • Vulkan: 1.3.204
  • OpenCV: 4.5.1 - with CUDA: YES

I rebuilt the kernel image and drivers, then copied the kernel image and all .ko files into Linux_for_Tegra for flashing.

I sometimes see this log message: NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device! It does not happen every time, only about 20% to 30% of the time.

I also stripped the nvgpu.ko file, but that did not help.

The syslog is attached.

The syslog contains 2 failures, which I copied below.

Jun 19 10:08:29 txkj-desktop /usr/lib/gdm3/gdm-x-session[1063]: (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!

Jun 19 10:23:32 txkj-desktop /usr/lib/gdm3/gdm-x-session[1039]: (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
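
To put a number on how often the failure occurs, the matching lines can be counted in a syslog file. This is only a minimal sketch, assuming the message text is exactly as in the two lines above; the function name count_gpu_init_failures is hypothetical.

```shell
# Hypothetical helper: count X server GPU init failures in a syslog file.
# Usage: count_gpu_init_failures /var/log/syslog
count_gpu_init_failures() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device' "$1"
}
```

Comparing this count with the number of boots recorded in the same file would give a rough failure rate.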

syslog_0619_102332.zip (262.1 KB)

Could anyone help check this issue?

Are you saying that when you build the kernel image/modules from source, you sometimes hit this issue?

Yes. After we build the kernel image/modules from source, we sometimes see this issue.

Could you share the steps you are running to build the kernel?

I followed the "Kernel Customization" page of the NVIDIA Jetson Linux Developer Guide documentation to build the kernel.

However, I extended the build script nvbuild.sh to also build the display driver, run modules_install, and strip the .ko files. My modified nvbuild.sh is attached.

All built .ko files are in kernel_supplements.tbz2. I then go to the ${L4T_DIR}/rootfs folder and run the commands below to remove all old .ko files and extract the new ones from kernel_supplements.tbz2.

cd ${L4T_DIR}/rootfs
rm -rf lib/modules/*
tar --keep-directory-symlink -I lbzip2 -xpmf ${KER_OUT}/mod_install/kernel_supplements.tbz2
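
Before flashing, it may be worth confirming that the extracted tree contains a module directory for the expected kernel release and that nvgpu.ko is present in it. This is a minimal sketch, assuming the release string 5.10.192-tegra reported above; check_modules is a hypothetical helper, not part of the L4T tooling.

```shell
# Hypothetical helper: verify that a rootfs has a module directory for the
# given kernel release and that nvgpu.ko exists somewhere inside it.
# Usage: check_modules "${L4T_DIR}/rootfs" 5.10.192-tegra
check_modules() {
    rootfs=$1
    release=$2
    moddir="$rootfs/lib/modules/$release"

    if [ ! -d "$moddir" ]; then
        echo "missing module directory: $moddir"
        return 1
    fi

    # Look for nvgpu.ko anywhere under the release directory.
    found=$(find "$moddir" -name 'nvgpu.ko' | head -n 1)
    if [ -z "$found" ]; then
        echo "nvgpu.ko not found under $moddir"
        return 1
    fi

    echo "ok: $found"
}
```

A release mismatch between the flashed kernel image and the lib/modules directory would be one thing such a check could rule out; it does not by itself explain an intermittent failure.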

Then I flash the image to my device.

nvbuild.zip (2.1 KB)

When this error happens, does it work again if you restart gdm3?

Also, this looks like a log from a custom board. Are you sure the NV devkit can reproduce this issue?

When this error happens, the login screen shows up even though auto login is configured. This is the only issue; we did not see any other display issue, and we did not restart gdm3.
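
For reference, GDM auto login on Ubuntu is normally set through the AutomaticLoginEnable and AutomaticLogin keys in /etc/gdm3/custom.conf. Below is a minimal sketch for confirming that the flashed image still carries those settings; autologin_user is a hypothetical helper name, and the parsing is deliberately simple.

```shell
# Hypothetical helper: print the auto-login user from a GDM config file,
# or print nothing if auto login is not enabled.
# Usage: autologin_user /etc/gdm3/custom.conf
autologin_user() {
    conf=${1:-/etc/gdm3/custom.conf}

    # Take the last uncommented value of each key; commented lines start
    # with '#' and therefore do not match the patterns below.
    enabled=$(sed -n 's/^[[:space:]]*AutomaticLoginEnable[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)
    user=$(sed -n 's/^[[:space:]]*AutomaticLogin[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)

    case "$enabled" in
        [Tt]rue) printf '%s\n' "$user" ;;
    esac
}
```

If this prints the expected user and the login screen still appears, the configuration itself is probably not the cause.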

Hi,

So the GUI is still there? Could you elaborate more on the symptom?
Are you still able to log in in this situation?

Yes. The GUI is still there, and I can log in through the UI. The only issue is that the login screen should not appear, since auto login is configured.
I guess the login screen is caused by the failure to initialize the NVIDIA graphics device.

Is this still an issue that needs support? Are there any results you can share?

The issue is that we don't want the login screen to show up. Ours is a special-purpose device that should not display the Ubuntu login screen to the user. We expect that when the device is powered on, our application UI is displayed on screen without any Ubuntu login, and the user never sees any Ubuntu UI/desktop.