Unable to run XServer or other graphical applications on ARM-based development board with an RTX A2000

NVIDIA Open GPU Kernel Modules Version:
525.89.02

Operating System and Version:
Ubuntu Desktop 20.04

Kernel Version:
5.10.35, Built from source

I am running the latest release of the driver and open kernel modules on an ARM-based development board built around the NXP Layerscape LX2160A processor, booted with U-Boot. I am unable to run Xserver or any graphical application (such as the OpenGL-based NVIDIA CUDA samples) because each application fails to open the display. After starting one of these applications, the dmesg logs show several XID 56 errors, eventually followed by an XID 31 error, before the driver times out and fails.

Other errors are interspersed as well, such as NVRM _hypervisorDetection_HVM: CPUID is NOT supported! and NVRM gpuInitOptimusSettings_IMPL: SBIOS did not acknowledge cfg space owner change, which may indicate some incompatibility with the hardware or firmware.

I am able to run non-graphical CUDA samples such as vectorAdd and matrixMul, and they appear to run without issue apart from some (slightly different) errors in dmesg. nvidia-smi also detects the card, but running it produces errors in dmesg as well.

I’ve tried the following fixes:

  • Building the modules on the hardware as loadable modules (initially I built them into the kernel)
  • Patching LX2160A support into the kernel module (I will post a link to this in the comments)
  • Downgrading the kernel modules/drivers to an older version (520.56.06)
  • Blacklisting nouveau
  • Setting NVreg_OpenRmEnableUnsupportedGpus to 1 in the kernel module options
  • Adding nvidia, nvidia-drm, and nvidia-uvm to the modules loaded on boot

None of these fixes have resolved the issue and most print the same dmesg logs. Unfortunately, I’m unable to test the proprietary drivers as they don’t recognize my GPU.
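
For anyone reproducing this, it may help to confirm that the module-related steps above actually took effect before looking at the Xid errors. The following is a minimal Python sketch, not part of my original setup: the function name module_status and the sample text are made up for illustration, and it assumes the standard /proc/modules layout in which the module name is the first whitespace-separated field (with underscores, e.g. nvidia_drm).

```python
# Hypothetical helper: report which of the relevant modules appear in
# the text of /proc/modules. Modules built into the kernel image are
# NOT listed there, so "not loaded" can also mean "built in".

def module_status(proc_modules_text):
    """Map each module of interest to True if it is listed as loaded."""
    wanted = ("nvidia", "nvidia_modeset", "nvidia_drm", "nvidia_uvm", "nouveau")
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return {name: name in loaded for name in wanted}


# Illustrative input only; sizes and addresses are invented.
sample = (
    "nvidia_uvm 1200000 0 - Live 0x0000000000000000\n"
    "nvidia_drm 65536 2 - Live 0x0000000000000000\n"
    "nvidia_modeset 1100000 1 nvidia_drm, Live 0x0000000000000000\n"
    "nvidia 9000000 2 nvidia_uvm,nvidia_modeset, Live 0x0000000000000000\n"
)

for name, is_loaded in module_status(sample).items():
    print(f"{name:15s} {'loaded' if is_loaded else 'not loaded'}")
```

On a real system the same function can be fed the contents of /proc/modules; with the blacklist in place, nouveau should show as not loaded while the nvidia modules show as loaded.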

Here’s example output from the kernel logs, captured with the debug drivers, from an unsuccessful attempt to start Xorg:

[   20.748893] nvidia 0001:01:00.0: Direct firmware load for nvidia/525.89.02/gsp_log_tu10x.bin failed with error -2
[   20.748901] nvidia 0001:01:00.0: Falling back to sysfs fallback for: nvidia/525.89.02/gsp_log_tu10x.bin
[   20.750254] NVRM RmInitAdapter: Failed to load gsp_log_*.bin, no GSP-RM logs will be printed (non-fatal)
[   20.750419] NVRM _hypervisorDetection_HVM: CPUID is NOT supported!
[   22.658526] NVRM knvlinkCopyNvlinkDeviceInfo_IMPL: NVLink is unavailable
[   22.660414] NVRM gpuInitOptimusSettings_IMPL: SBIOS did not acknowledge cfg space owner change
[   22.679408] NVRM kbifClearConfigErrors_IMPL: PCI-E device AER errors pending (00200000):
[   22.679412] NVRM kbifClearConfigErrors_IMPL:      _AER_CORR_ADVISORY_NONFATAL
[   22.679415] NVRM kbifClearConfigErrors_IMPL: Clearing these errors..
[   22.693379] NVRM rmapiAllocWithSecInfo: allocation failed; status: Ran out of a critical resource, other than memory [NV_ERR_INSUFFICIENT_RESOURCES] (0x0000001a)
[   22.693384] NVRM rmapiAllocWithSecInfo: client:0xc1e00007 parent:0xcaf00000 object:0xcaf00002 class:0x2080
[   22.693392] NVRM nvCheckOkFailedNoLog: Check failed: Ran out of a critical resource, other than memory [NV_ERR_INSUFFICIENT_RESOURCES] (0x0000001A) returned from pRmApi->AllocWithHandle(pRmApi, hClientId, hDeviceId, hSecondary, NV20_SUBDEVICE_0, &nv2080AllocParams) @NVRM: GPU at PCI:0001:01:00: GPU-490e73c2-9c3e-8d1c-6630-64de178285c5
[   22.782248] NVRM: Xid (PCI:0001:01:00): 31, pid=957, name=Xorg, Ch 00000000, intr 00000000. MMU Fault: ENGINE HOST2 HUBCLIENT_ESC faulted @ 0xef_deadb000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[   22.810067] NVRM tlsIsrInit: TLS: Unnecessary tlsIsrInit() call at FFFFD57A0131ED74. Will stop reporting further violations.
[   22.886204] nvidia-modeset: Allocated GPU:0 (GPU-490e73c2-9c3e-8d1c-6630-64de178285c5) @ PCI:0001:01:00.0
[   22.904771] NVRM: Xid (PCI:0001:01:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000000 00000000 00000000 00000001 00000000

After several XID 56 errors it prints out the following:

[   23.225822] NVRM rpcRmApiControl_GSP: GspRmControl failed: hClient=0xc1d00000; hObject=0x0001000d; cmd=0x00731341; paramsSize=0x00000030; paramsStatus=0x0000ffff; status=0x0000ffff
[   23.226024] NVRM rpcRmApiControl_GSP: GspRmControl failed: hClient=0xc1d00000; hObject=0x0001000d; cmd=0x00731341; paramsSize=0x00000030; paramsStatus=0x0000ffff; status=0x0000ffff
[   23.229198] nvidia-modeset: DP> HPD v1.1
[   23.234029] nvidia-modeset: DP-CONN> Edid read complete: Manuf Id: 0xf022, Name: HP ZR22w
[   23.234029]     
[   23.234415] nvidia-modeset: DP> Failed to enable multistream mode on current link
[   23.250906] nvidia-modeset: Found HDCP Bksv= 63 93 a0 ea c7
[   23.252764] NVRM rpcRmApiControl_GSP: GspRmControl failed: hClient=0xc1d00000; hObject=0x0001000d; cmd=0x00731341; paramsSize=0x00000030; paramsStatus=0x0000ffff; status=0x0000ffff
[   23.254941] nvidia-modeset: GPU:0: HdmiPacketLibrary: Initialize Success.
[   23.255181] nvidia-modeset: DPCONN> New device 
[   23.255193] nvidia-modeset: GPU:0: DP-4: new DisplayPort 1.1 device detected
[   23.255200] nvidia-modeset: GPU:0:   Connector:   DisplayPort
[   23.255205] nvidia-modeset: GPU:0:   Video:       yes
[   23.255211] nvidia-modeset: GPU:0:   Audio:       yes
[   23.255224] nvidia-modeset: DP-CONN> NotifyDetectComplete
[   23.266777] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.269191] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.270548] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.271666] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.272738] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.273765] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.274837] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.275906] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.276956] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.278034] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.279470] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.280637] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.281825] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.282990] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.285553] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.288420] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.291975] nvidia-modeset: DP-GRP> Deleted group 0xab6a9830 from inactive group!
[   23.294369] NVRM: Xid (PCI:0001:01:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000007 00000588 f40d9ee3 00000004 00800001
[   23.295015] nvidia-modeset: DP-GRP> Deleted group 0xab6a9030 from inactive group!
[   23.295133] NVRM: Xid (PCI:0001:01:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000007 00000000 00000000 00000001 00800001
[   23.297390] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.298949] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.300488] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.302028] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.303593] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.307099] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.310075] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.312716] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.314616] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.315992] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.317529] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.318957] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.320442] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.321854] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.323311] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.326350] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.329311] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.332434] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.334508] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.336051] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.337624] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.339337] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.341057] nvidia-modeset: DP-GRP> Deleted group 0xab1ed630 from inactive group!
[   23.466771] NVRM: Xid (PCI:0001:01:00): 31, pid=957, name=Xorg, Ch 00000002, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_ESC faulted @ 0xe7_008a1000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[   29.353312] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[   29.354261] nvidia-modeset: DP-GRP> Deleted group 0x92163230 from inactive group!
[   29.354266] nvidia-modeset: DP-GRP> Deleted group 0x92163030 from inactive group!
[   29.354269] nvidia-modeset: DP-GRP> Deleted group 0x97e7fe30 from inactive group!
[   29.354272] nvidia-modeset: DP-GRP> Deleted group 0x97e7fc30 from inactive group!
[   29.354275] nvidia-modeset: DP-GRP> Deleted group 0x97e7fa30 from inactive group!
[   29.354278] nvidia-modeset: DP-GRP> Deleted group 0x97e7f830 from inactive group!
[   29.354280] nvidia-modeset: DP-GRP> Deleted group 0x97e7f630 from inactive group!
[   29.354283] nvidia-modeset: DP-GRP> Deleted group 0x97e7f430 from inactive group!
[   29.354287] nvidia-modeset: DP-GRP> Deleted group 0x97e7f030 from inactive group!
[   29.354290] nvidia-modeset: DP-GRP> Deleted group 0x97e7f230 from inactive group!
[   29.354292] nvidia-modeset: DP-GRP> Deleted group 0x97d01a30 from inactive group!
[   29.354295] nvidia-modeset: DP-GRP> Deleted group 0x97d01830 from inactive group!
[   29.354299] nvidia-modeset: DP-GRP> Deleted group 0x97d01630 from inactive group!
[   29.354301] nvidia-modeset: DP-GRP> Deleted group 0x97d01430 from inactive group!
[   29.354304] nvidia-modeset: DP-GRP> Deleted group 0x97d01230 from inactive group!
[   29.354307] nvidia-modeset: DP-GRP> Deleted group 0x97d01030 from inactive group!
[   29.368323] nvidia-modeset: GPU:0: HdmiPacketLibrary: Destroy.
[   29.370359] nvidia-modeset: Freed GPU:0 (GPU-490e73c2-9c3e-8d1c-6630-64de178285c5) @ PCI:0001:01:00.0
[   31.376023] NVRM threadStateYieldCpuIfNecessary: Yielding
[   33.368024] NVRM _threadNodeCheckTimeout: _threadNodeCheckTimeout: currentTime: 3d08c1df18ba00 >= 3d08c1df18ba00
[   33.368030] NVRM _threadNodeCheckTimeout: _threadNodeCheckTimeout: Timeout was set to: 4000 msecs!
[   33.368038] NVRM scrubberDestruct:  Timed out when waiting for the scrub to complete the pending work .
[   33.368041] NVRM scrubberDestruct: bp @ src/kernel/gpu/mem_mgr/mem_scrub.c:350
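
Because the Xid lines are buried among the nvidia-modeset messages, a small script can make the pattern easier to see. This is a rough Python sketch under my own assumptions: the regular expression is written against the "NVRM: Xid (PCI:...): NN," format seen in the log above, the name tally_xids is made up, and the sample lines are abbreviated copies of lines from that log. As far as I understand NVIDIA's Xid documentation, 31 is a GPU memory page fault and 56 is a display engine error.

```python
import re
from collections import Counter

# Matches lines such as:
#   NVRM: Xid (PCI:0001:01:00): 56, pid='<unknown>', ...
# The pattern is an assumption based on the log format shown above.
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<code>\d+),")


def tally_xids(dmesg_text):
    """Count Xid events per Xid code in a block of dmesg output."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match:
            counts[int(match.group("code"))] += 1
    return counts


# Abbreviated lines taken from the log above.
sample = """\
[   22.782248] NVRM: Xid (PCI:0001:01:00): 31, pid=957, name=Xorg, Ch 00000000, intr 00000000.
[   22.904771] NVRM: Xid (PCI:0001:01:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000000
[   23.229198] nvidia-modeset: DP> HPD v1.1
[   23.294369] NVRM: Xid (PCI:0001:01:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000007
"""

for code, count in sorted(tally_xids(sample).items()):
    print(f"Xid {code}: {count} event(s)")
```

Feeding it the full output of dmesg gives a quick count per Xid code, which is handy when comparing runs across driver versions.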

Below is the bug report log:
nvidia-bug-report.log.gz (254.1 KB)

This is the patch used to add support for the LX2160.

For the open kernel modules, it is better to use the GitHub issue tracker:
https://github.com/NVIDIA/open-gpu-kernel-modules/issues

I’ve posted an issue to the open kernel modules repository as well; I’m just hoping to get more visibility by sharing it here.