X failed to initialize Nvidia GPU on a headless machine

I’m trying to run an Xorg server on a remote Amazon EC2 instance. The instance runs Ubuntu with kernel 4.15 and has the NVIDIA driver installed.

This is the output of lspci -vnn | egrep 'VGA|3D':

00:01.3 Non-VGA unclassified device [0000]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 08)
00:03.0 VGA compatible controller [0300]: Amazon.com, Inc. Device [1d0f:1111] (prog-if 00 [VGA controller])
00:1e.0 3D controller [0302]: NVIDIA Corporation Device [10de:1eb8] (rev a1)

and this is the relevant section of the X server log:

[  6902.027] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:0:30:0.  Please
[  6902.027] (EE) NVIDIA(GPU-0):     check your system's kernel log for additional error
[  6902.027] (EE) NVIDIA(GPU-0):     messages and refer to Chapter 8: Common Problems in the
[  6902.027] (EE) NVIDIA(GPU-0):     README for additional information.
[  6902.027] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!

My xorg.conf is:

Section "Module"
    Load "modesetting"
EndSection

Section "ServerLayout"
    Identifier     "X.org Configured"
    Screen      0  "Screen0" 0 0
#    InputDevice    "Mouse0" "CorePointer"
#    InputDevice    "Keyboard0" "CoreKeyboard"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device     "Card0"
    Monitor    "Monitor0"
    SubSection "Display"
        Viewport   0 0
        Depth     1
    EndSubSection
EndSection

Section "Device"
    Identifier  "Card0"
    Driver      "nvidia"
    BusID       "PCI:0:30:0"
    Option      "AllowEmptyInitialConfiguration"
EndSection

Section "Monitor"
    Identifier   "Monitor0"
    VendorName   "Monitor Vendor"
    ModelName    "Monitor Model"
    Option       "IgnoreEDID"
EndSection
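
For anyone comparing the outputs above: lspci prints the bus address in hexadecimal (00:1e.0), while the BusID in xorg.conf is written in decimal (PCI:0:30:0), so the two refer to the same device. A minimal sketch of the conversion, where lspci_to_busid is a hypothetical helper name and not part of any tool:

```shell
# Hypothetical helper: convert an lspci address (hex bus:device.function)
# into the decimal PCI:bus:device:function form used by BusID in xorg.conf.
lspci_to_busid() {
    addr=${1#0000:}      # drop an optional leading PCI domain of 0000
    bus=${addr%%:*}      # text before the first colon
    rest=${addr#*:}      # device.function
    dev=${rest%%.*}      # text before the dot
    fn=${rest#*.}        # text after the dot
    # printf accepts 0x-prefixed integers and prints them in decimal
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

lspci_to_busid 00:1e.0
```

For the address in this thread the helper prints PCI:0:30:0, which matches the BusID in the config and the address in the X log.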

dmesg shows:

[ 8420.948545] NVRM: GPU 0000:00:1e.0: RmInitAdapter failed! (0x26:0xffff:1155)
[ 8420.949115] NVRM: GPU 0000:00:1e.0: rm_init_adapter failed, device minor number 0

Any ideas on how I can solve this problem?

nvidia-bug-report.log.gz (504 KB)

The error message points to either nouveau not being properly blacklisted or a faulty device. Please uninstall the driver that was installed with the .run installer using its --uninstall option, then install the driver from the Ubuntu repo with
sudo apt install nvidia-driver-430
and reboot. If the RmInitAdapter error persists, contact Amazon support about a faulty device.
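
A quick way to rule out the nouveau possibility is to check whether the module is currently loaded. This is only a sketch: has_nouveau is a hypothetical helper name, and it reads lsmod-style output on stdin so it can also be run against a saved listing.

```shell
# Hypothetical helper: succeed if a module named exactly "nouveau"
# appears in lsmod-style output read from stdin.
has_nouveau() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

if lsmod 2>/dev/null | has_nouveau; then
    echo "nouveau is loaded and may conflict with the nvidia driver"
else
    echo "nouveau is not loaded"
fi
```

If nouveau does not show up here, the blacklist is probably not the cause and the other explanations are worth looking at.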

Thanks. However, I used the Ubuntu package in the first place and never ran the .run installer. I can, though, uninstall all the nvidia packages and then install them again.

It appears that the card in this machine is a Tesla T4. I downloaded and installed the .run driver from Nvidia, which recommends version 418 for this card. After the installation the card works fine. Problem solved.
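
For anyone landing here with the same error, it may help to confirm which card the instance actually has before picking a driver version. The sketch below pulls the NVIDIA vendor:device ID (vendor 10de) out of lspci -nn output so it can be looked up against the driver's supported-products list; nvidia_pci_ids is a hypothetical helper name. Once a driver loads, nvidia-smi can report the card name and driver version.

```shell
# Hypothetical helper: print the vendor:device ID of every NVIDIA
# (vendor 10de) device found in lspci -nn style output on stdin.
nvidia_pci_ids() {
    grep -o '\[10de:[0-9a-f]\{4\}\]' | tr -d '[]'
}

# Only query the hardware if the tools are present on this machine.
if command -v lspci >/dev/null 2>&1; then
    lspci -nn | nvidia_pci_ids
fi
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        || echo "nvidia-smi could not talk to the driver"
fi
```

On the machine in this thread the first command would report 10de:1eb8, the ID shown in the lspci output in the question.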