Hello,
I recently replaced a Titan X board with a Titan V board in a computer running Ubuntu 16.04. After installing the latest CUDA Toolkit (v9.1) with display driver 387.26, nvidia-smi reports “No devices were found”. (CUDA 9.0, which was installed and working on the machine with the Titan X, gave the same result once I swapped in the Titan V.)
In case it’s relevant, the machine has an AST2400 BMC on it, and the primary display is set up to go out the VGA port on the BMC and not through the Nvidia GPU. The GPU is for compute only.
I found an older thread describing a similar situation, where the resolution was a driver update: "RmInitAdapter failed" with 370.23 but 367.35 works fine (Linux, NVIDIA Developer Forums).
Any ideas on how to proceed?
Thanks,
Aaron
Relevant output from dmesg includes:
[ 6.755454] nvidia: module license 'NVIDIA' taints kernel.
[ 6.755455] Disabling lock debugging due to kernel taint
[ 6.761032] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 6.765634] ipmi_si IPI0001:00: Found new BMC (man_id: 0x000000, prod_id: 0xaabb, dev_id: 0x20)
[ 6.766213] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
[ 6.766381] nvidia 0000:04:00.0: enabling device (0100 -> 0103)
[ 6.766448] vgaarb: device changed decodes: PCI:0000:04:00.0,olddecodes=io+mem,decodes=none:owns=none
[ 6.766508] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 387.26 Thu Nov 2 21:20:16 PDT 2017 (using threaded interrupts)
[ 7.175641] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.2/0000:04:00.1/sound/card0/input2
[ 7.175685] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.2/0000:04:00.1/sound/card0/input3
[ 7.175735] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.2/0000:04:00.1/sound/card0/input4
[ 7.175769] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.2/0000:04:00.1/sound/card0/input5
[ 8.229133] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 242
[ 8.653254] NVRM: RmInitAdapter failed! (0x30:0x56:685)
[ 8.653280] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 17.811810] NVRM: RmInitAdapter failed! (0x30:0x56:685)
[ 17.811839] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 252.420234] NVRM: RmInitAdapter failed! (0x30:0x56:685)
[ 252.420254] NVRM: rm_init_adapter failed for device bearing minor number 0
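In case it helps anyone comparing logs, here is a minimal Python sketch of how the RmInitAdapter failure lines above could be pulled out of saved dmesg text. The function name, the regular expression, and the SAMPLE string are my own illustration and not part of any NVIDIA tool; the code in parentheses is kept as an opaque string because I don't know what its fields mean.

```python
import re

# Matches the NVRM failure line in the form it takes in the dmesg output above.
# The parenthesised code (e.g. 0x30:0x56:685) is captured as an opaque string.
RM_INIT_FAILED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: RmInitAdapter failed! \((?P<code>[^)]*)\)"
)


def find_rm_init_failures(dmesg_text):
    """Return (timestamp_seconds, code_string) for each RmInitAdapter failure."""
    failures = []
    for line in dmesg_text.splitlines():
        match = RM_INIT_FAILED.search(line)
        if match:
            failures.append((float(match.group("ts")), match.group("code")))
    return failures


# A few lines copied from the dmesg output in this post.
SAMPLE = """\
[ 8.229133] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 242
[ 8.653254] NVRM: RmInitAdapter failed! (0x30:0x56:685)
[ 8.653280] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 17.811810] NVRM: RmInitAdapter failed! (0x30:0x56:685)
"""

if __name__ == "__main__":
    for ts, code in find_rm_init_failures(SAMPLE):
        print(f"{ts:.6f}s  RmInitAdapter failed ({code})")
```

Feeding it the full output of dmesg (saved to a file and read as text) would show whether the same code repeats on every attempt, as it does in the excerpt above.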
Also relevant:
cat /proc/driver/nvidia/gpus/0000:02:00.0/information
Model: Graphics Device
IRQ: 57
GPU UUID: GPU-???-???-???-???-???
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:02:00.0
Device Minor: 0
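The GPU UUID and Video BIOS fields above are filled with question marks, which I assume means the driver never managed to read them from the board. A small Python sketch for flagging such fields follows; the helper names and the SAMPLE text are hypothetical, and the "contains a question mark" test is just my guess at a useful heuristic.

```python
def parse_gpu_information(text):
    """Parse 'Key: value' lines (as in the information file above) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as 0000:02:00.0 stay whole.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def unread_fields(fields):
    """Return the keys whose values contain '?' placeholders."""
    return [key for key, value in fields.items() if "?" in value]


# Lines copied from the information file shown in this post.
SAMPLE = """\
Model: Graphics Device
IRQ: 57
GPU UUID: GPU-???-???-???-???-???
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
Bus Location: 0000:02:00.0
Device Minor: 0
"""

if __name__ == "__main__":
    info = parse_gpu_information(SAMPLE)
    print("Fields with placeholder values:", unread_fields(info))
```

On a working card I would expect this list to be empty, so it could serve as a quick check after trying a different driver.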
Also:
uname -r
4.4.0-98-generic
Also:
sudo dmidecode
[sudo] password for agreenblatt:
dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.0 present.
36 structures occupying 2136 bytes.
Table at 0x000ED9B0.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: P2.10
Release Date: 06/17/2016
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 8192 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 5.11
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: To Be Filled By O.E.M.
Product Name: To Be Filled By O.E.M.
Version: To Be Filled By O.E.M.
Serial Number: To Be Filled By O.E.M.
UUID: 00000000-0000-0000-0000-D05099C16889
Wake-up Type: Power Switch
SKU Number: To Be Filled By O.E.M.
Family: To Be Filled By O.E.M.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASRockRack
Product Name: EPC612D8
Version:
Serial Number:
Asset Tag:
Features:
Board is a hosting board
Board is replaceable
Location In Chassis:
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
Handle 0x0003, DMI type 3, 22 bytes
Chassis Information
Manufacturer: To Be Filled By O.E.M.
Type: Desktop
Lock: Not Present
Version: To Be Filled By O.E.M.
Serial Number: To Be Filled By O.E.M.
Asset Tag: To Be Filled By O.E.M.
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x00000000
Height: Unspecified
Number Of Power Cords: 1
Contained Elements: 0
SKU Number: To Be Filled By O.E.M.
Handle 0x0004, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE1
Type: x8 PCI Express
Current Usage: In Use
Length: Long
ID: 17
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: ffff:04:1f.7
Handle 0x0005, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE3
Type: x16 PCI Express
Current Usage: Available
Length: Long
ID: 19
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: ffff:03:1f.7
Handle 0x0006, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE5
Type: x8 PCI Express
Current Usage: Available
Length: Long
ID: 21
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Handle 0x0007, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE6
Type: x8 PCI Express
Current Usage: Available
Length: Long
ID: 22
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: ffff:01:1f.7
Handle 0x0008, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE7
Type: x16 PCI Express
Current Usage: In Use
Length: Long
ID: 23
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: ffff:02:1f.7
Handle 0x0009, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE8
Type: x4 PCI Express
Current Usage: Available
Length: Long
ID: 33
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Handle 0x000A, DMI type 11, 5 bytes
OEM Strings
String 1: To Be Filled By O.E.M.
Handle 0x000B, DMI type 32, 20 bytes
System Boot Information
Status: No errors detected
Handle 0x000C, DMI type 15, 73 bytes
System Event Log
Area Length: 65535 bytes
Header Start Offset: 0x0000
Header Length: 16 bytes
Data Start Offset: 0x0010
Access Method: Memory-mapped physical 32-bit address
Access Address: 0xFF850000
Status: Valid, Not Full
Change Token: 0x00000203
Header Format: Type 1
Supported Log Type Descriptors: 25
Descriptor 1: Single-bit ECC memory error
Data Format 1: Multiple-event handle
Descriptor 2: Multi-bit ECC memory error
Data Format 2: Multiple-event handle
Descriptor 3: Parity memory error
Data Format 3: None
Descriptor 4: Bus timeout
Data Format 4: None
Descriptor 5: I/O channel block
Data Format 5: None
Descriptor 6: Software NMI
Data Format 6: None
Descriptor 7: POST memory resize
Data Format 7: None
Descriptor 8: POST error
Data Format 8: POST results bitmap
Descriptor 9: PCI parity error
Data Format 9: Multiple-event handle
Descriptor 10: PCI system error
Data Format 10: Multiple-event handle
Descriptor 11: CPU failure
Data Format 11: None
Descriptor 12: EISA failsafe timer timeout
Data Format 12: None
Descriptor 13: Correctable memory log disabled
Data Format 13: None
Descriptor 14: Logging disabled
Data Format 14: None
Descriptor 15: System limit exceeded
Data Format 15: None
Descriptor 16: Asynchronous hardware timer expired
Data Format 16: None
Descriptor 17: System configuration information
Data Format 17: None
Descriptor 18: Hard disk information
Data Format 18: None
Descriptor 19: System reconfigured
Data Format 19: None
Descriptor 20: Uncorrectable CPU-complex error
Data Format 20: None
Descriptor 21: Log area reset/cleared
Data Format 21: None
Descriptor 22: System boot
Data Format 22: None
Descriptor 23: End of log
Data Format 23: None
Descriptor 24: OEM-specific
Data Format 24: OEM-specific
Descriptor 25: OEM-specific
Data Format 25: OEM-specific
Handle 0x000D, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 256 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x000E, DMI type 19, 31 bytes
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x00FFFFFFFFF
Range Size: 64 GB
Physical Array Handle: 0x000D
Partition Width: 2
Handle 0x000F, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000D
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 72 bits
Size: 32 GB
Form Factor: RIMM
Set: None
Locator: DIMM_A1
Bank Locator: NODE 1
Type: DDR4
Type Detail: Synchronous
Speed: 2400 MHz
Manufacturer: Undefined
Serial Number: EE0A7016
Asset Tag: DIMM_A1_AssetTag
Part Number: 9965640-006.A01G
Rank: 2
Configured Clock Speed: 2400 MHz
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0010, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x007FFFFFFFF
Range Size: 32 GB
Physical Device Handle: 0x000F
Memory Array Mapped Address Handle: 0x000E
Partition Row Position: 1
Handle 0x0011, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000D
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: RIMM
Set: None
Locator: DIMM_A2
Bank Locator: NODE 1
Type: DDR4
Type Detail: Synchronous
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: NO DIMM
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0012, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000D
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 72 bits
Size: 32 GB
Form Factor: RIMM
Set: None
Locator: DIMM_B1
Bank Locator: NODE 1
Type: DDR4
Type Detail: Synchronous
Speed: 2400 MHz
Manufacturer: Undefined
Serial Number: EF087482
Asset Tag: DIMM_B1_AssetTag
Part Number: 9965640-006.A01G
Rank: 2
Configured Clock Speed: 2400 MHz
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0013, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00800000000
Ending Address: 0x00FFFFFFFFF
Range Size: 32 GB
Physical Device Handle: 0x0012
Memory Array Mapped Address Handle: 0x000E
Partition Row Position: 1
Handle 0x0014, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000D
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: RIMM
Set: None
Locator: DIMM_B2
Bank Locator: NODE 1
Type: DDR4
Type Detail: Synchronous
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: NO DIMM
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0015, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 256 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x0016, DMI type 19, 31 bytes
Memory Array Mapped Address
Starting Address: 0x01000000000
Ending Address: 0x01FFFFFFFFF
Range Size: 64 GB
Physical Array Handle: 0x0015
Partition Width: 2
Handle 0x0017, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 72 bits
Size: 32 GB
Form Factor: RIMM
Set: None
Locator: DIMM_C1
Bank Locator: NODE 2
Type: DDR4
Type Detail: Synchronous
Speed: 2400 MHz
Manufacturer: Undefined
Serial Number: EB084E82
Asset Tag: DIMM_C1_AssetTag
Part Number: 9965640-006.A01G
Rank: 2
Configured Clock Speed: 2400 MHz
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0018, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x01000000000
Ending Address: 0x017FFFFFFFF
Range Size: 32 GB
Physical Device Handle: 0x0017
Memory Array Mapped Address Handle: 0x0016
Partition Row Position: 1
Handle 0x0019, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: RIMM
Set: None
Locator: DIMM_C2
Bank Locator: NODE 2
Type: DDR4
Type Detail: Synchronous
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: NO DIMM
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x001A, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 72 bits
Size: 32 GB
Form Factor: RIMM
Set: None
Locator: DIMM_D1
Bank Locator: NODE 2
Type: DDR4
Type Detail: Synchronous
Speed: 2400 MHz
Manufacturer: Undefined
Serial Number: E819480C
Asset Tag: DIMM_D1_AssetTag
Part Number: 9965640-006.A01G
Rank: 2
Configured Clock Speed: 2400 MHz
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x001B, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x01800000000
Ending Address: 0x01FFFFFFFFF
Range Size: 32 GB
Physical Device Handle: 0x001A
Memory Array Mapped Address Handle: 0x0016
Partition Row Position: 1
Handle 0x001C, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: RIMM
Set: None
Locator: DIMM_D2
Bank Locator: NODE 2
Type: DDR4
Type Detail: Synchronous
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: NO DIMM
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x001D, DMI type 7, 19 bytes
Cache Information
Socket Designation: CPU Internal L1
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 896 kB
Maximum Size: 896 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Parity
System Type: Other
Associativity: 8-way Set-associative
Handle 0x001E, DMI type 7, 19 bytes
Cache Information
Socket Designation: CPU Internal L2
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 3584 kB
Maximum Size: 3584 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x001F, DMI type 7, 19 bytes
Cache Information
Socket Designation: CPU Internal L3
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 35840 kB
Maximum Size: 35840 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: 20-way Set-associative
Handle 0x0020, DMI type 4, 42 bytes
Processor Information
Socket Designation: CPUSocket
Type: Central Processor
Family: Xeon
Manufacturer: Intel
ID: F1 06 04 00 FF FB EB BF
Signature: Type 0, Family 6, Model 79, Stepping 1
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Multi-threading)
TM (Thermal monitor supported)
PBE (Pending break enabled)
Version: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Voltage: 0.0 V
External Clock: 100 MHz
Max Speed: 4000 MHz
Current Speed: 2400 MHz
Status: Populated, Enabled
Upgrade: Socket LGA2011-3
L1 Cache Handle: 0x001D
L2 Cache Handle: 0x001E
L3 Cache Handle: 0x001F
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Core Count: 14
Core Enabled: 14
Thread Count: 28
Characteristics:
64-bit capable
Multi-Core
Hardware Thread
Execute Protection
Enhanced Virtualization
Power/Performance Control
Handle 0x0021, DMI type 130, 20 bytes
OEM-specific Type
Header and Data:
82 14 21 00 24 41 4D 54 01 01 01 01 01 A5 2F 02
00 00 00 00
Handle 0x0022, DMI type 131, 64 bytes
OEM-specific Type
Header and Data:
83 40 22 00 35 00 00 00 09 00 00 00 00 00 1D 00
F8 00 44 8D 00 00 00 00 09 80 00 00 01 00 09 00
EA 03 25 00 00 00 00 00 C8 00 3A 15 00 00 00 00
00 00 00 00 22 00 00 00 76 50 72 6F 00 00 00 00
Handle 0x0023, DMI type 127, 4 bytes
End Of Table