Installing CUDA 10 on Ubuntu 18 with Titan X card

I am trying to set up a server so that I can use the installed Titan X card to transcode video streams. I need to be able to monitor the GPU resources in use, which led me to the nvidia-smi command, which requires the NVIDIA CUDA software to be installed. I have been struggling to get CUDA to recognize the card, or at least it appears to be having a problem.

I am following the instructions here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Once that is complete, I run nvidia-smi, but I get an error saying “no devices found”.

I have not been able to find similar issues by searching, other than a possible conflict with the Nouveau driver, but it doesn't appear to be enabled on my system.
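
For reference, one way to confirm that Nouveau is not loaded is to look for it in the loaded-module list. This is only a minimal sketch: the function name nouveau_loaded is made up for illustration, and it reads lsmod-style text from standard input so it can also be tried on saved output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any NVIDIA tooling): exits 0 if a module
# named "nouveau" appears in lsmod-style text on stdin, 1 otherwise.
# The first column of lsmod output is the module name.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage on a live system:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is loaded"
#   else
#       echo "nouveau is not loaded"
#   fi
```

Matching on the first column avoids false positives from other modules that merely list nouveau as a dependency.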

Please let me know what further information you may need to help me diagnose this.

lspci output

03:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GM200 High Definition Audio (rev a1)

output from: cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module 410.48 Thu Sep 6 06:36:33 CDT 2018
GCC version: gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)

output from: nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Sometimes just a reboot is needed. Have you rebooted after the installation?

In the link you indicate, there are two separate sets of instructions, one for runfile installation and one for package manager installation. It would be important to know which one you did.

What is the output of:

dmesg |grep NVRM

?

Thank you for getting back to me.

I have rebooted since the install. I also forgot to mention that this is my second attempt; the first was on Ubuntu 14.04. I used the package manager installation for both.

Here is the dmesg output:
[ 1.792664] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 410.48 Thu Sep 6 06:36:33 CDT 2018 (using threaded interrupts)
[ 7.980758] NVRM: failed to copy vbios to system memory.
[ 7.980918] NVRM: RmInitAdapter failed! (0x30:0xffff:664)
[ 7.980954] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 7045.323851] NVRM: failed to copy vbios to system memory.
[ 7045.324169] NVRM: RmInitAdapter failed! (0x30:0xffff:664)
[ 7045.324234] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 7228.436250] NVRM: failed to copy vbios to system memory.
[ 7228.436553] NVRM: RmInitAdapter failed! (0x30:0xffff:664)
[ 7228.436617] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 9001.353285] NVRM: failed to copy vbios to system memory.
[ 9001.353599] NVRM: RmInitAdapter failed! (0x30:0xffff:664)
[ 9001.353663] NVRM: rm_init_adapter failed for device bearing minor number 0
[10190.860590] NVRM: failed to copy vbios to system memory.
[10190.860992] NVRM: RmInitAdapter failed! (0x30:0xffff:664)
[10190.861062] NVRM: rm_init_adapter failed for device bearing minor number 0

Those are obviously errors (and I should have thought to look there). I will look into these while I await your response. Maybe this is an easy fix that I am missing?
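
To make a repeating log like this easier to read, the NVRM lines can be collapsed into one row per distinct message with a count. This is a minimal sketch, assuming the standard grep, sed, sort and uniq tools; the function name summarize_nvrm is hypothetical, and it reads dmesg-style text from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only NVRM lines, strip the leading "[ timestamp]"
# prefix so identical messages compare equal, then count each distinct message.
summarize_nvrm() {
    grep 'NVRM:' | sed 's/^\[[^]]*\] *//' | sort | uniq -c
}

# Usage on a live system:
#   dmesg | summarize_nvrm
```

On the output above, this would show the same three failure messages repeating, which suggests the adapter fails to initialize the same way each time something tries to open it.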

If you did a fresh install of CUDA 10, on a fresh load of Ubuntu 18, with a Titan X card, that should have a very high probability of success.

On the other hand, the errors being reported in the system log are unusual and suggest to me an improper hardware install, or a fundamental incompatibility between the GPU and the system it is installed in.

Are all external power connections to the GPU properly connected? Is the GPU overheating? Is the GPU properly seated in the PCIe slot?

If that all looks good, the next step in deduction would be to look at the output of:

sudo lspci -vvvv |grep -B 30 -i nvidia

to see if resources are properly assigned to that GPU by the system BIOS.
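
As a rough way to scan that output, the sketch below flags memory regions (BARs) that have no address assigned. It rests on assumptions: the function name find_unassigned_regions is hypothetical, and it assumes lspci marks such regions with `<unassigned>` or `<ignored>`, which is my understanding of its verbose output rather than something shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any "Region N:" lines from lspci -vv style text
# on stdin that are marked <unassigned> or <ignored> (assumed markers).
# Exits 0 when no such line is found, non-zero when at least one is found.
find_unassigned_regions() {
    ! grep -E 'Region [0-9]+: .*<(unassigned|ignored)>'
}

# Usage on a live system (10de is NVIDIA's PCI vendor ID):
#   sudo lspci -vv -d 10de: | find_unassigned_regions \
#       && echo "all regions have addresses" \
#       || echo "some regions are unassigned"
```

A clean result here does not prove the assignment is correct, so the full lspci output is still worth posting.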