Titan driver installation not working with CUDA on Ubuntu

Hello,

We have a GPU workstation with two TITAN cards (90YV03Y0-U0NA00 ASUS GeForce). The rest of the hardware is listed here: http://www.supermicro.com/products/system/4U/7047/SYS-7047GR-TRF.cfm. We are running Ubuntu 12.04.

We installed the NVIDIA drivers (319.37, 331 and 334) and the CUDA 5.5 toolkit three times, each time following a different set of instructions. None of the attempts worked.

When we run the nvidia-smi command, we get the following:

NVIDIA: could not open the device file /dev/nvidia1 (Input/output error).
Unable to determine the device handle for GPU 0000:83:00.0: Unknown Error

What could be a possible workaround? Can anybody help, please?

Two possibilities come to mind:

  1. You are running the nvidia-smi command as a non-root user, and you have not properly set up the GPU device files. If so, running nvidia-smi as root should fix it.
  2. The nouveau driver is present on your system and interfering. If that is the case, running:

dmesg |grep NVRM
dmesg |grep nouv

may be instructive. There may be other possibilities as well.
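
For what it's worth, here is a rough sketch of how the nouveau possibility could be checked from a shell. The helper name nouveau_loaded is made up for illustration; it only looks for a line starting with "nouveau" in lsmod output, so treat it as a quick check rather than a definitive test.

```shell
# Hypothetical helper: reads `lsmod` output on stdin and reports whether
# a module named exactly "nouveau" appears in the first column.
nouveau_loaded() {
    if grep -q '^nouveau[[:space:]]'; then
        echo "nouveau is loaded"
    else
        echo "nouveau is not loaded"
    fi
}
```

Typical use would be `lsmod | nouveau_loaded`. For the first possibility, `ls -l /dev/nvidia*` shows whether the device files exist and who is allowed to open them.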

I ran nvidia-smi as root; the output is the same. dmesg |grep nouv prints nothing, but dmesg |grep NVRM gives the following:

[ 5.058975] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 319.37 Wed Jul 3 17:08:50 PDT 2013
[ 73.850392] NVRM: RmInitAdapter failed! (0x26:0xffffffff:1170)
[ 73.850424] NVRM: rm_init_adapter(1) failed
[24827.739031] NVRM: RmInitAdapter failed! (0x26:0xffffffff:1170)
[24827.739060] NVRM: rm_init_adapter(1) failed
[79517.991083] NVRM: RmInitAdapter failed! (0x26:0xffffffff:1170)
[79517.991112] NVRM: rm_init_adapter(1) failed
[249861.914044] NVRM: RmInitAdapter failed! (0x26:0xffffffff:1170)
[249861.914073] NVRM: rm_init_adapter(1) failed
[249871.312173] NVRM: RmInitAdapter failed! (0x26:0xffffffff:1170)
[249871.312199] NVRM: rm_init_adapter(1) failed
[249892.131472] NVRM: RmInitAdapter failed! (0x26:0xffffffff:1170)
[249892.131497] NVRM: rm_init_adapter(1) failed

I can see a lot of things are failing. The last time, I basically followed this tutorial…

Is that tutorial OK? Do I need to start over with a fresh install?

(Using the same instructions, I managed to install CUDA on a separate machine with a single NVS 315 card. Could having two TITANs be the problem here? The output says /dev/nvidia1 (Input/output error), so is the second TITAN not working?)
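
One way to read the dmesg output above is to tally the failures per adapter index. The sketch below does that; count_adapter_failures is a hypothetical helper name, and it assumes the number in rm_init_adapter(N) identifies the adapter. Whether that index maps directly to the /dev/nvidiaN device file is also an assumption, although the two do line up in this case.

```shell
# Hypothetical helper: reads kernel log text on stdin (e.g. from `dmesg`)
# and prints how many "rm_init_adapter(N) failed" lines exist per index N.
count_adapter_failures() {
    sed -n 's/.*rm_init_adapter(\([0-9][0-9]*\)) failed.*/\1/p' |
        sort -n | uniq -c |
        awk '{ print "adapter " $2 ": " $1 " failed init attempts" }'
}
```

Run as `dmesg | count_adapter_failures`. On the log above it should report only adapter 1, with six failed attempts, which suggests that one card initializes and the other does not.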

I really can’t give you a bulletproof set of instructions. The tutorial you linked looks mostly good, but if nouveau is part of the initrd image, it will not fix that. Likewise, I have no idea whether your motherboard has compatibility issues with dual TITANs. It would also have been useful to know the exact sequence you followed, as well as the output (e.g. from the installer) at each step.
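
To check the initrd point above, something like the following might help. It is only a sketch: initrd_has_nouveau is a hypothetical helper, and it assumes the nouveau module would show up in the image listing as a file whose name contains nouveau.ko.

```shell
# Hypothetical helper: reads a file listing of the initrd on stdin and
# reports whether a nouveau kernel module file appears in it.
initrd_has_nouveau() {
    if grep -q 'nouveau\.ko'; then
        echo "nouveau found in initrd listing"
    else
        echo "nouveau not found in initrd listing"
    fi
}
```

On Ubuntu the listing can come from lsinitramfs, e.g. `lsinitramfs /boot/initrd.img-$(uname -r) | initrd_has_nouveau`. If nouveau turns out to be in there, the usual approach is to blacklist it in a file under /etc/modprobe.d/ and rebuild the image with `sudo update-initramfs -u`.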

I guess I would start by suggesting that you remove one of the TITAN cards, preferably the one at 0000:83:00.0 (lspci may help you identify it), and see if the behavior is any different. No need to change any software before you do that.
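
To see which bus addresses the cards occupy, lspci output can be filtered down to the NVIDIA display controllers. A minimal sketch, with nvidia_gpu_bus_ids as a made-up helper name and the assumption that the cards are listed as "VGA compatible controller" or "3D controller":

```shell
# Hypothetical helper: reads `lspci` output on stdin and prints the bus
# address (first column) of each NVIDIA display controller, skipping
# other NVIDIA functions such as the HDMI audio device.
nvidia_gpu_bus_ids() {
    grep -i 'nvidia' |
        grep -E 'VGA compatible controller|3D controller' |
        awk '{ print $1 }'
}
```

`lspci | nvidia_gpu_bus_ids` should list one bus address per card, and `lspci -s 83:00.0` prints only the device at that address. Which physical slot an address corresponds to depends on the board, so the motherboard manual may be needed for that last step.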