Not able to initialize all GPU cards in Ubuntu 12.04

Hi!
First off, let me thank all the people on this forum; your posts have been very helpful to me.
However, I now seem to be stuck on a very difficult problem that I cannot solve.
I am a former Windows user, currently making the switch to open source for work.
I have:
1.) Ubuntu 12.04 LTS
2.) ASRock Z87 OC Formula mobo
3.) 16 GB RAM
4.) NVIDIA GTX 660 Ti (3 cards)
5.) CUDA 6.0

When I use only one of the cards, I can compile deviceQueryDrv and get it to run. However, when I add the other 2 cards, I get errors:

[b]svamotive@svamotive-desktop:~/Desktop/NVIDIA_CUDA-6.0_Samples/1_Utilities/deviceQueryDrv$ ./deviceQueryDrv
./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
cuInit(0) returned 101
-> CUDA_ERROR_INVALID_DEVICE (device specified is not a valid CUDA device)
Result = FAIL[/b]

When I run deviceQuery, I also get an error:

[b]svamotive@svamotive-desktop:~/Desktop/NVIDIA_CUDA-6.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 10
-> invalid device ordinal
Result = FAIL
[/b]

Here are the details of my system:

lspci | grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
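A quick way to confirm that the kernel enumerates all three cards on the PCI bus is to pull the bus IDs of the NVIDIA display controllers out of the lspci output. The Python sketch below is only an illustration: the function name `nvidia_gpu_bus_ids` is hypothetical, and it assumes the default lspci output format. Some NVIDIA cards are listed as "3D controller" rather than "VGA compatible controller", so both are accepted.

```python
import re

# Matches lines such as:
# 01:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
# The optional "0000:" PCI domain prefix is only printed by `lspci -D`.
GPU_LINE = re.compile(
    r"^(?P<bus>(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+"
    r"(?:VGA compatible controller|3D controller):\s+NVIDIA"
)


def nvidia_gpu_bus_ids(lspci_text):
    """Return the PCI bus IDs of NVIDIA GPUs found in `lspci` output.

    HDMI audio functions (e.g. 01:00.1 "Audio device") are skipped,
    because they are separate PCI functions, not GPUs.
    """
    bus_ids = []
    for line in lspci_text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            bus_ids.append(match.group("bus"))
    return bus_ids
```

Fed the lspci listing above (for example, captured with `lspci > lspci.txt`), it returns the three bus IDs 01:00.0, 02:00.0 and 03:00.0, which confirms that all three cards are visible at the PCI level even though the driver cannot initialize one of them.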

uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.4 LTS"
NAME="Ubuntu"
VERSION="12.04.4 LTS, Precise Pangolin"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu precise (12.04.4 LTS)"
VERSION_ID="12.04"

gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvidia-smi
Unable to determine the device handle for GPU 0000:03:00.0: Unknown Error

nvidia-settings -q NvidiaDriverVersion

Attribute 'NvidiaDriverVersion' (svamotive-desktop:0.0): 331.20

I have also modified /etc/X11/xorg.conf to include a Device section for each of the three cards.

I still cannot get the system to work with all 3 GPU cards.

Since this is for my work, I am kinda in a bind. Any help would be appreciated.

Thanks
Sidarth

You’re probably asking a lot of that ASRock mobo to support 3 GPUs. A few suggestions, with reference to the specifications:

[url]http://www.asrock.com/mb/Intel/Z87%20OC%20Formulaac/?cat=Specifications[/url]

  1. Make sure you have the latest BIOS installed on that motherboard.
  2. Check the BIOS configuration to see if any settings are necessary to enable GPUs in the 3 slots PCIE1, PCIE2, and PCIE4. The basic enablement should be OK since they all show up in lspci, but there may be resource-mapping settings that need to be changed.
  3. Make sure your GPUs are plugged into the slots labelled PCIE1, PCIE2, and PCIE4. Do not use PCIE6. Even in this config, with 3 GPUs, one will be operating at x8 electrical and the other two at x4 electrical.
  4. Make sure you have proper power delivery to each of the GPUs, including a beefy PSU (I would recommend at least 850W), and make sure any necessary aux power cables are properly plugged into those GTX 660 Ti GPUs. I think that should be a total of six 6-pin aux power connectors (two on each GPU) that must have a cable plugged into them:

[url]http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-660ti/specifications[/url]
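The power advice in item 4 can be turned into a rough back-of-the-envelope check. This is a sketch under stated assumptions, not a sizing rule: 150 W per card is assumed from NVIDIA's published GTX 660 Ti board power, the 200 W "rest of system" figure is purely hypothetical, and the 80% load target is a common rule of thumb. The 75 W limits for the x16 slot and for each 6-pin connector come from the PCIe specifications.

```python
# Rough PSU sizing sketch. Every wattage here is an assumption for illustration.
GPU_BOARD_POWER_W = 150   # assumed from NVIDIA's published GTX 660 Ti board power
REST_OF_SYSTEM_W = 200    # hypothetical allowance for CPU, motherboard, drives, fans
TARGET_LOAD = 0.8         # rule of thumb: keep sustained load <= ~80% of PSU rating

SLOT_W = 75               # PCIe x16 slot power limit
SIX_PIN_W = 75            # power limit per 6-pin PCIe aux connector
SIX_PINS_PER_GPU = 2      # GTX 660 Ti: two 6-pin connectors per card


def psu_estimate(num_gpus, gpu_w=GPU_BOARD_POWER_W, rest_w=REST_OF_SYSTEM_W,
                 target_load=TARGET_LOAD):
    """Return (estimated system load in W, suggested minimum PSU rating in W)."""
    load_w = num_gpus * gpu_w + rest_w
    return load_w, load_w / target_load


def aux_connectors_needed(num_gpus, per_gpu=SIX_PINS_PER_GPU):
    """Number of 6-pin cables that must be plugged in across all GPUs."""
    return num_gpus * per_gpu


def max_deliverable_per_gpu(per_gpu=SIX_PINS_PER_GPU):
    """Power the slot plus the aux connectors can deliver to one card."""
    return SLOT_W + per_gpu * SIX_PIN_W
```

With these assumptions, three cards give an estimated load of about 650 W and a suggested rating of roughly 813 W, which is in the same range as the 850W recommendation above. The six aux connectors match the count in item 4, and each card can draw up to 225 W through its slot and connectors.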

If you still need help, I would run the following command as well, immediately after a reboot:

dmesg | grep NVRM

Thanks for the reply, txbob.

The output of dmesg | grep NVRM immediately after a reboot is:

[ 7.496540] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 331.20 Wed Oct 30 17:43:35 PDT 2013
[ 13.361600] NVRM: failed to copy vbios to system memory.
[ 13.364761] NVRM: RmInitAdapter failed! (0x30:0xffffffff:720)
[ 13.364771] NVRM: rm_init_adapter failed for device bearing minor number 2
[ 13.364787] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -5

I do have the latest BIOS installed on the mobo.
I have used the correct slots for the GPUs, namely PCIE1, PCIE2 and PCIE4. Currently I do not have anything in PCIE6.
The UEFI BIOS does not have any resource-mapping settings.
I am familiar with the x8/x4/x4 configuration.
I am using a SilverStone 1500 W power supply. I have checked all the cables, and they are all providing power to the GTX 660 Tis. All aux cables are plugged in.
As per my due diligence, the ASRock mobo I selected can run three GPUs. I am in India, so the selection is a little limited, but it is enough for my purposes.

Any additional suggestions are always welcome.

BTW, can anyone here tell me what RmInitAdapter is? I cannot find any help pages on it.
Also, when I run nvidia-smi I get:

Unable to determine the device handle for GPU 0000:03:00.0: Unknown Error

What does all this mean? nvidia-settings can detect 2 of my cards but not the third.
Very perplexing!

Some additional information:
I have also tried swapping the positions of the NVIDIA GPUs and still get the same problem.
The cards themselves are working fine.

When I run nvidia-smi -L, I get:

GPU 0: GeForce GTX 660 Ti (UUID: GPU-78c2a7bf-82a2-e2de-ac5d-fc29605476f5)
GPU 1: GeForce GTX 660 Ti (UUID: GPU-36c282d8-2ded-c9b4-3cb2-0e0518a56be3)
Unable to determine the device handle for gpu 0000:03:00.0: Unknown Error
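Comparing this output with the lspci listing shows exactly which card the driver cannot open. The Python sketch below splits `nvidia-smi -L` output into the GPUs that were enumerated and the bus IDs that failed. The function name is hypothetical, and the regular expressions assume the line formats shown in this thread.

```python
import re

# "GPU 0: GeForce GTX 660 Ti (UUID: GPU-78c2a7bf-...)"
OK_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)$")
# "Unable to determine the device handle for gpu 0000:03:00.0: Unknown Error"
# (the thread shows both "GPU" and "gpu", hence IGNORECASE)
ERR_LINE = re.compile(
    r"Unable to determine the device handle for gpu "
    r"([0-9a-fA-F]{4,8}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])",
    re.IGNORECASE,
)


def summarize_nvidia_smi_list(text):
    """Return (ok, failed) from `nvidia-smi -L` output.

    ok:     list of (index, product name, UUID) for GPUs the driver opened
    failed: list of PCI bus IDs (lower-case) the driver could not open
    """
    ok, failed = [], []
    for line in text.splitlines():
        line = line.strip()
        match = OK_LINE.match(line)
        if match:
            ok.append((int(match.group(1)), match.group(2), match.group(3)))
            continue
        error = ERR_LINE.search(line)
        if error:
            failed.append(error.group(1).lower())
    return ok, failed
```

On the output above, it reports GPUs 0 and 1 as usable and 0000:03:00.0 as failed. The leading "0000:" is the PCI domain, which lspci omits by default, so this is the card listed as 03:00.0 by lspci.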

And once again:

dmesg | grep NVRM

[ 8.776921] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 331.20 Wed Oct 30 17:43:35 PDT 2013
[ 17.035423] NVRM: failed to copy vbios to system memory.
[ 17.051352] NVRM: RmInitAdapter failed! (0x30:0xffffffff:720)
[ 17.051376] NVRM: rm_init_adapter failed for device bearing minor number 2
[ 17.051457] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -5
[ 62.839657] NVRM: failed to copy vbios to system memory.
[ 62.855871] NVRM: RmInitAdapter failed! (0x30:0xffffffff:720)
[ 62.855896] NVRM: rm_init_adapter failed for device bearing minor number 2
[ 62.855937] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -5
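When a log like this grows, it helps to summarize which device keeps failing and with which error. The Python sketch below condenses `dmesg | grep NVRM` output into the driver version, the number of "failed to copy vbios" messages, the distinct RmInitAdapter error triples, and a count of failures per minor number. Minor number N is the /dev/nvidiaN device node. The driver assigns minor numbers in its own probe order, so matching minor 2 to the 03:00.0 card is a plausible inference here, not something the log states. The function name and the regular expressions are assumptions based only on the lines quoted in this thread.

```python
import re
from collections import Counter

LOAD_RE = re.compile(r"NVRM: loading NVIDIA UNIX \S+ Kernel Module\s+(\S+)")
VBIOS_FAIL = "NVRM: failed to copy vbios to system memory."
RMINIT_RE = re.compile(r"NVRM: RmInitAdapter failed! \(([^)]*)\)")
MINOR_RE = re.compile(r"NVRM: rm_init_adapter failed for device bearing minor number (\d+)")


def summarize_nvrm(dmesg_text):
    """Summarize NVIDIA kernel-module messages from `dmesg | grep NVRM` output."""
    summary = {
        "driver_version": None,      # e.g. "331.20"
        "vbios_copy_failures": 0,    # count of "failed to copy vbios" lines
        "rminit_codes": set(),       # distinct RmInitAdapter error triples
        "failed_minors": Counter(),  # minor number (/dev/nvidiaN) -> failure count
    }
    for line in dmesg_text.splitlines():
        match = LOAD_RE.search(line)
        if match:
            summary["driver_version"] = match.group(1)
        if VBIOS_FAIL in line:
            summary["vbios_copy_failures"] += 1
        match = RMINIT_RE.search(line)
        if match:
            summary["rminit_codes"].add(match.group(1))
        match = MINOR_RE.search(line)
        if match:
            summary["failed_minors"][int(match.group(1))] += 1
    return summary
```

For the log above, it reports driver 331.20, two VBIOS copy failures, the single error triple 0x30:0xffffffff:720, and two failures on minor number 2. The failure at about 17 s and the one at about 62 s look like two separate attempts to open the same device, both failing the same way.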

I would try a newer kernel and/or NVIDIA driver and see if the problem is resolved. Those error messages typically come up on Optimus laptops, but you have a desktop.

From the "Common Problems" chapter of the NVIDIA driver README:
"On some notebooks with Optimus graphics, the NVIDIA driver may not be able to retrieve the Video BIOS due to interactions between the System BIOS and the Linux kernel’s ACPI subsystem. On affected notebooks, applications that require the GPU will fail, and messages like the following may appear in the system log:

NVRM: failed to copy vbios to system memory.
NVRM: RmInitAdapter failed! (0x30:0xffffffff:858)
NVRM: rm_init_adapter(0) failed
Such problems are typically beyond the control of the NVIDIA driver, which relies on proper cooperation of ACPI and the System BIOS to retrieve important information about the GPU, including the Video BIOS."

Did the same 3x 660 Ti setup work on Windows? Another thing worth trying after you attempt a kernel/driver update: if Ubuntu was installed in UEFI mode, reinstall it in MBR mode (legacy BIOS/CSM compatibility mode) instead. I’ve seen UEFI glitches with multiple video cards on an MSI X79 motherboard I had previously, but they didn’t show up with a legacy BIOS boot.

Thanks, Vacaloca.
I had already read that section in Chapter 8 of the NVIDIA driver README:
http://us.download.nvidia.com/XFree86/Linux-x86/169.04/README/chapter-08.html

I have not tried this in Windows 7 yet. I will probably try reinstalling Ubuntu in the manner you suggested. Can you direct me to instructions for doing the Ubuntu install in MBR mode? This is my first open-source project!
On my mobo, I only get the UEFI option at startup.
Furthermore, if it is an ACPI problem communicating with the BIOS, is there a workaround?
Thanks for your help.

If it is a kernel problem, the ‘workaround’ would be to update the kernel; otherwise, the ‘workaround’ would be a motherboard that works with 3 GPUs. If at all possible, I would first check whether you can make the 3 GPUs work in Windows. That step rules out a hardware incompatibility: if you cannot initialize the 3 GPUs in Windows, I wouldn’t bet on it happening on Linux.

Regarding installing Ubuntu in BIOS/MBR mode, you’re probably looking for a BIOS setting that lets you boot from the CD or USB flash drive in Legacy/MBR mode (or anything that doesn’t say UEFI). Look at page 111 of your motherboard manual, assuming I have the right one: [url]ftp://66.226.78.21/manual/Z87%20OC%20Formula.pdf[/url]. Basically, enable CSM, then boot with an option that does not say UEFI; presumably one will appear once you enable CSM.
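After reinstalling, it is worth confirming which way the system actually booted. On Linux, the kernel exposes /sys/firmware/efi only when it was started by UEFI firmware, so the directory is absent after a legacy BIOS/CSM boot. The Python sketch below wraps that check. The function names are hypothetical, and the path is a parameter only so the check can be exercised without a real firmware directory.

```python
import os


def booted_via_uefi(efi_dir="/sys/firmware/efi"):
    """Return True if the running kernel was booted through UEFI.

    The kernel creates /sys/firmware/efi only on a UEFI boot; after a
    legacy BIOS/CSM boot the directory does not exist.
    """
    return os.path.isdir(efi_dir)


def boot_mode(efi_dir="/sys/firmware/efi"):
    """Human-readable boot mode, for quick checks after reinstalling."""
    return "UEFI" if booted_via_uefi(efi_dir) else "legacy BIOS/CSM"
```

If the CSM reinstall worked, `boot_mode()` should report "legacy BIOS/CSM". If it still reports "UEFI", the installer was most likely started from a UEFI boot entry.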

One other thing I would check is whether the three GTX 660 Ti cards all have identical video BIOS (VBIOS) versions. If not, updating them all to the same revision might or might not fix the issue. I’m not sure this comes into play at all, but I’m throwing it out there just in case.

I figured I’d update this thread: it seems someone else ran into this issue and has a solution. See the thread “NVIDIA 331.20 will not load with kernel 3.13 on EFI without CSM” in the Linux section of the NVIDIA Developer Forums.

I suspect this is the problem sidarth72 is having.
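To see whether a given machine matches the linked report, you can compare the kernel release (`uname -r`), the driver version (from the NVRM "loading" line in dmesg), and the boot mode against its title. The Python sketch below is a hypothetical heuristic built only from that thread title (driver 331.20, kernel 3.13, EFI boot without CSM). It approximates "EFI without CSM" as "booted via UEFI", it does not claim that other kernel or driver versions are unaffected, and the function names are illustrative.

```python
import re


def kernel_major_minor(release):
    """Parse a `uname -r` string such as '3.13.0-24-generic' into (3, 13)."""
    match = re.match(r"^(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def matches_linked_report(kernel_release, driver_version, booted_via_uefi):
    """Hypothetical check against the linked thread's title only:
    NVIDIA 331.20 + kernel 3.13 + EFI boot (approximating "without CSM").
    """
    return (
        driver_version == "331.20"
        and kernel_major_minor(kernel_release) == (3, 13)
        and booted_via_uefi
    )
```

A True result only means the configuration resembles the one in that report. The fix described there is still the thing to try.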

Is it working if you remove 1 card?