This is a variation of an issue discussed on other threads, but I’ve not seen a solution, and my environment is a bit different as well.
I have a system with two M2070s and an on-board (non-nvidia) video adapter. There is no X server / GDM running. There are no monitors plugged into the M2070s.
NVIDIA Driver: 280.13
CUDA Toolkit: 4.0.17
SDK: 4.0.17
Distro: RHEL 6.1
If I run a simple array-copy test kernel and specify cudaSetDevice(0), it segfaults. If I use cudaSetDevice(1) (or, in fact, any value greater than 1), it runs fine. That seems to indicate that cudaSetDevice uses different identifiers from those reported by nvidia-smi -L, which gives:
GPU 0: Tesla M2070
GPU 1: Tesla M2070
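For reference, here is a minimal sketch of the kind of array-copy test described above, with every runtime call checked. It is not my exact test code: the names CHECK, copyKernel and countMismatches are made up for illustration, and the device index is taken from the first command-line argument. With the checks in place, a failing cudaSetDevice or cudaMalloc should print an error string rather than crash, and it would also show whether cudaSetDevice with an index of 1 or higher really succeeds or just returns an error that goes unnoticed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the failing call and its error string, then exit.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n",            \
                         __FILE__, __LINE__, #call,                   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Copy n floats from in to out, one element per thread.
__global__ void copyKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Host-side check: number of positions where the two arrays differ.
static int countMismatches(const float* a, const float* b, int n)
{
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] != b[i]) ++bad;
    return bad;
}

int main(int argc, char** argv)
{
    // Device index from the command line; defaults to 0.
    int dev = (argc > 1) ? std::atoi(argv[1]) : 0;

    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    static float hIn[n], hOut[n];
    for (int i = 0; i < n; ++i) { hIn[i] = (float)i; hOut[i] = -1.0f; }

    CHECK(cudaSetDevice(dev));

    float* dIn = NULL;
    float* dOut = NULL;
    CHECK(cudaMalloc((void**)&dIn, bytes));
    CHECK(cudaMalloc((void**)&dOut, bytes));
    CHECK(cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice));

    copyKernel<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    CHECK(cudaGetLastError());   // reports launch errors
    CHECK(cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost));

    std::printf("device %d: %d mismatches\n", dev, countMismatches(hIn, hOut, n));

    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    return 0;
}
```

Built with nvcc and run as, say, ./copytest 0 and ./copytest 1 (the binary name is arbitrary), this should make it clear which call fails for which index.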
This behavior of cudaSetDevice seems extremely non-intuitive, and incorrect. Additionally, code examples compiled with no GPU set (and therefore running on GPU 0), as well as those that choose and set the GPU themselves, such as the SDK matrixMul, either segfault when cudaMalloc is called or fail with “all CUDA-capable devices are busy or unavailable”.
Example:
[ matrixMul ]
/usr/local/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/matrixMul Starting (CUDA and CUBLAS tests)...
Device 0: "Tesla M2070" with Compute 2.0 capability
Using Matrix Sizes: A(640 x 960), B(640 x 640), C(640 x 960)
matrixMul.cu(151) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
Driver:
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 280.13 Wed Jul 27 16:53:56 PDT 2011
GCC version: gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC)
The devices in /proc are:
cat /proc/driver/nvidia/gpus/0/information
Model: Tesla M2070
IRQ: 24
Video BIOS: ??.??.??.??.??
Card Type: PCI-E
DMA Size: 39 bits
DMA Mask: 0x7fffffffff
Bus Location: 0000:02.00.0
cat /proc/driver/nvidia/gpus/1/information
Model: Tesla M2070
IRQ: 30
Video BIOS: ??.??.??.??.??
Card Type: PCI-E
DMA Size: 39 bits
DMA Mask: 0x7fffffffff
Bus Location: 0000:03.00.0
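To cross-check the bus locations above against what the CUDA runtime itself enumerates, something like the following sketch could be used. It relies only on cudaGetDeviceCount and cudaGetDeviceProperties; the helper computeModeName is a name invented here for illustration. The compute mode is printed because the runtime documentation associates the "busy or unavailable" error with exclusive or prohibited compute modes, though I am only assuming that may be relevant here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: readable name for the cudaDeviceProp::computeMode value.
static const char* computeModeName(int mode)
{
    switch (mode) {
    case cudaComputeModeDefault:          return "default";
    case cudaComputeModeExclusive:        return "exclusive";
    case cudaComputeModeProhibited:       return "prohibited";
    case cudaComputeModeExclusiveProcess: return "exclusive-process";
    default:                              return "unknown";
    }
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("runtime reports %d device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: cudaGetDeviceProperties failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }
        // pciBusID / pciDeviceID can be compared with "Bus Location" in /proc.
        std::printf("CUDA device %d: %s, PCI bus %02x, device %02x, compute mode %s\n",
                    dev, prop.name, prop.pciBusID, prop.pciDeviceID,
                    computeModeName(prop.computeMode));
    }
    return 0;
}
```

If the bus numbers printed for CUDA devices 0 and 1 match 02 and 03 from /proc, the runtime numbering agrees with nvidia-smi and the problem presumably lies elsewhere.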
So, the bottom line:
If nvidia-smi and /proc show the GPUs as 0 and 1, why do I have to specify 1 or 2 in cudaSetDevice() to choose a valid device? (This also means many SDK examples fail to run, since they default to device 0 when two identical devices are present.)
Any hints on how to straighten this out so that cudaSetDevice uses 0 and 1?
Thank you,
Pete