This is a variation of an issue discussed on other threads, but I’ve not seen a solution, and my environment is a bit different as well.
I have a system with two M2070s and an on-board (non-nvidia) video adapter. There is no X server / GDM running. There are no monitors plugged into the M2070s.
NVIDIA Driver: 280.13
CUDA Toolkit: 4.0.17
SDK: 4.0.17
Distro: RHEL 6.1
If I run a simple array-copy test kernel and call cudaSetDevice(0), it segfaults. If I use cudaSetDevice(1) (or, in fact, any value greater than 1), it runs fine. That seems to indicate that cudaSetDevice is using different identifiers from those reported by nvidia-smi -L, which gives:
GPU 0: Tesla M2070
GPU 1: Tesla M2070
It seems extremely non-intuitive, and incorrect, for cudaSetDevice to behave this way. Additionally, code examples compiled with no GPU set (and therefore running on GPU 0), as well as those that choose and set the GPU themselves, such as the SDK matrixMul, either segfault when cudaMalloc is called or fail with “all CUDA-capable devices are busy or unavailable”, for example:
[ matrixMul ]
/usr/local/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/matrixMul Starting (CUDA and CUBLAS tests)...
Device 0: "Tesla M2070" with Compute 2.0 capability
Using Matrix Sizes: A(640 x 960), B(640 x 640), C(640 x 960)
matrixMul.cu(151) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 280.13 Wed Jul 27 16:53:56 PDT 2011
GCC version: gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC)
The devices in /proc are:
cat /proc/driver/nvidia/gpus/0/information
Model: Tesla M2070
IRQ: 24
Video BIOS: ??.??.??.??.??
Card Type: PCI-E
DMA Size: 39 bits
DMA Mask: 0x7fffffffff
Bus Location: 0000:02.00.0

cat /proc/driver/nvidia/gpus/1/information
Model: Tesla M2070
IRQ: 30
Video BIOS: ??.??.??.??.??
Card Type: PCI-E
DMA Size: 39 bits
DMA Mask: 0x7fffffffff
Bus Location: 0000:03.00.0
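To compare the CUDA runtime's own numbering with the nvidia-smi and /proc listings above, something like the following could be run. This is only a minimal sketch, assuming the CUDA 4.0 runtime API, and it has not been run on this system. It prints each device index with its PCI bus ID (to match against the Bus Location values) and its compute mode, then tries to create a context on it. The compute mode is included on the assumption that an exclusive or prohibited setting might be one cause of the "busy or unavailable" error. The helper computeModeName is a made-up name for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: map the runtime's computeMode value to a readable name.
static const char *computeModeName(int mode) {
    switch (mode) {
        case cudaComputeModeDefault:          return "default";
        case cudaComputeModeExclusive:        return "exclusive";
        case cudaComputeModeProhibited:       return "prohibited";
        case cudaComputeModeExclusiveProcess: return "exclusive-process";
        default:                              return "unknown";
    }
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA runtime reports %d device(s)\n", count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            printf("Device %d: cudaGetDeviceProperties failed: %s\n",
                   i, cudaGetErrorString(err));
            continue;
        }
        printf("Device %d: %s, PCI bus %d, PCI device %d, compute mode %s\n",
               i, prop.name, prop.pciBusID, prop.pciDeviceID,
               computeModeName(prop.computeMode));

        // cudaSetDevice only selects the device; cudaFree(0) forces the
        // runtime to actually create a context on it.
        err = cudaSetDevice(i);
        if (err == cudaSuccess) {
            err = cudaFree(0);
        }
        printf("  context creation: %s\n", cudaGetErrorString(err));

        cudaGetLastError();  // clear any recorded error before the next device
        cudaDeviceReset();   // tear down the context so the next iteration starts clean
    }
    return 0;
}
```

It can be built with nvcc (for example, nvcc -o devcheck devcheck.cu, where the file name is arbitrary). If the program itself segfaults on one index, the last line printed at least shows which runtime index and PCI bus were involved.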
So, the bottom line:
If nvidia-smi and /proc show the GPUs as 0 and 1, why do I have to specify 1 or 2 in cudaSetDevice() in order to choose a valid device? (This also means many SDK examples fail to run, as they default to device 0 when two identical devices are present.)
Any hints on how to straighten this out so that cudaSetDevice uses 0 and 1?