Problem running CUDA 3.1 SDK examples: cudaSafeCall() Runtime API error : all CUDA-capable devices are busy or unavailable

Hi all,

I’ve got an x86_64 CentOS 4.8 (RHEL-compatible) machine connected to half (2 GPUs) of an S1070. I previously had CUDA 2.3 installed and working fine, and today I installed CUDA 3.1.

Everything compiles fine, and my previously compiled CUDA 2.3 programs continue to work.

When trying the examples:

[codebox]# /usr/local/cudasdk31/C/bin/linux/release/clock

cudaSafeCall() Runtime API error : all CUDA-capable devices are busy or unavailable.[/codebox]

I then tried deviceQuery:

[codebox]# /usr/local/cudasdk31/C/bin/linux/release/deviceQuery

[/codebox]

… returns nothing and just exits when I press Enter.
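To take the SDK’s cutil wrappers out of the picture entirely, a minimal stand-alone runtime API probe along these lines should show whether the runtime can see and acquire the devices at all (just a sketch; the file name and build line are only illustrative):

[codebox]// probe_runtime.cu -- minimal runtime API check, no cutil/cudaSafeCall involved
// build (illustrative): nvcc -o probe_runtime probe_runtime.cu
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Runtime API sees %d device(s)\n", count);

    /* Actually acquire device 0; this is where "all CUDA-capable devices
       are busy or unavailable" would normally show up. */
    err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        printf("cudaSetDevice(0): %s\n", cudaGetErrorString(err));
        return 1;
    }
    void *p = NULL;
    err = cudaMalloc(&p, 1024);  /* first real call forces context creation */
    if (err != cudaSuccess) {
        printf("cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(p);
    printf("Device 0 acquired OK\n");
    return 0;
}[/codebox]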

deviceQueryDrv, on the other hand, gives:

[codebox]# /usr/local/cudasdk31/C/bin/linux/release/deviceQueryDrv
CUDA Device Query (Driver API) statically linked version
There are 2 devices supporting CUDA

Device 0: "Tesla T10 Processor"
  CUDA Driver Version: 3.10
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 4294770688 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Concurrent kernel execution: No
  Device has ECC support enabled: No

Device 1: "Tesla T10 Processor"
  CUDA Driver Version: 3.10
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 4294770688 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Concurrent kernel execution: No
  Device has ECC support enabled: No

PASSED

Press ENTER to exit…
[/codebox]
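So the driver API path clearly works. For comparison, the driver-API equivalent of the runtime probe above boils down to roughly this (again only a sketch; link against -lcuda, file name illustrative):

[codebox]// probe_driver.c -- minimal driver API check, link with -lcuda
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUresult r = cuInit(0);
    if (r != CUDA_SUCCESS) { printf("cuInit failed: %d\n", r); return 1; }

    int count = 0;
    r = cuDeviceGetCount(&count);
    if (r != CUDA_SUCCESS) { printf("cuDeviceGetCount failed: %d\n", r); return 1; }
    printf("Driver API sees %d device(s)\n", count);

    CUdevice dev;
    r = cuDeviceGet(&dev, 0);
    if (r != CUDA_SUCCESS) { printf("cuDeviceGet failed: %d\n", r); return 1; }

    /* Creating a context is the step that fails when a device is
       busy/unavailable or in a prohibited compute mode. */
    CUcontext ctx;
    r = cuCtxCreate(&ctx, 0, dev);
    if (r != CUDA_SUCCESS) { printf("cuCtxCreate failed: %d\n", r); return 1; }
    printf("Context created on device 0\n");
    cuCtxDestroy(ctx);
    return 0;
}[/codebox]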

It’s totally weird. I’ve already looked into the common things, such as rebooting the system, checking the permissions on /dev/nvidia* (all are a+rw), and checking the compute mode of the GPUs:

[codebox]# ls -al /dev
total 0
[…snip…]
crw-rw-rw- 1 root root 195,   0 Jul 14 15:28 nvidia0
crw-rw-rw- 1 root root 195,   1 Jul 14 15:28 nvidia1
crw-rw-rw- 1 root root 195,   2 Jul 14 15:28 nvidia2
crw-rw-rw- 1 root root 195,   3 Jul 14 15:28 nvidia3
crw-rw-rw- 1 root root 195,   4 Jul 14 15:28 nvidia4
crw-rw-rw- 1 root root 195, 255 Jul 14 15:28 nvidiactl
[…snip…]

# nvidia-smi -s
COMPUTE mode rules for GPU 0: 0
COMPUTE mode rules for GPU 1: 0
[/codebox]
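For what it’s worth, the compute mode can also be read programmatically through the driver API rather than via nvidia-smi; a small sketch (link with -lcuda), where 0 means the default/shared mode:

[codebox]// compute_mode.c -- report each GPU's compute mode via the driver API
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    int count = 0, i;
    if (cuInit(0) != CUDA_SUCCESS) { printf("cuInit failed\n"); return 1; }
    cuDeviceGetCount(&count);
    for (i = 0; i < count; ++i) {
        CUdevice dev;
        int mode = -1;
        cuDeviceGet(&dev, i);
        cuDeviceGetAttribute(&mode, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE, dev);
        /* 0 = default (shared), 1 = exclusive, 2 = prohibited */
        printf("GPU %d compute mode: %d\n", i, mode);
    }
    return 0;
}[/codebox]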

Any ideas on what might be causing this behaviour? Any help is much appreciated.

– Alf

What driver are you running? It is quite possible that, if you didn’t upgrade to a 256-series driver at the same time, you are seeing a runtime API vs. driver version conflict (which would explain why your older code still works and why the driver API code works too). Usually the error message is slightly different from that, but it is still worth checking.
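One quick way to check for that kind of mismatch is to ask the runtime which driver and runtime versions it sees; a small sketch, assuming it is built with the 3.1 toolkit’s nvcc:

[codebox]// version_check.cu -- compare the CUDA versions reported by driver and runtime
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    /* e.g. 3010 for a CUDA 3.1 driver */
    cudaRuntimeGetVersion(&runtimeVersion);  /* e.g. 3010 for the CUDA 3.1 runtime */
    printf("Driver reports CUDA version:  %d\n", driverVersion);
    printf("Runtime reports CUDA version: %d\n", runtimeVersion);
    if (driverVersion < runtimeVersion)
        printf("Driver is older than the runtime -- likely a version conflict\n");
    return 0;
}[/codebox]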

Driver version is 256.35.

I had a similar problem. I ended up reinstalling the 64-bit driver; make sure to purge everything first. I think the problem had to do with initially installing the driver from a repository and then upgrading it from there, which confused the OS (64-bit Lucid).

Thanks for the great idea. I fully uninstalled the NVIDIA driver (using /usr/bin/nvidia-uninstall) and then reinstalled it. My deviceQuery call now works!

Unfortunately, running the “clock” program still reports that no CUDA devices are available. I double-checked that both devices are in compute mode 0, and that no other CUDA apps are running on the system.
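To rule out the SDK’s cutil error handling, a bare-bones kernel launch with plain runtime error checking should show whether the failure really comes from the runtime itself; something like this sketch (the dummy kernel is purely illustrative):

[codebox]// launch_test.cu -- trivial kernel launch without cutil/cudaSafeCall
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void dummy(int *out)
{
    out[threadIdx.x] = threadIdx.x;  /* trivial work, just to touch the device */
}

int main(void)
{
    int *d_out = NULL;
    cudaError_t err = cudaMalloc((void **)&d_out, 32 * sizeof(int));
    if (err != cudaSuccess) {
        printf("cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    dummy<<<1, 32>>>(d_out);
    err = cudaThreadSynchronize();   /* CUDA 3.1-era synchronisation call */
    if (err != cudaSuccess) {
        printf("kernel launch: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Kernel ran OK\n");
    cudaFree(d_out);
    return 0;
}[/codebox]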

Any ideas?

I tried out version 256.44 of the driver today, but got the same problem. deviceQuery and deviceQueryDrv can both see the GPUs on the S1070, but when I try “clock” or anything like it, it still reports that no GPUs are available.

When I revert to version 190.53 of the driver and try the sample programs in the v2.3 SDK, everything works.