What is the best practice for detecting CUDA support?

I have some code that will run on a cluster/grid. Some of the Condor hosts on the grid have Tesla cards and CUDA 2.3 installed, and the rest don’t.
So when Condor jobs are launched, my code needs to detect whether the grid host it was sent to has a CUDA card and the CUDA toolkit installed.
On hosts that support CUDA the code will use CUFFT for its FFT calculations; otherwise it falls back to FFTW. We didn’t want something
hard to maintain, like keying CUDA support off the grid host’s hostname and looking it up in a table to see whether it has a CUDA installation.

Am I correct that the way to do this is to use cudaGetDeviceCount():

cudaError_t rc;
int count = 0;
rc = cudaGetDeviceCount(&count);

and that if the ‘rc’ return code isn’t cudaSuccess, or count == 0,
then I can be assured there is no CUDA capability (either because there is no CUDA card or because no toolkit provides the necessary shared libraries)?
I don’t see any need to determine the exact compute capability of the cards: our hosts either have Tesla C1060s or the
plain low-end video card that servers usually ship with (no CUDA at all), so I just need to decide whether to use
CUDA or not.

I got the above from the deviceQuery.cpp file in the SDK.
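
Put together, the whole check I have in mind would look roughly like this (just a sketch, not tested on the grid yet; the function name is mine):

#include <cstdio>
#include <cuda_runtime_api.h>

// Rough sketch: report whether the CUDA runtime sees at least one device.
// If the call fails (e.g. no driver/toolkit) or the count is zero,
// fall back to FFTW instead of CUFFT.
static bool cudaLooksUsable()
{
    int count = 0;
    cudaError_t rc = cudaGetDeviceCount(&count);
    return (rc == cudaSuccess) && (count > 0);
}

int main()
{
    std::printf("Use CUFFT: %s\n", cudaLooksUsable() ? "yes" : "no");
    return 0;
}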

Mark

When no GPU is found, cudaGetDeviceCount() actually reports 1 emulation device.

You will need to check the properties of the returned device, probably the major and minor compute capability numbers.

Instead of 1.x, the emulation device reports its major and minor numbers as 9999.9999.
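
A minimal sketch of that check, assuming the CUDA 2.x runtime (where the emulation device shows up as 9999.9999), could be:

#include <cuda_runtime_api.h>

// Returns true only if a real (non-emulation) CUDA device is present.
bool realCudaDevicePresent()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return false;

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        // The emulation "device" reports compute capability 9999.9999;
        // any sane major number means real hardware.
        if (prop.major > 0 && prop.major < 9999)
            return true;
    }
    return false;
}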

If the hosts don’t have CUDA at all, then I doubt that cudaGetDeviceCount is going to work. You should probably try running a very simple CUDA program first (e.g. just cudaMalloc/cudaFree one byte). If it works, then call the CUDA binary; if it dies hideously, then run the normal one.
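
Something like this untested sketch, say:

#include <cuda_runtime_api.h>

// Probe the runtime by allocating and freeing a single byte on the device.
// If either call fails, assume CUDA is not usable on this host.
bool cudaProbeWorks()
{
    void* p = 0;
    if (cudaMalloc(&p, 1) != cudaSuccess)
        return false;
    return cudaFree(p) == cudaSuccess;
}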

According to MrAnderson42, it does work fine. He described what HOOMD does here, if you are interested.

This is too bad, because NVIDIA’s own SDK code shows it done wrong, and that’s what I was using as an example of the CUDA detection:

NVIDIA_GPU_Computing_SDK/C/src/deviceQuery/deviceQuery.cpp

Indeed, that is incorrect.

The documentation in the CUDA reference manual correctly describes the behavior of cudaGetDeviceCount, however.

I adapted MrAnderson’s HOOMD code to my problem, and it worked well. I used a JNI call of my own to detect whether CUDA and a Tesla card are present on
the current local host on the grid, and I only instantiate the JCuda object if CUDA is available. If not, I use JFFTW3 for my FFTs.
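
The native side of that JNI call is essentially the same device loop; roughly like this (the Java class and method names here are made up for illustration):

#include <jni.h>
#include <cuda_runtime_api.h>

// Hypothetical JNI entry point; the Java class/method names are illustrative only.
extern "C" JNIEXPORT jboolean JNICALL
Java_GridFft_nativeCudaAvailable(JNIEnv*, jclass)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return JNI_FALSE;

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess
            && prop.major > 0 && prop.major < 9999)
            return JNI_TRUE;   // real CUDA hardware found
    }
    return JNI_FALSE;          // only the emulation device, or no device at all
}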

Mark