CUDA without supported GPU

I am looking into using CUDA for an open-source project I develop for. I want to know what happens to sections of code that use CUDA if the video card on the system is not CUDA-compatible. Will the code run on the CPU instead of the GPU? What is the “standard” way of designing applications that can handle running portions of code or threads on the GPU if possible, or the CPU if necessary? I see there is a “cuDeviceGetCount(&devCount)” function, so do developers just do:

int devCount = 0;
cuDeviceGetCount(&devCount);
if (devCount > 0) {
    // do CUDA code
} else {
    // do CPU code
}

Thanks for the responses!

When using the Runtime API you can check whether there’s an actual CUDA device by querying each device’s compute capability. AFAIR with the Runtime API there’s always an ‘emulation’ device (I haven’t used the Runtime API for a year, so things may be a little different now).
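For example, a sketch along these lines (based on older CUDA releases, where a missing GPU shows up as an emulation-only device reporting compute capability 9999.9999; newer toolkits may behave differently):

#include <cuda_runtime.h>

/* Returns 1 if at least one real (non-emulation) CUDA device is present. */
int haveRealCudaDevice(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return 0;

    for (int i = 0; i < count; ++i) {
        struct cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess &&
            prop.major != 9999)   /* 9999.9999 = emulation-only device */
            return 1;
    }
    return 0;
}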

With the Driver API things are a little different. You can’t just do as you suggested because this requires cuDeviceGetCount(), which is exported from nvcuda.dll, which is part of the driver. If you run this code on a machine without the NVIDIA driver your program won’t even load (the loader won’t be able to set up the import table). In this case you need to do delayed loading of nvcuda.dll. I’ve described how to do it several times, so try searching this forum.
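To give the rough idea, here is a minimal Windows-only sketch using manual LoadLibrary/GetProcAddress (the function-pointer typedefs are simplified; the real prototypes live in cuda.h, and the linker’s /DELAYLOAD option is another way to achieve the same thing):

#include <windows.h>

typedef int CUresult;   /* simplified; see cuda.h for the real definition */
typedef CUresult (__stdcall *cuInit_t)(unsigned int);
typedef CUresult (__stdcall *cuDeviceGetCount_t)(int *);

/* Returns the number of CUDA devices, or 0 if the NVIDIA driver isn't installed. */
int cudaDeviceCountSafe(void)
{
    HMODULE nvcuda = LoadLibraryA("nvcuda.dll");
    if (!nvcuda)
        return 0;   /* no driver -> no CUDA, but the program itself still runs */

    cuInit_t pInit = (cuInit_t)GetProcAddress(nvcuda, "cuInit");
    cuDeviceGetCount_t pGetCount =
        (cuDeviceGetCount_t)GetProcAddress(nvcuda, "cuDeviceGetCount");

    int count = 0;
    if (pInit && pGetCount && pInit(0) == 0)   /* 0 == CUDA_SUCCESS */
        pGetCount(&count);

    /* keep nvcuda loaded if you intend to make further Driver API calls */
    return count;
}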

You can compile the code in “emulation” mode with nvcc, but it will be very, very slow. Emulation mode is designed for host-side debugging, not as a fallback code path for non-CUDA systems, so it is not efficient at all.

The standard way to handle CUDA and non-CUDA systems is to write two separate versions of the compute-intensive parts of your program and choose between them, possibly in the way you describe. In the programs I’ve written, the user actively enables CUDA with a compile-time configuration option, so I’m not sure what the best approach is for runtime selection. Hopefully others can comment on that.
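If you do want runtime selection, one way to structure it (a sketch, with hypothetical saxpy_cpu/saxpy_gpu names standing in for your own compute routine) is to keep both implementations behind a function pointer and choose once at startup:

#include <cuda_runtime.h>
#include <stddef.h>

/* Two hand-written implementations of the same hot loop. */
void saxpy_cpu(float a, const float *x, float *y, size_t n);   /* plain C version  */
void saxpy_gpu(float a, const float *x, float *y, size_t n);   /* wraps a CUDA kernel */

typedef void (*saxpy_fn)(float a, const float *x, float *y, size_t n);

/* Pick the GPU path only if a usable CUDA device is actually present. */
saxpy_fn select_saxpy(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) == cudaSuccess && count > 0)
        return saxpy_gpu;
    return saxpy_cpu;
}

(If you use the Driver API instead of the Runtime API, the delayed-loading caveat above applies to the check itself.)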

It is rumored that the next release of nvcc will allow CUDA kernels to be compiled down to efficient multithreaded/SSE-enabled CPU code. In that case, the CUDA implementation of an algorithm could easily be the source for both fast GPU and non-GPU binaries. (NVIDIA employees are generally not allowed to comment on the dates or features in future releases, so don’t even bother asking. :) )