Running a CUDA application on a GPU-less machine

Hello,

I’m developing an application for heterogeneous clusters in which some nodes have GPUs and others are CPU-only. Ideally, I would like a single executable, run within an MPI context, that checks for GPUs (cudaGetDeviceCount) and makes the GPU calls only if one is present. However, I’m unable to do this because I can’t install the NVIDIA driver on the nodes without compatible GPUs, and the application fails when it doesn’t find libcuda.so.

Is there a way to force the driver install (or a dummy driver of some sort?) without a compatible GPU? Any suggestions to avoid having to run different builds of the code on GPU and GPU-less nodes would be highly appreciated.

Thanks,
Shankar

I work on a heterogeneous cluster like the one you describe. The CUDA driver is installed on every node. You could ask your administrator to do the same.

Depending on your platform, you could weakly link against the CUDA libraries so that execution does not fail if they are not present, or you could install the libraries in your home directory and adjust the library search path.
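A related way to get the same effect on Linux is to skip link-time binding to libcuda.so entirely and load it at run time with dlopen. This is not weak linking proper, just a close cousin of it. The following is a minimal sketch under that assumption: the function name countCudaDevicesAtRuntime is made up for illustration, and the CUresult return type of the driver API is treated as a plain int, with 0 meaning success.

```cpp
#include <dlfcn.h>  // dlopen, dlsym, dlclose (POSIX)

// Hypothetical helper: returns the number of CUDA devices the driver reports,
// or 0 when libcuda.so.1 is missing or cannot be initialized on this node.
int countCudaDevicesAtRuntime() {
    // Try to load the driver library; on a GPU-less node this simply fails.
    void* lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
    if (!lib) {
        return 0;
    }

    // Driver API entry points: CUresult cuInit(unsigned int) and
    // CUresult cuDeviceGetCount(int*). CUresult is treated as int here.
    using InitFn = int (*)(unsigned int);
    using CountFn = int (*)(int*);
    auto init = reinterpret_cast<InitFn>(dlsym(lib, "cuInit"));
    auto getCount = reinterpret_cast<CountFn>(dlsym(lib, "cuDeviceGetCount"));
    if (!init || !getCount) {
        dlclose(lib);
        return 0;
    }

    int count = 0;
    if (init(0) != 0 || getCount(&count) != 0) {
        return 0;  // driver present but unusable: fall back to the CPU path
    }
    // The handle is deliberately left open so the driver stays loaded.
    return count;
}
```

Each MPI rank could call this once at startup and choose the GPU or CPU code path based on the result. On older glibc versions the program may need to be linked with -ldl.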

I do this without the NVIDIA driver installed on every node. The trick is that you don’t link to libcuda.so, you only link to libcudart. The weird thing is that cudaGetDeviceCount will return 1 on the systems without the driver installed (this is a “feature” documented in the manual). You have to check the properties of each device to tell whether they are real or not. You can find the code that I use here: https://codeblue.umich.edu/hoomd-blue/trac/browser/trunk/libhoomd/data_structures/ExecutionConfiguration.cc - look for scanGPUs
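The filtering step described above can be sketched as follows. This is not the code from the linked scanGPUs function. It assumes the placeholder device reports compute capability 9999.9999, which is the check the deviceQuery sample of that era used. DeviceInfo is a hypothetical stand-in for the major and minor fields of cudaDeviceProp, so the sketch compiles without the CUDA headers; in real code the values would come from cudaGetDeviceProperties.

```cpp
#include <vector>

// Hypothetical stand-in for the two cudaDeviceProp fields used below.
struct DeviceInfo {
    int major;  // compute capability, major number
    int minor;  // compute capability, minor number
};

// Assumption: the emulation placeholder reports compute capability 9999.9999.
bool isRealDevice(const DeviceInfo& d) {
    return !(d.major == 9999 && d.minor == 9999);
}

// Returns the indices of the entries that look like real GPUs.
std::vector<int> realDeviceIndices(const std::vector<DeviceInfo>& devices) {
    std::vector<int> real;
    for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
        if (isRealDevice(devices[i])) {
            real.push_back(i);
        }
    }
    return real;
}
```

If the returned list is empty, the node should take the CPU-only path. It is also worth checking the return code of cudaGetDeviceCount itself, since later toolkit versions are likely to report a missing driver or device as an error rather than as a placeholder device.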

Thanks a ton! That works beautifully for me.

Sincerely,

Shankar