Some of the nodes in our HPC cluster have GPUs. Since we’d like the hard drive contents of the compute nodes to be as close to identical as possible, we want to install CUDA on all the nodes, including those without GPUs.
Our users are instructed to request GPUs from our queueing system (Torque) when they want to use them, so codes that try to use a GPU shouldn’t end up running on our GPU-less nodes.
But I don’t know whether there are codes that will attempt to use a GPU based solely on the presence of libcuda.so. We don’t want that, since every node will have that library whether or not it has a GPU. Has anyone heard of codes “misbehaving” in this way, that is, trying to use a GPU (and then aborting) simply because libcuda is present, even though the code can run without CUDA and was not specifically invoked as GPU-aware?
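
To make the concern concrete, here is a minimal sketch (Python with ctypes) of the kind of check I would hope a well-behaved code performs before deciding to use a GPU: it does not stop at finding libcuda, but also asks the driver whether any device is actually usable. cuInit and cuDeviceGetCount are CUDA driver API calls; the function name usable_gpu_count is made up for illustration, and this is not taken from any particular application. The assumption is that on a node with the library but no GPU, cuInit returns a non-zero error code rather than success.

```python
import ctypes


def usable_gpu_count(libname="libcuda.so.1"):
    """Return the number of CUDA devices the driver reports, or 0 if none are usable.

    Hypothetical helper for illustration only.
    """
    try:
        cuda = ctypes.CDLL(libname)
    except OSError:
        # Library is not installed (or cannot be loaded): no GPU support.
        return 0

    try:
        cu_init = cuda.cuInit
        cu_device_get_count = cuda.cuDeviceGetCount
    except AttributeError:
        # The library loaded but does not export the driver API symbols.
        return 0

    cu_init.argtypes = [ctypes.c_uint]
    cu_init.restype = ctypes.c_int
    cu_device_get_count.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cu_device_get_count.restype = ctypes.c_int

    # A non-zero CUresult means the driver could not initialise, which is
    # what we assume happens on a node with the library but no GPU.
    if cu_init(0) != 0:
        return 0

    count = ctypes.c_int(0)
    if cu_device_get_count(ctypes.byref(count)) != 0:
        return 0
    return count.value


if __name__ == "__main__":
    n = usable_gpu_count()
    print("GPU path" if n > 0 else "CPU fallback", f"({n} device(s) reported)")
```

A code that only tests whether the library can be loaded, and skips the device check, is the kind of “misbehaving” code I am asking about.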