I have a very small cluster with one head node and one compute node. The compute node has
two V100 GPUs, but the head node has no graphics card at all. I’m provisioning the system with Warewulf in a stateless compute node configuration. I’ve been trying to find a guide on how to
install the driver and the CUDA toolkit in a cluster setting, but without much success. I
found a mention in the documentation of two packages that seem to be related to cluster installation,
cuda-cluster-runtime-9-2 and cuda-cluster-devel-9-2, but I cannot find those packages for download,
nor the associated README file that is mentioned.
Does anyone know where I could find these packages? Is there official support for cluster installation?
I would like to be able to launch GPU-related jobs (both C++ and Python) from the head node
on the compute node. I’m using Slurm as the job scheduler. Does anyone have a recipe for
how to proceed with the installation? Any hints would be much appreciated.