I have a very, very small cluster with one head node and one compute node. The compute node has
two V100 GPUs inside but the head node has no graphics card at all. I’m provisioning my system using Warewulf in a stateless compute node configuration. I’ve been trying to find a guide on how to
install the driver and the CUDA toolkit for a cluster situation but without much success. I have
found a mention in the documentation of two packages that seem to be related to cluster installation,
but I cannot find either the packages or the associated README file available for download.
Does anyone know where I could find these packages? Is there official support for cluster installation?
I would like to be able to launch GPU-related jobs from the head node to the compute node,
in both C++ and Python. I’m using Slurm as the job scheduler. Does anyone have a recipe for
how to proceed with the installation? Any hints would be much appreciated.
The cluster packages are available on the download page in the tarball called “cluster(local)” – see screenshot below.
Once you extract the tarball, you should find the packages cuda-cluster-runtime-10-0_10.0.130-1 and cuda-cluster-devel-10-0_10.0.130-1.
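If it helps, here is a minimal Python sketch for checking which cluster packages a downloaded tarball contains before extracting it. The function name find_cluster_packages and the tarball path are hypothetical, and the "cuda-cluster-" prefix is only inferred from the two package names above, so adjust both to whatever you actually downloaded.

```python
import tarfile


def find_cluster_packages(tarball_path, prefix="cuda-cluster-"):
    """Return the sorted names of regular files in the archive whose
    base name starts with `prefix`.

    Both the path and the prefix are assumptions for illustration;
    the prefix is taken from the package names mentioned in this thread.
    """
    matches = []
    # tarfile.open() with the default mode detects compression automatically.
    with tarfile.open(tarball_path) as archive:
        for member in archive.getmembers():
            # Compare against the file name only, ignoring any directory part.
            basename = member.name.rsplit("/", 1)[-1]
            if member.isfile() and basename.startswith(prefix):
                matches.append(member.name)
    return sorted(matches)
```

Calling find_cluster_packages with the path of the downloaded archive should list the runtime and devel packages if they are present; an empty list suggests the wrong archive was downloaded.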
Is there official support for cluster installation? Does anyone have a recipe for how to proceed with the installation?
Yes – we officially support cluster packages. More details are available in the README.
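On the Slurm side of the question, the following is only a rough sketch of how a GPU test job could be generated for submission from the head node. It assumes that GPUs are already configured as a generic resource (GRES) in the cluster's Slurm configuration and that nvidia-smi is present in the compute node image once the driver is installed. The function name make_gpu_job_script, the job name, and the output file name are hypothetical.

```python
def make_gpu_job_script(command, gpus=2, job_name="gpu-test",
                        output="gpu-test-%j.out"):
    """Build the text of a Slurm batch script that requests GPUs on one node.

    Assumes the cluster has GPUs configured as a GRES named "gpu";
    without that, Slurm will reject the --gres request.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        f"#SBATCH --gres=gpu:{gpus}",    # number of GPUs requested per node
        f"#SBATCH --output={output}",    # %j expands to the Slurm job ID
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Example: a script that simply lists the GPUs visible inside the job.
script_text = make_gpu_job_script("nvidia-smi")
```

Writing script_text to a file and submitting it with sbatch from the head node should, if the driver is installed correctly on the compute node, produce an output file listing both V100 cards.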
For CUDA 9.2, unfortunately, the package is not available for CentOS 7, which I am using for my cluster installation. It’s only available for RHEL 7, so I’m going to try installing that one. The README file inside that archive is not what I expected: it does not really clarify what those packages do or how to install them in a cluster environment with a head node and compute nodes.
I could not find the list of supported distributions in the docs, though I’m a bit surprised that there is support for Ubuntu and not CentOS (do people install Ubuntu on their clusters?).
(have no idea how to actually link images to this …??)