CUDA and driver installation on a small cluster

cetatzeanum · September 20, 2018, 9:43am

I have a very, very small cluster with one head node and one compute node. The compute node has two V100 GPUs, but the head node has no graphics card at all. I’m provisioning the system with Warewulf in a stateless compute node configuration. I’ve been trying to find a guide on how to install the driver and the CUDA toolkit in a cluster setup, but without much success. The documentation mentions two packages that seem to be related to cluster installation, but I cannot find those packages for download, nor the associated README file it mentions.

cuda-cluster-runtime-9-2, cuda-cluster-devel-9-2:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#cluster

Does anyone know where I could find these packages? Is there official support for cluster installation?

I would like to be able to launch GPU-related jobs, both C++ and Python, from the head node to the compute node. I’m using Slurm as the job scheduler. Does anyone have a recipe for how to proceed with the installation? Any hints would be much appreciated.

Thank you

cetatzeanum · September 24, 2018, 8:27am

Bump

Anybody? Really, seriously, nobody from NVIDIA knows about these packages? They are mentioned in the documentation …

I don’t need somebody to hold my hand in installing these, just tell me where to get the packages from.

TomNVIDIA · September 25, 2018, 8:06pm

Hello,

I have forwarded your questions to the CUDA team. Please stay tuned for a reply.

Thanks,
Tom

TomNVIDIA · September 25, 2018, 10:04pm

The cluster packages are available on the download page in the tarball called “cluster(local)” – see screenshot below.

Once you extract the tarball, you should be able to find the packages cuda-cluster-runtime-10-0_10.0.130-1 and cuda-cluster-devel-10-0_10.0.130-1.
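
For anyone who wants to check a downloaded tarball before unpacking it, here is a minimal Python sketch that lists the archive members whose file names start with the cluster package names mentioned in this thread. The function name `find_cluster_packages` and the tarball path passed on the command line are hypothetical; the actual archive name and its internal layout depend on what the download page provides, so treat this as an illustration rather than an official procedure.

```python
import sys
import tarfile
from pathlib import Path

# Package name prefixes taken from this thread; the version suffix
# (for example -9-2 or -10-0) varies with the CUDA release.
CLUSTER_PREFIXES = ("cuda-cluster-runtime", "cuda-cluster-devel")


def find_cluster_packages(tarball_path):
    """Return sorted archive member paths whose file name starts with a cluster prefix."""
    # Mode "r:*" lets tarfile detect the compression (gzip, bzip2, xz, or none).
    with tarfile.open(tarball_path, "r:*") as archive:
        return sorted(
            name
            for name in archive.getnames()
            if Path(name).name.startswith(CLUSTER_PREFIXES)
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_cluster_packages.py <downloaded-tarball>
    for member in find_cluster_packages(sys.argv[1]):
        print(member)
```

If the list comes back empty, the tarball is probably not the “cluster(local)” one, or the packages use different names in that release.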

Is there official support for cluster installation? Does anyone have any recipe on how to proceed to the installation?
Yes – we officially support cluster packages. More details are available in the README.

[Screenshot of the download page; image not available]

Hope this helps.

Cheers,
Tom

cetatzeanum · September 27, 2018, 8:13am

Thank you for the reply and the clarifications.

For CUDA 9.2, unfortunately, the package is not available for CentOS 7, which I am using for my cluster installation. It’s only available for RHEL 7, so I’m going to try installing that one. The README file inside that archive is not what I expected: it offers little clarification about what those packages do or how to install them in a cluster environment (head node and compute nodes).

I failed to see the supported distributions in the docs, though I’m a bit surprised that there is support for Ubuntu and not CentOS (do people install Ubuntu on their clusters?).

(have no idea how to actually link images to this …??)

TomNVIDIA · September 27, 2018, 7:23pm

I have forwarded your question to the Product Manager.

I am not sure why the images did not load.
What format were these two files?

Thanks,
Tom

jmorales · November 10, 2018, 6:10am

Hi Tom, thanks for the info.
When will the image be loaded for CentOS 7?
