Installing HPC SDK on Ubuntu 20.04 - HPC SDK repo vs CUDA Toolkit repo vs Docker repo

I am curious what the best practice approach is for installing an OpenACC capable development environment in Ubuntu 20.04, given the multiple repos and installation methods that Nvidia provides…

Part one is my workstation.
Since I want OpenACC via nvc/nvc++ I need the HPC SDK, which offers an apt repository option described on the download page (NVIDIA HPC SDK Current Release Downloads | NVIDIA Developer). That seems convenient, but it is not mentioned in the documentation itself, where the installation guide describes a more generic scripted approach (Installation Guide Version 22.7). That is a bit inconsistent: depending on which search terms I use, I end up on one page or the other, and as a developer I may go through a rather involved, customizable installation method that is really meant for cluster admins.
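To make the use case concrete, the kind of code I want nvc to build is plain C with OpenACC directives; a minimal sketch (a saxpy loop, nothing NVIDIA-specific beyond the pragma — compilers without OpenACC support simply ignore it and run the loop sequentially):

```c
#include <stddef.h>

/* y = a*x + y. With OpenACC enabled (e.g. `nvc -acc`) the loop can be
 * offloaded to the GPU; otherwise the pragma is ignored and the loop
 * runs sequentially on the host. */
void saxpy(size_t n, float a, const float *x, float *y) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Building this with `nvc -acc -Minfo=accel` is exactly the workflow I'm after.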

Going the repo way and installing nvhpc-22-7, I also get an 11.7 CUDA toolkit. Is there any need to additionally install the toolkit via the repo available here (CUDA Toolkit 11.7 Update 1 Downloads | NVIDIA Developer)? Do I even need that repo configured on my system at all? I have to admit I did this a while ago and am not sure which repo is providing the driver.
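For reference, the repo setup I did followed the download page and looked roughly like this (key URL, repo path, and package name copied from that page at the time — treat this as a sketch, not an authoritative recipe):

```shell
# Add NVIDIA's HPC SDK apt repository (as shown on the download page)
curl -fsSL https://developer.download.nvidia.com/hpc-sdk/ubuntu/DEB-GPG-KEY-NVIDIA-HPC-SDK \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg
echo 'deb [signed-by=/usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg] https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 /' \
  | sudo tee /etc/apt/sources.list.d/nvhpc.list

sudo apt-get update
sudo apt-get install -y nvhpc-22-7   # also pulls in a CUDA 11.7 toolkit
```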

Then I would like to run CUDA applications in docker containers. For this I followed the installation guide here (Installation Guide — NVIDIA Cloud Native Technologies documentation), which installs yet another repo for just this one package.

Is there a specific reason why this is split into three repos? As a user, I have to say it would be a lot easier to set up one repo from Nvidia and then install the appropriate packages from it. That might also be simpler, smaller, and more robust, since I think there is some overlap between the HPC SDK and toolkit repos.

Part two is how to set up a docker container with the HPC SDK compilers.
Is it even a good idea to use the 11.7.1-devel-ubuntu20.04 docker container from Nvidia as a base and add the nvhpc-22-7 package from the corresponding repo, given that the package also installs a CUDA toolkit? Or could I just take a stock Ubuntu 20.04 docker image, do the same, and end up with a smaller image?
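What I had in mind for the stock-Ubuntu route is roughly this (repo URL, package name, and install path assumed from the HPC SDK download page and the SDK's conventional layout — an untested sketch):

```dockerfile
FROM ubuntu:20.04

# Add NVIDIA's HPC SDK apt repo and install the compilers + bundled CUDA.
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates curl gnupg \
    && curl -fsSL https://developer.download.nvidia.com/hpc-sdk/ubuntu/DEB-GPG-KEY-NVIDIA-HPC-SDK \
        | gpg --dearmor -o /usr/share/keyrings/nvidia-hpcsdk.gpg \
    && echo 'deb [signed-by=/usr/share/keyrings/nvidia-hpcsdk.gpg] https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 /' \
        > /etc/apt/sources.list.d/nvhpc.list \
    && apt-get update && apt-get install -y nvhpc-22-7 \
    && rm -rf /var/lib/apt/lists/*

# Put the compilers on PATH (exact path may differ per release).
ENV PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/bin:$PATH
```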

Some guidance would be highly appreciated.

Hi Cassfalg,

Yes, we do support multiple methods to install the NV HPC SDK. This is largely driven by user requests given no one method is best for all users.

Section 1.2 of the Install Guide is primarily for those installing from the tarball. This can be skipped for the other methods since the installation is implicit in these cases. Would adding a line indicating that this section can be skipped for the yum, zypper, and apt-get installations be sufficient to help clarify this?

Is there a need to additionally install the toolkit via the Repo available here

No, the NV HPC SDK is self contained and includes the dependent CUDA version(s). No need to install the CUDA SDK separately.

Note that there are two different packages, one with only the latest CUDA version, and a second that contains the latest, last major release, and the last version of the previous major release (in 22.7 this is CUDA versions 11.7, 11.0, and 10.2). All other components are the same. The additional CUDA versions increase the package size so, per user requests, we opted to offer both.

Is there a specific reason why this is split into three repos?

Different use cases for different users.

Or could I just take a stock Ubuntu 20.04 docker, do the same and end up with a smaller image?

We do provide containers for the HPC SDK as well, which might be your best option. There are also images that include only the runtime libraries, so you don't need to download an image with all the development tools if you're just running a pre-built application.
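For example (the tag below is illustrative — check NGC for the exact tags available), pulling the development image and compiling inside it might look like:

```shell
# Development image with the HPC SDK compilers (tag illustrative):
docker pull nvcr.io/nvidia/nvhpc:22.7-devel-cuda11.7-ubuntu20.04

# Compile an OpenACC source from the host inside the container:
docker run --rm --gpus all -v "$PWD":/src -w /src \
    nvcr.io/nvidia/nvhpc:22.7-devel-cuda11.7-ubuntu20.04 \
    nvc -acc -o saxpy saxpy.c
```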

We also have a guide on using and creating containers:

Hope this helps,
Mat

Yes. Just adding the info that those methods exist at all would add a lot of value I think.

Does it include things like the driver as well? So I don't need the toolkit repo at all, normally? If so, I think this should also be added to the HPC SDK documentation, and maybe to the toolkit installation guide as well. I somehow thought I needed to install the toolkit and driver as a prerequisite to the HPC SDK.

Isn’t that easier to address by different packages in the same repo? For the networked install at least. Then as a user I add the one repo and install what I need for my use case and that pulls everything else automatically?

This is for packaging applications into containers to run them in the cloud for example?

Since the CUDA driver is dependent on the specific device and installed OS, it must be downloaded and installed separately.

No, you do not need to separately install the CUDA toolkit. The necessary components from the supported CUDA toolkit(s) are already included with the HPC SDK.

Isn’t that easier to address by different packages in the same repo? For the networked install at least. Then as a user I add the one repo and install what I need for my use case and that pulls everything else automatically?

The CUDA SDK isn't a direct drop-in within the HPC SDK; we make a few modifications to the components.

This is for packaging applications into containers to run them in the cloud for example?

I’m not an expert on creating containers, but I believe this guide can help with creating your own containers for use in cloud instances.

Ah. I would have expected an Ubuntu repo to be specific enough to limit things to one driver, as that fixes OS version and architecture.

The question at hand for us is: if I wanted to create a docker image for a continuous integration and delivery platform, could I support both HPC SDK development and traditional CUDA with the same image, namely the HPC SDK docker image you linked earlier? If you say you make some modifications, should I use two images for better compatibility? Can I check anywhere what the differences are?

Having not done this myself, I can’t be sure, but I would assume it’s fine. Folks are able to install both on a system, so I can’t think of any reason why installation within a container would be any different.

If you say you do some modifications, should I use two for better compatibility? Can I check what the differences are anywhere?

The biggest difference is that we move the CUDA math libraries into a common directory and add a version level in the directory tree. This can cause some build systems (CMake in particular) that assume a particular directory structure to fail to find some libraries.
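As an illustration of the layout difference (version numbers and paths below follow the conventional HPC SDK install tree, so adjust to your installation), a link line that expects cuBLAS next to the toolkit may need an extra library path:

```shell
# HPC SDK layout (illustrative):
#   /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7        <- bundled CUDA toolkit
#   /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/math_libs/11.7   <- math libraries moved here
# A build that assumes the stock CUDA layout may need the math_libs path added:
nvc -acc myapp.c \
    -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/math_libs/11.7/lib64 \
    -lcublas -o myapp
```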

Not sure whether this would affect your project or not.