What is the difference between the CUDA Toolkit and the NVIDIA HPC SDK?

Ayo, community and fellow developers. I have some questions. Sorry if I sound ridiculous, because I’m almost going crazy.

When I wanted to use CUDA, I was faced with two choices, CUDA Toolkit or NVHPC SDK. Here I use Ubuntu 22 x86_64 with nvidia-driver-545. In my opinion, the HPC SDK is more complete than the CUDA toolkit. They both have nvc, nvcc, and nvc++, but NVHPC has more features that I need such as cuFFT, cuTENSOR, cuSPARSE and cuBLAS. Too bad it doesn’t have cuDNN in it.

  1. Or, is there one, but I don’t know?

It looks like if I install cuda-toolkit-12-4 along with nvidia-cudnn, they will be compatible. It’s just that HPC SDK may not be the same as cuda-toolkit.

  1. Or, is it possible that cuda-toolkit also has the same thing as HPC SDK? I do not think so.
  2. I have wasted a lot of time, just trying to figure out what to choose by installing and uninstalling packages via the shell.

After installing nvhpc-24-3, I couldn’t directly test the HPC Compilers. I had to run several commands in the shell first, as explained in the documentation. It seemed like a waste of time, so I just tried adding a few lines in ~/.bashrc.

# Platform directory name used by the SDK, e.g. Linux_x86_64
NVARCH=`uname -s`_`uname -m`
export NVARCH
# Root of the HPC SDK installation
NVCOMPILERS=/opt/nvidia/hpc_sdk
export NVCOMPILERS
MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/24.3/compilers/man
export MANPATH
PATH=$NVCOMPILERS/$NVARCH/24.3/compilers/bin:$PATH
export PATH

# MPI binaries and man pages bundled with the SDK
export PATH=$NVCOMPILERS/$NVARCH/24.3/comm_libs/mpi/bin:$PATH
export MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/24.3/comm_libs/mpi/man

# Environment Modules setup
export MODULEPATH=$NVCOMPILERS/modulefiles:$MODULEPATH
module load nvhpc

Now all the HPC compilers are detected. However, a "module: command not found" error appears every time I open a new shell. Is this module thing really important? I commented out the module line (# module ...), so the error no longer appears in new shells. Is that okay?

  1. What errors might occur? What do I seem to be missing?

If my case is for AI training and inference, 3D graphics rendering (OpenGL, DX, VK, Metal), cross-OS and cross-arch code compiling for further use in the .NET/Mono C# environment,

  1. Which should I choose, the NVHPC SDK or just the pure CUDA Toolkit?
  2. Or should I actually do my programming with these tools on Windows?

The primary difference is the intended audience: HPC versus general use. The two share some components, but only those that are used in both cases.

Given you’re doing more with AI and graphics, I’d say the general-purpose CUDA Toolkit is more appropriate. Also, NVHPC doesn’t support Windows, since Linux dominates HPC data centers. I’ve had users successfully use NVHPC on WSL2, but that’s really Linux, not native Windows.

Is this module thing really important?

Modules are often used in HPC data centers as an easy way for users to set and switch environment variables. For example, if you were switching back and forth between a Clang environment and an NVHPC environment, modules let you do it with a single command. Without modules, you have to change PATH and other environment variables manually.
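To give a rough idea of what "manually" means here, below is a minimal sketch of a shell helper that moves one toolchain's bin directory to the front of PATH. The name use_toolchain is hypothetical (it is not part of NVHPC or Environment Modules), and the sketch only handles PATH, whereas a real module file would typically adjust MANPATH, LD_LIBRARY_PATH and similar variables as well.

```shell
# Hypothetical helper (not part of NVHPC): put one toolchain's bin directory
# at the front of PATH, dropping any earlier copy of it so that repeated
# calls do not make PATH grow.
# Assumes PATH entries contain no wildcard characters or empty entries.
use_toolchain() {
    bin_dir=$1
    rest=
    old_ifs=$IFS
    IFS=:                      # split PATH on colons
    for entry in $PATH; do
        if [ "$entry" != "$bin_dir" ]; then
            rest=${rest:+$rest:}$entry
        fi
    done
    IFS=$old_ifs
    PATH=$bin_dir${rest:+:$rest}
    export PATH
}

# Example calls (the second path is a placeholder, adjust to your install):
# use_toolchain /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin
# use_toolchain /path/to/clang/bin
```

Each call makes the named toolchain win the PATH lookup, which is roughly the part of a module switch that matters for finding the compilers.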

They are not required, though (I rarely use them myself). You just need to set your environment variables on the command line or in your shell configuration file. That is a little more work, but if you’re not changing them often, it’s not an issue.
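If you would rather keep the module line in ~/.bashrc instead of commenting it out, one option is to guard it so it only runs when the module command actually exists in that shell. This is just a sketch: load_nvhpc_if_available is a hypothetical name, and it assumes the PATH and MANPATH exports from your snippet appear before it.

```shell
# Sketch for ~/.bashrc: call "module load" only when the "module" command
# exists in this shell (load_nvhpc_if_available is a hypothetical name).
load_nvhpc_if_available() {
    if command -v module >/dev/null 2>&1; then
        module load nvhpc
    fi
    # Without Environment Modules, the PATH/MANPATH exports are used as-is.
}
load_nvhpc_if_available
```

On a machine without Environment Modules this does nothing and prints no error, so the same file can be reused on a system where the module command is available.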

That answer is really helpful. Also, Windows has been my main platform for CUDA development for years, and trying to move to Linux just confuses me a lot. Aight bet, going back to Win.

Thanks sir.
