Request: symlink to the latest version of CUDA installed with the SDK

Hi Mat,

Long shot, but could we get a symlink under comm_libs/ named latest or trampoline pointing to the latest version of CUDA installed with the SDK?
The provided nvhpc-hpcx module contains the line “set cudaver trampoline”, even though it then never uses cudaver… it looks like a leftover from someone who had a similar idea. :)

Cheers,
-Nuno

Hi Nuno,

Can I get more details on what you’re looking for?

The directories under “comm_libs” are the MPI libraries, not the CUDA installs. They need to be built for each CUDA version, with the top-level hpcx and openmpi4 directories containing trampolines that select the specific version based on either the system the code is being built on or the “-gpu=cudaXX.y” flag. So adding a “latest” link here might not get you what you want.

The CUDA installs are under the “cuda” directory. Is that where you want the “latest” link?

-Mat

Hi Mat,

Yes, of course, I realise I didn’t quite explain why I want the symlink. The use case is as follows: I load the module for the SDK and that puts loads of useful stuff in the right paths but, unfortunately, not the headers/libraries for MPI. For that you have to navigate to comm_libs/<your_cuda_version>/hpcx/latest/modulefiles and then load, say, hpcx. If I could write latest instead of a specific CUDA version, that’d make life slightly easier, e.g. when automating this process in a script (currently I’m manually creating the link so I can do exactly this).
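
For illustration, the manual workaround described above could be scripted roughly as follows. This is only a sketch under assumptions: the function names are hypothetical, the version directories under comm_libs are assumed to be named X.Y (e.g. 12.2), and `sort -V` (version sort, as in GNU coreutils) is assumed to be available.

```shell
# Sketch: print the highest X.Y directory name found directly under comm_libs.
latest_cuda_version() {
    lcv_root="$1"
    for lcv_dir in "$lcv_root"/*/; do
        lcv_name="$(basename "$lcv_dir")"
        case "$lcv_name" in
            # Skip anything that is not digits.digits (hpcx, nccl, latest, ...).
            *[!0-9.]*|*.*.*|.*|*.|'') ;;
            *.*) printf '%s\n' "$lcv_name" ;;
        esac
    done | sort -V | tail -n 1
}

# Sketch: create or refresh comm_libs/latest as a relative symlink to that directory.
link_latest_cuda() {
    llc_root="$1"
    llc_ver="$(latest_cuda_version "$llc_root")"
    if [ -z "$llc_ver" ]; then
        echo "no CUDA version directories under $llc_root" >&2
        return 1
    fi
    ln -sfn "$llc_ver" "$llc_root/latest"
}

# Hypothetical usage (the comm_libs location is an assumption):
#   link_latest_cuda /path/to/nvhpc/comm_libs
#   module use /path/to/nvhpc/comm_libs/latest/hpcx/latest/modulefiles
#   module load hpcx
```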

I understand the folders under comm_libs aren’t the actual CUDA installs, but you have one for each of the installed CUDA versions for the reasons you rightly explained. :)
I do agree that if we were to put a symlink under comm_libs then a similar one under cuda would make most sense!

EDIT: I can see the nvhpc-hpcx module doesn’t do anything with cudaver but nvhpc-hpcx-cudaXX does (though the CUDA version is hardcoded). I guess then I could rephrase my request: can we make nvhpc-hpcx do the same as nvhpc-hpcx-cudaYY for YY the latest installed CUDA version?

-Nuno

Let me ask Chris, who manages the modules and the MPI trampoline, to see what he thinks.

My only concern is that, since the MPI trampoline auto-detects the CUDA driver version, having the environment point to the latest version on a system with an older CUDA driver might cause issues.
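
A script that opts into a “latest” link could guard against this case with a simple version comparison. The sketch below is deliberately simplistic and the function names are hypothetical; real CUDA driver compatibility rules are more nuanced than comparing two X.Y strings, and `sort -V` is assumed to be available.

```shell
# Sketch: succeed if version $1 is less than or equal to version $2 (X.Y strings).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Sketch: report whether the newest CUDA version shipped with the SDK is no
# newer than the highest CUDA version the installed driver supports.
# How the driver's version is obtained is left to the caller (for example,
# from the "CUDA Version" field shown in the nvidia-smi header).
check_latest_is_usable() {
    clu_latest="$1"        # e.g. 12.2
    clu_driver_cuda="$2"   # e.g. 11.8
    if version_le "$clu_latest" "$clu_driver_cuda"; then
        echo "ok"
    else
        echo "driver too old for CUDA $clu_latest"
    fi
}
```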

Also, the MPI driver should auto-set the correct include paths. So assuming you’re using the MPI driver, I’m not 100% sure this would be very helpful. Though so long as it doesn’t cause other issues, it shouldn’t hurt.


I talked with a few folks here, and they don’t want to change “nvhpc-hpcx”. They’re concerned that it would cause issues for folks with older drivers.

The “nvhpc-hpcx-cudaYY” module gets updated to point to the latest minor version, so an “nvhpc-hpcx-latest” would just be a duplicate of “nvhpc-hpcx-cuda12”. Major version changes don’t happen often, so you’d only need to update your scripts every few years or so. In other words, it doesn’t seem too helpful.

Thanks Mat.

Okay, what about the symlink? That’d be innocuous because no infrastructure currently uses it and only people who know of its existence would use it anyway?

You already provide nccl and nvshmem links under comm_libs pointing to the versions compiled against the latest CUDA, so what I’m essentially asking for is an hpcx link of the same kind. This would, of course, imply moving things around because we already have a hpcx folder under comm_libs. Or, actually, maybe not: what if inside comm_libs/hpcx we create a latest link pointing to comm_libs/12.2/hpcx/latest?
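
To make that last idea concrete, here is a rough sketch of what creating such a link might look like. The function name is hypothetical, and it assumes comm_libs/hpcx is a real directory and that comm_libs/<cuda_version>/hpcx/latest exists, as described in this thread.

```shell
# Sketch: inside comm_libs/hpcx, add a "latest" link to <cuda_version>/hpcx/latest.
link_hpcx_latest() {
    lhl_root="$1"   # path to comm_libs (assumed layout)
    lhl_ver="$2"    # CUDA version directory name, e.g. 12.2
    # The link is resolved relative to comm_libs/hpcx, hence the leading "..".
    lhl_target="../$lhl_ver/hpcx/latest"
    if [ ! -d "$lhl_root/hpcx" ] || [ ! -e "$lhl_root/$lhl_ver/hpcx/latest" ]; then
        echo "expected directories not found under $lhl_root" >&2
        return 1
    fi
    ln -sfn "$lhl_target" "$lhl_root/hpcx/latest"
}

# Hypothetical usage (the comm_libs location is an assumption):
#   link_hpcx_latest /path/to/nvhpc/comm_libs 12.2
```

A relative target keeps the link valid if the whole SDK tree is relocated, which seems preferable for an install that may be copied between systems.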