Just released: HPC SDK 24.7

HPC SDK v24.7 delivers support for Ubuntu 24.04, new Fortran interfaces for CUDA Graphs, and an update to a new major version of the NVSHMEM API. It is the last release to support RHEL 7.

Please refer to the Release Notes for full details.

Download the current release at https://developer.nvidia.com/nvidia-hpc-sdk-downloads.

View the current documentation.

After updating to this release, my code can no longer select a nonzero device in a multi-GPU configuration.
For example, the following code always outputs “device: 0”:

use cudafor                  ! CUDA Fortran runtime interfaces (cudaSetDevice, cudaGetDevice)
implicit none
integer :: istat, num
istat = cudaSetDevice(1)     ! request the second GPU (devices are numbered from 0)
istat = cudaGetDevice(num)
print *, "device:", num
end

Release 24.5 works fine and allows selecting the second GPU.
I use a cloud image with two RTX 4090s; nvidia-smi correctly lists both GPUs, and so does pgaccelinfo.
Although I am okay with using release 24.5, I am curious about the reason behind this problem.

Hi mbp65 - thanks for bringing this to our attention! I have brought it to our engineers’ attention, and after a preliminary look we believe this is an unexpected regression in behavior. As our engineering team digs further into the issue, I’ll let you know whether we decide it is expected for some reason or - more likely - we are going to work on a fix.

Hi scamp1, thanks for your reply. Please let me know if you need any details or assistance in reproducing this problem.

Sounds good. We seem to have identified the issue as being linked to compiling with “-cuda”. If you compile instead with “-cudalib”, you should get back the earlier expected behavior.

If that doesn’t resolve your issue, let me know because it could be a use case we don’t know about yet.
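
For concreteness, a minimal sketch of the two compile lines being compared here (assuming the test program above is saved as devtest.cuf so CUDA Fortran is enabled by the file extension; the file name is illustrative):

nvfortran -cuda devtest.cuf    && ./a.out   # affected case on 24.7: always prints "device: 0"
nvfortran -cudalib devtest.cuf && ./a.out   # suggested workaround: device selection works as before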

Cheers,

Seth.

Also, the engineer working on the issue suggested that adding “-acc” to the compilation line should restore the previous behavior. I haven’t tested this explicitly, so if you run into issues with it - let me know and I’ll investigate.
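
With the same illustrative file name as above, that would be something like:

nvfortran -acc devtest.cuf && ./a.out   # untested alternative workaround suggested by engineering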

Thanks for dealing with this request. After checking on my cloud GPU account, I can confirm that the -cudalib and -acc switches allow selecting the required GPU.

Perfect! We hope to ship a fix in an upcoming NVHPC release so that things return to their expected behavior. When that happens, I’ll update you again. Thanks again for letting us know about this!

Hi,
In the Docker container image available at nvcr.io/nvidia/nvhpc:24.7-devel-cuda_multi-ubuntu22.04, the “ompi_info” command is not available in /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/hpcx/bin. How can I install or add an “ompi_info” that corresponds to /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/hpcx/bin/mpirun?
Thank you.

Because HPC-X needs to be built against particular CUDA versions, the top-level “comm_libs/hpcx” is just a set of scripts that point to the particular HPC-X build.

For ompi_info, you need to follow the CUDA version path and then look in the “ompi/bin” directory. For example:

/opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/12.5/hpcx/hpcx-2.19/ompi/bin/ompi_info
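
If your container ships a different CUDA or HPC-X version, a quick way to locate the binary (a sketch; the exact path varies by release):

find /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs -name ompi_info
# e.g. /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/12.5/hpcx/hpcx-2.19/ompi/bin/ompi_info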

Thank you for your response. I can now use /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/12.5/hpcx/hpcx-2.19/ompi/bin/ompi_info.