Just released: HPC SDK 24.7

HPC SDK v24.7 delivers support for Ubuntu 24.04, new Fortran interfaces for CUDA Graphs, and an update to a new major version of the NVSHMEM API. It is the last release to support RHEL 7.

Please refer to the Release Notes for full details.

Download the current release at https://developer.nvidia.com/nvidia-hpc-sdk-downloads.

View the current documentation.

After updating to this release, my code can no longer select a nonzero device in a multi-GPU configuration.
For example, the following code always outputs “device: 0”:

use cudafor                  ! CUDA Fortran runtime interfaces (cudaSetDevice, cudaGetDevice)
implicit none
integer :: istat, num
istat = cudaSetDevice(1)     ! request the second GPU (devices are numbered from 0)
istat = cudaGetDevice(num)
print *, "device:", num
end

Release 24.5 works fine and allows selecting the second GPU.
I use a cloud image with two RTX 4090s; nvidia-smi correctly lists both GPUs, and so does pgaccelinfo.
Although I am okay with using release 24.5, I am curious about the reason behind this problem.

Hi mbp65 - thanks for bringing this to our attention! I have brought it to our engineers’ attention, and after a preliminary look we believe this is an unexpected regression in behavior. As our engineering team digs further into the issue, I’ll let you know whether we decide it is expected for some reason or - more likely - we are going to work on a fix.

Hi scamp1, thanks for your reply. Please let me know if you need any details or assistance in reproducing this problem.

Sounds good. We seem to have identified the issue as being linked to compiling with “-cuda”. If you compile instead with “-cudalib”, you should get back the earlier expected behavior.

If that doesn’t resolve your issue, let me know because it could be a use case we don’t know about yet.
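
For concreteness, a minimal sketch of the two compile lines being compared here (assuming the test program above is saved as devtest.cuf so CUDA Fortran is enabled by the file extension; the file name is illustrative):

nvfortran -cuda devtest.cuf    && ./a.out   # affected case on 24.7: always prints "device: 0"
nvfortran -cudalib devtest.cuf && ./a.out   # suggested workaround: device selection works as before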

Cheers,

Seth.

Also, the engineer working on the issue suggested that adding “-acc” to the compilation line should restore the previous behavior. I haven’t tested this explicitly, so if you run into issues with it - let me know and I’ll investigate.
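
With the same illustrative file name as above, that would be something like:

nvfortran -acc devtest.cuf && ./a.out   # untested alternative workaround suggested by engineering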

Thanks for dealing with this request. After checking on my cloud GPU account, I can confirm that the -cudalib and -acc switches allow selecting the required GPU.

Perfect! We hope to ship a fix in an upcoming NVHPC release so that things return to their expected behavior. When that happens, I’ll update you again. Thanks again for letting us know about this!

Hi,
In the Docker container image available at nvcr.io/nvidia/nvhpc:24.7-devel-cuda_multi-ubuntu22.04, the “ompi_info” command is not available in /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/hpcx/bin. How can I install or add an “ompi_info” that corresponds to /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/hpcx/bin/mpirun?
Thank you.

Because HPC-X needs to be built against particular CUDA versions, the top-level “comm_libs/hpcx” is just a set of scripts that point to the particular HPC-X build.

For ompi_info, you need to follow the CUDA version path and then look in the “ompi/bin” directory. For example:

/opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/12.5/hpcx/hpcx-2.19/ompi/bin/ompi_info
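
If your container ships a different CUDA or HPC-X version, a quick way to locate the binary (a sketch; the exact path varies by release):

find /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs -name ompi_info
# e.g. /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/12.5/hpcx/hpcx-2.19/ompi/bin/ompi_info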

Thank you for your response. I can now use /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/comm_libs/12.5/hpcx/hpcx-2.19/ompi/bin/ompi_info.