24.3 install through apt (error 209 when running OpenACC code)

I wanted to move up to a newer version of the HPC SDK.
I downloaded 24.3 through apt on Ubuntu and it unpacked fine.

I updated my .bashrc to the new version numbers (the formatting here may be messed up by copy-paste):

export NVARCH="Linux_x86_64"
export NVCOMPILERS=/opt/nvidia/hpc_sdk
export MANPATH="$MANPATH:$NVCOMPILERS/$NVARCH/24.3/compilers/man:$NVCOMPILERS/$NVARCH/24.3/comm_libs/mpi/man"
export PATH="$NVCOMPILERS/$NVARCH/24.3/compilers/bin:$NVCOMPILERS/$NVARCH/24.3/comm_libs/mpi/bin:$PATH"
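After sourcing the updated .bashrc, a quick sanity check that the shell actually picks up the 24.3 compiler (this is just a sketch; it degrades gracefully if nvfortran isn't on PATH):

```shell
# Check whether the NVHPC Fortran compiler is found on the updated PATH.
# If it is, "found" holds its location; otherwise a fallback message.
found=$(command -v nvfortran || echo "nvfortran not on PATH")
echo "$found"
```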

I also reset some links under /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda

This still did not work, and after searching some forums I added:

export NVHPC_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda

The compilers now work fine. (None of these instructions are mentioned when installing through apt, or am I mistaken?)

But running the code gives: Accelerator Fatal Error: call to cuLinkComplete returned error 209 (CUDA_ERROR_NO_BINARY_FOR_GPU): No binary for GPU

Apparently something’s still not set correctly. Any ideas?
I guess I could go to the tar installation following the more basic instructions, but was hoping the apt would get me there with less hassle :-)


Hi Danny,

What was the issue you were seeing? The PATH settings all look correct.

export NVHPC_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda

You shouldn’t need to set this, since it’s only used when you want the NVHPC compilers to use a CUDA toolkit not shipped with the NVHPC SDK. That shouldn’t be the case here.
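For reference, a minimal sketch of what NVHPC_CUDA_HOME is meant for: pointing the compilers at a CUDA toolkit installed outside the SDK, where the path should end in the versioned directory. The /usr/local/cuda-11.8 path below is purely an illustrative assumption:

```shell
# Illustrative only: point the NVHPC compilers at an external CUDA toolkit.
# Note the versioned directory at the end of the path (hypothetical location).
export NVHPC_CUDA_HOME=/usr/local/cuda-11.8
echo "$NVHPC_CUDA_HOME"
```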


I believe this happens when, at the time the host binary was created, no device binary for the GPU on this system was embedded. It might be because the NVHPC_CUDA_HOME setting doesn’t include the CUDA version at the end, but it could also be due to the original issue.
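One way to rule out a missing device binary is to request the target compute capability explicitly at compile time. A sketch, assuming nvfortran is on PATH and using a hypothetical test file saxpy.f90:

```shell
# Sketch: build with device code for the A40 (compute capability 8.6).
# -gpu=ccall would instead embed binaries for all supported targets,
# at the cost of longer compile times and a larger binary.
if command -v nvfortran >/dev/null 2>&1; then
  nvfortran -acc -gpu=cc86 saxpy.f90 -o saxpy   # saxpy.f90 is hypothetical
  status="invoked nvfortran"
else
  status="nvfortran not on PATH"
fi
echo "$status"
```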

I’d recommend unsetting NVHPC_CUDA_HOME and then let’s work through the original issue.

It may also be helpful if you can post the output from the “nvidia-smi” command so we can see what device and CUDA driver version you’re using.

The “nvaccelinfo” utility is also useful since it shows what the compilers and OpenACC runtime are detecting. If it can’t detect the device but nvidia-smi can, it usually means the CUDA driver can’t be found, which typically happens when the driver is installed in a non-default location.


If I don’t set NVHPC_CUDA_HOME I get:

nvfortran-Error-A CUDA toolkit matching the current driver version (11.6) or a supported older version (11.0 or 11.8) was not installed with this HPC SDK.

The output of nvidia-smi is (shortened version):

| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
| 0 NVIDIA A40 On | 00000000:3B:00.0 Off | 0 |
| 0% 68C P0 280W / 300W | 44705MiB / 46068MiB | 100% Default |
| | | N/A |
| 1 NVIDIA A40 On | 00000000:AF:00.0 Off | 0 |
| 0% 46C P0 78W / 300W | 44955MiB / 46068MiB | 0% Default |
| | | N/A |
| 2 NVIDIA A40 On | 00000000:D8:00.0 Off | 0 |
| 0% 71C P0 302W / 300W | 39047MiB / 46068MiB | 100% Default |
| | | N/A |

The output of nvaccelinfo for the first gpu is:

CUDA Driver Version: 11060
NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.47.03 Mon Jan 24 22:58:54 UTC 2022

Device Number: 0
Device Name: NVIDIA A40
Device Revision Number: 8.6
Global Memory Size: 47641198592
Number of Multiprocessors: 84
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1740 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 7251 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 6291456 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: Yes
Preemption Supported: Yes
Cooperative Launch: Yes
Default Target: cc86

Thanks for your help,

Hi Danny,

You should only get that message if you’re explicitly setting the CUDA version to 11.6 (i.e. the "-gpu=cuda11.6" flag), or you downloaded the NVHPC SDK that only includes CUDA 12.3.

I’m guessing you downloaded the SDK with only 12.3.

On the download page: NVIDIA HPC SDK Current Release Downloads | NVIDIA Developer

There are two different packages: “Bundled with the newest CUDA version (12.3)” and “Bundled with the newest plus two previous CUDA versions (12.3, 11.8)”. You want the second one since your driver only supports CUDA 11.6.
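One way to check which CUDA toolkits a given install actually bundles is to list the versioned subdirectories (the path assumes the default apt location used earlier in the thread):

```shell
# Each bundled CUDA toolkit appears as a versioned subdirectory; the
# multi-CUDA package should show e.g. 11.8 alongside 12.3.
SDK_CUDA=/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda
if [ -d "$SDK_CUDA" ]; then
  bundled=$(ls "$SDK_CUDA")
else
  bundled="none (no SDK at $SDK_CUDA)"
fi
echo "bundled CUDA versions: $bundled"
```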

The second option would be to update your CUDA driver to one that supports CUDA 12.3 or newer: Official Drivers | NVIDIA


Hi Mat,

You are absolutely right. I did deliberately choose the ‘only 12.3’ option. I guess I misunderstood the implications of what that means.

Just a question to help me decide on the best option: if I install the 12.3 driver, will it affect other applications using the GPUs, such as deep-learning software (PyTorch etc.)? Or is this completely unrelated? I am going to get into trouble with my PhD students if I screw that up :-)

Thanks Mat !

I wouldn’t think it would be a problem, given that newer CUDA drivers can run binaries built with older CUDA versions, but I have zero experience running DL frameworks, so unfortunately I don’t know for sure whether it would cause issues.

Thanks. I will try the multiple cuda download option and see from there.
I will let you know what the result is.


Works!

Hopefully other posts will contain more interesting stuff :-)
