Compile error for OpenMP code with target offloading in nvhpc 20.11

Hi,
I recently installed NVHPC 20.11. When I try to compile an OpenMP code with target offloading I get the following error:
nvc-Error-OpenMP GPU Offload is available only on systems with NVIDIA GPUs with compute capability '>= cc70'

The system has an NVIDIA V100, and when I run deviceQuery it reports a compute capability of 7.0 (cc70).
What am I missing here?

Thank You.
Alok

Hi Alok,

By default the compiler will target the device found on the system you compiled on. Are you compiling on the V100 system? If not, try adding “-gpu=cc70” to your compiler flags.

If so, then I’m wondering if the compiler can’t find the device. What’s the output from the “nvaccelinfo” utility?

What flags are you using to compile? Are you using a different “-gpu=ccXX” or “-ta=tesla:ccXX” option?
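
As a rough sketch of the check above: when a device is visible, nvaccelinfo prints a line of the form "Default Target: cc70". The snippet below pulls that value out and turns it into a "-gpu=" flag. The function name default_target is made up for illustration, and the parsing assumes that line format; if no accelerator is found, nothing is printed.

```shell
# Hypothetical helper: read nvaccelinfo-style text on stdin and print the
# value of the first "Default Target:" line (e.g. cc70). Prints nothing if
# no such line exists, which is the case when no accelerator is found.
default_target() {
    awk -F': *' '/^[[:space:]]*Default Target:/ { print $2; exit }'
}

# Only query the tool if it is actually installed and on PATH.
if command -v nvaccelinfo >/dev/null 2>&1; then
    target=$(nvaccelinfo | default_target)
    if [ -n "$target" ]; then
        echo "suggested flag: -gpu=$target"
    else
        echo "nvaccelinfo reported no default target"
    fi
fi
```

If the second message appears on a machine that does have a GPU, the compiler is probably hitting the same detection problem as nvaccelinfo.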

-Mat

Hi Mat,

I am compiling on the V100 system.
When I run nvaccelinfo, it says:

$ nvaccelinfo

NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.05 Sun Jun 28 10:33:40 UTC 2020
No accelerators found.
Try nvaccelinfo -v for more information

I believe there was a conflict between my default CUDA 10.2 module and the CUDA bundled with NVHPC.
If I unload the default cuda module and rerun nvaccelinfo, I get proper output, since (I think) it is now picking up NVHPC’s CUDA.

$ module unload cuda102

$ nvaccelinfo

CUDA Driver Version: 11000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.05 Sun Jun 28 10:33:40 UTC 2020

Device Number: 0
Device Name: Tesla V100-PCIE-16GB
Device Revision Number: 7.0
Global Memory Size: 16945512448
Number of Multiprocessors: 80
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1380 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 877 MHz
Memory Bus Width: 4096 bits
L2 Cache Size: 6291456 bytes
Max Threads Per SMP: 2048
Async Engines: 7
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: Yes
Preemption Supported: Yes
Cooperative Launch: Yes
Multi-Device: Yes
Default Target: cc70
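
For reference, the "CUDA Driver Version" value is an integer encoded as major*1000 + minor*10, so 11000 means the driver supports CUDA 11.0. A minimal sketch of that decoding, where decode_cuda_version is an illustrative name and not part of any NVIDIA tool:

```shell
# Hypothetical helper: convert the integer CUDA version
# (major*1000 + minor*10) into the familiar major.minor form.
decode_cuda_version() {
    v=$1
    echo "$(( $v / 1000 )).$(( ($v % 1000) / 10 ))"
}

decode_cuda_version 11000   # driver version reported above
```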

Now when I try to compile using nvc I get the error:

$ nvc -mp=gpu matrix.c
nvc-Error-OpenMP GPU Offload is available only on systems with NVIDIA GPUs with compute capability '>= cc70'

$ nvc -mp=gpu -gpu=cc70 matrix.c
nvc-Error-CUDA version 10.2 is not available in this installation.

Thank You.
Alok

Odd. This means the compiler is still seeing the driver as CUDA 10.2. Since you most likely downloaded the NVIDIA HPC SDK package with only CUDA 11 (there’s a second, larger package that also includes CUDA 10.2), it’s giving this error.

Let’s try forcing the use of CUDA 11 by adding “-gpu=cc70,cuda11.0”. The binary will fail to run on the device if the CUDA 10.2 driver is being used, but this should at least get it to compile.
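
Since something is still pointing the compiler at CUDA 10.2, it may also help to look for leftover CUDA entries in the environment. The sketch below lists the entries of a colon-separated path variable that mention "cuda". Here cuda_entries is a hypothetical helper, and whether a stale path entry is actually the cause is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper: print each entry of a colon-separated path list
# (such as $PATH or $LD_LIBRARY_PATH) that contains "cuda", ignoring case.
# Returns non-zero when no entry matches.
cuda_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -i 'cuda'
}

echo "PATH entries mentioning cuda:"
cuda_entries "$PATH" || echo "  (none)"
echo "LD_LIBRARY_PATH entries mentioning cuda:"
cuda_entries "${LD_LIBRARY_PATH:-}" || echo "  (none)"
```

Any entry that still points at a CUDA 10.2 install after “module unload cuda102” would be worth a closer look.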