NV 20.11 compilation fails with default flags (need to specify cuda version)

Hi,

Normally when I compile my OpenACC code, I use:

-acc=gpu -gpu=cc##,cuda##.# -Minfo=accel

Today, I tried compiling in a more "default" way:

-acc=gpu

When I try this, I get the following error:

nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgacchGi2bva1W-aVx.gpu (51, 23): parse expected comma after load's type
ptxas /tmp/pgacc3Gi2bLAfQVpGN.ptx, line 1; fatal : Missing .version directive at start of file '/tmp/pgacc3Gi2bLAfQVpGN.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (pot3d.f: 8401)
NVFORTRAN/x86-64 Linux 20.11-0: compilation aborted
make: *** [Makefile:40: pot3d.o] Error 2

I also tried using “-gpu=cc##” and it still fails.

It seems the compiler requires the CUDA version to be specified.

I had thought this was an optional flag - is it now required?

  • Ron

If I remember correctly, I had to manually fix some of the broken CUDA symlinks in the /opt/nvidia/hpc_sdk/Linux_<platform>/<version>/cuda directory. This is true if you didn’t install the multi-CUDA version. Can somebody from NVIDIA verify this?

The compiler defaults to the CUDA version of the installed CUDA driver. The cuda sub-option is only needed if you want to use a different CUDA version than the default.

What CUDA driver do you have installed and what “cuda” option are you using when it compiles successfully?
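One way to gather both pieces of information is sketched below. It assumes `nvidia-smi` is on the PATH and that the SDK sits under the default prefix /opt/nvidia/hpc_sdk/Linux_x86_64/20.11 (an assumption; adjust the platform and version to your install). The helper name `parse_cuda_version` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: extract "X.Y" from the "CUDA Version: X.Y" field
# that nvidia-smi prints in its banner. Reads the banner text on stdin.
parse_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Report the driver's CUDA version when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "Driver CUDA version: $(nvidia-smi | parse_cuda_version)"
fi

# Assumed default install prefix; adjust platform and version to your system.
sdk_cuda_dir="/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda"
if [ -d "$sdk_cuda_dir" ]; then
    echo "CUDA versions bundled with the SDK:"
    ls "$sdk_cuda_dir"
fi
```

Comparing the two outputs shows whether the driver's CUDA version is one of the versions shipped with the compilers.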

The error is a code generation issue, so I wouldn’t expect the CUDA version to matter, though it’s possible. As you know, we run POT3D in our daily performance testing and have not seen any issues, and we don’t use -gpu=cudaX.Y. Is this a different version than the one we have?

Wyphan, can you give details about what you mean by having to fix broken CUDA symlinks? Is this something you reported?

Is this something you reported?

Not yet, should I start a new thread for this?

Yes, please, since I believe it’s unrelated to Ron’s issue.

Hi,

My system has the CUDA driver 11.2 installed (the most recent one that the “cuda” package in Ubuntu 20.04 installs).

I had thought the compiler would default to the most recent CUDA included in the NV compiler package, but it does make sense to try to sync it with the driver version.

However, since the CUDA libraries packaged with the NV compilers are often (or always) “behind” the most recent CUDA driver release, maybe there could be a fallback for this case, so that if the driver’s CUDA version is not included in the NV compiler package, it just uses the most recent one it has?

  • Ron

Correct, it uses the latest CUDA version installed with the compilers if the CUDA driver is newer. Still, I’m unclear why this would cause this error.
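The selection rule described in this thread (use the driver's CUDA version, or the newest bundled one when the driver is newer) can be illustrated with a small sketch. This is not the compiler's actual logic, only a reading of the rule as stated here; the function name `pick_cuda` and the version lists are hypothetical, and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical illustration: print the newest bundled CUDA version that does
# not exceed the driver's CUDA version.
# Usage: pick_cuda DRIVER_VERSION BUNDLED_VERSION...
pick_cuda() {
    local driver="$1" best="" v lowest
    shift
    for v in "$@"; do
        # Under version sort, v comes first exactly when v <= driver.
        lowest=$(printf '%s\n%s\n' "$v" "$driver" | sort -V | head -n 1)
        if [ "$lowest" = "$v" ]; then
            if [ -z "$best" ]; then
                best="$v"
            else
                # Keep the larger of the current best and v.
                best=$(printf '%s\n%s\n' "$best" "$v" | sort -V | tail -n 1)
            fi
        fi
    done
    # Print nothing and return non-zero if no bundled version qualifies.
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Example with made-up inputs: driver at 11.2, SDK bundling 10.2, 11.0, 11.1.
pick_cuda 11.2 10.2 11.0 11.1
```

Under this reading, a driver at 11.2 with bundled versions up to 11.1 would select 11.1, which matches the flag that compiles successfully in this thread.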

Hi,

I have the multi-CUDA version of the SDK installed, if that helps.

I just re-tested after a reboot and I can confirm that this works:

-O3 -acc=gpu -gpu=cc60,cuda11.1 -Minfo=accel

and this causes the error:

-O3 -acc=gpu -Minfo=accel

This also works:

-O3 -acc=gpu -gpu=cuda11.1 -Minfo=accel

  • Ron

Hi Ron,

I’ve tried my best to replicate this on a system with an 11.2 CUDA driver using a fresh install of 20.11, but no luck. POT3D compiles successfully for me, so unfortunately I’m not sure what’s wrong. Is this the same version of POT3D that I have?

-Mat

Hi,

It is basically the same version (just in our old fixed format).

I am compiling on a laptop with a GTX 1060 with Optimus after loading the gpu (although I am not sure why this would make a difference).

When I compile with the CUDA version specified, everything works fine, so it’s not a big deal.
If I find another system where this happens, I will let you know.

  • Ron