Hi,
I just installed NV 21.3 and I am having problems compiling my code.
The problem is with our OpenACC code MAS.
I get the following compiler error:
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccWO5JqMwcWYdR.gpu (101048, 19): parse stored value and pointer type do not match
ptxas /tmp/pgaccWO5JqAMUWoZV.ptx, line 1; fatal : Missing .version directive at start of file '/tmp/pgaccWO5JqAMUWoZV.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (mas.f: 64484)
NVFORTRAN/x86-64 Linux 21.3-0: compilation aborted
make[1]: *** [Makefile:49: mas.o] Error 2
make: *** [Makefile:32: all] Error 1
When I look into the code at the specified line number, that line of code has nothing to do with any accelerated region.
Any ideas?
Hi Ron,
The only issues I’m aware of are due to the new atomics support in 21.3, which has exposed a few problems. I don’t know if this is related, but if you are using atomics, you can try adding the internal compiler flag “-Mx,231,0x01” to revert to the previous atomics.
If that doesn’t help, I’ll need to arrange a way to get a reproducing example so we can investigate.
-Mat
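For anyone wanting to try the workaround, here is a minimal sketch of how the flag could be made switchable in a build script, so it is easy to drop again once a fix ships. The function name build_fcflags and the USE_OLD_ATOMICS variable are hypothetical, and the base flags are only an illustration; "-Mx,231,0x01" is the internal flag quoted above.

```shell
# Sketch only: build_fcflags and USE_OLD_ATOMICS are made-up names for
# illustration; they are not part of the HPC SDK.
build_fcflags() {
    # Base flags are an assumption; substitute your project's own.
    local flags="-O3 -acc=gpu -Minfo=accel"
    # Append the internal flag that reverts to the previous atomics
    # only when the workaround is requested.
    if [ "${USE_OLD_ATOMICS:-0}" = "1" ]; then
        flags="$flags -Mx,231,0x01"
    fi
    printf '%s\n' "$flags"
}

# Print the compile line instead of running it, so the sketch works
# even on a machine without nvfortran installed.
echo "nvfortran $(USE_OLD_ATOMICS=1 build_fcflags) -c mas.f"
```

Leaving USE_OLD_ATOMICS unset gives the plain flag set, which makes it simple to retest a newer compiler release without the workaround.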
Hi,
That did it! Thanks!!
BTW, I am still getting the following warnings. I forget whether you said they were already fixed or would be fixed in the next release.
invalid tag !3001 = !DIBasicType(tag: DW_TAG_string_type, name: "character", size: 128, align: 8, encoding: DW_ATE_signed)
invalid tag !4506 = !DIBasicType(tag: DW_TAG_string_type, name: "character", size: 256, align: 8, encoding: DW_ATE_signed)
invalid tag !6734 = !DIBasicType(tag: DW_TAG_string_type, name: "character", size: 512, align: 8, encoding: DW_ATE_signed)
Thanks for testing, Ron. We should talk offline about getting MAS into our QA test harness. It seems to find a few compiler issues, and it would be good to get those squashed before a release. I certainly understand if you can’t send us the code, but I thought I’d put it out there.
No, the warnings are not fixed yet, since we were only able to diagnose the root cause a few weeks ago. This is actually a CUDA 11.2 issue, so we need to wait until it is fixed in CUDA and that version is integrated into the HPC SDK. It’s unclear when that will be, but hopefully not too long.
-Mat
Hi,
I am trying to compile the MAS code with NV 21.5.
I have removed the “-Mx,231,0x01” flag to test whether the issue has been fixed.
When I compile on one machine for cc75, the code compiles and runs without that flag.
However, on a different machine where I am compiling with cc60, I get:
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccHD6bJFmPPEv0.gpu (101027, 19): parse stored value and pointer type do not match
ptxas /tmp/pgaccXD6bt7Z55XlI.ptx, line 1; fatal : Missing .version directive at start of file '/tmp/pgaccXD6bt7Z55XlI.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (mas.f: 64470)
NVFORTRAN/x86-64 Linux 21.5-0: compilation aborted
My two compile flags are:
-O3 -march=native -acc=gpu =gpu=cc75,cuda11.3 -Minfo=accel
and
-O3 -march=native -acc=gpu =gpu=cc60,cuda11.3 -Minfo=accel
Why would the “cc” matter?
Or is there some other package I need to update?
(The OS version on both systems is the same - Mint 20.1)
BTW - If I put the Mx flag back in, the cc60 build does compile as before.
Thanks!
Hi Ron,
I talked with the developer who did this work and he sees no reason why the device architecture would matter. We also tested the failing cases we had in house and didn’t see any issue when targeting Pascal.
Obviously, there’s something else going on, but without a reproducer we can only hope we find the issue internally.
-Mat
OK thanks!
Maybe I have some left-over paths set to the old compiler or something like that. I will look into it.
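One way to check for leftover installs is to list every nvfortran the shell can find on PATH, not just the first one. The helper name list_on_path below is hypothetical; the sketch uses only standard shell features and does not call the compiler.

```shell
# Sketch only: list_on_path is a made-up helper name.
# Prints every executable called $1 found in the directories of $PATH,
# in search order, so the first line is the one the shell would run.
list_on_path() {
    local name="$1" dir
    local IFS=':'
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -z "$dir" ] && dir="."
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

# More than one line of output suggests an old install is still on PATH.
list_on_path nvfortran
```

In bash, the builtin "type -a nvfortran" gives similar information; the function form is just easier to reuse for other tools in a script.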
Hi,
Just FYI, after I removed all the old versions of NV I had lying around, rebooted, and fixed a typo (I had “=gpu=” instead of “-gpu=”, so the compiler was doing the default GPU acceleration, not targeting cc60), it now works fine!
Thanks!
– Ron
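A typo like the one described above can be caught with a cheap sanity check on the flag list before it reaches the compiler. The sketch below uses a hypothetical helper name, stray_flag_tokens, and assumes every option is a single word starting with "-". That holds for the two flag lines in this thread, but it would wrongly report an argument passed as a separate word, such as an output file name after "-o".

```shell
# Sketch only: stray_flag_tokens is a made-up helper name.
# Prints each word in its arguments that does not start with "-".
# Assumption: every option is a single word beginning with "-".
stray_flag_tokens() {
    local tok
    for tok in "$@"; do
        case "$tok" in
            -*) ;;                      # looks like an option, accept it
            *)  printf '%s\n' "$tok" ;; # anything else is suspicious
        esac
    done
}

# The cc60 flag line from the thread, typo included.
# prints: =gpu=cc60,cuda11.3
stray_flag_tokens -O3 -march=native -acc=gpu =gpu=cc60,cuda11.3 -Minfo=accel
```

An empty result means every word looked like an option; a build script could refuse to continue when the output is non-empty.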