Issues when compiling VASP on a POWER9 CPU

I am trying to build VASP (https://www.vasp.at/) on an IBM POWER9 processor with V100 GPUs. I can compile many of the VASP Fortran subroutines, but when I try to compile xspin.f90 I get error messages from C++ header files. It looks like pgfortran is using the system gcc compiler (version 4.8.5) rather than the gcc compiler (version 8.4.0) that I had loaded in my environment when I installed the HPC toolkit and that I load when I build VASP. The file I have uploaded with this post shows the error messages I get. Is there a workaround for this issue?

error.log (52.5 KB)

During installation, we detect the system default gcc version (along with some other related things) and record it in a file named localrc in the bin directory, so that is what our compilers use by default. You can follow the same process for other GNU versions. Our compiler bin directory contains a script named makelocalrc; run it without arguments to see the options. Give it the path to the gcc you want and save the result. Then run our compilers with the -dryrun option to see all the places where we read local rcfiles. Put the newly generated localrc in the appropriate place, depending on your site, your environment, etc. Verify with -dryrun that it is read, and you should be good to go.
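
As a rough sketch of that workflow, assuming makelocalrc and pgfortran are on your PATH: the helper name have and the file name test.f90 are placeholders, and the exact wording of the -dryrun output may differ between releases.

```shell
# have: small hypothetical helper, true if a command exists on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# 1. List the options makelocalrc accepts (it prints them when run
#    without arguments).
if have makelocalrc; then
  makelocalrc || true
else
  echo "makelocalrc not found on PATH; look in the compiler bin directory"
fi

# 2. Generate a localrc for the gcc you want, using the options printed
#    in step 1 (not shown here, since they depend on the release).

# 3. Check whether a localrc is mentioned in the driver's -dryrun output.
#    test.f90 is a placeholder source file name.
if have pgfortran; then
  pgfortran -dryrun test.f90 2>&1 | grep -i localrc || true
fi
```

If step 3 prints nothing, the new localrc is probably not in a location the compiler searches.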

Hi John,

It looks to me like the error is coming from the CUDA 10.1 tools, which are picking up the GNU 8.4.0 header files:

soft/packaging/spack-builds/linux-rhel7-power8le/gcc-4.8.5/gcc-8.4.0-n5lbxh6ybxdmqh346yxg3lpkpx3clfzm/lib/gcc/powerpc64le-unknown-linux-gnu/8.4.0/include/stddef.h(444): error: identifier “nullptr” is undefined

/soft/packaging/spack-builds/linux-rhel7-power8le/gcc-4.8.5/gcc-8.4.0-n5lbxh6ybxdmqh346yxg3lpkpx3clfzm/lib/gcc/powerpc64le-unknown-linux-gnu/8.4.0/include/stddef.h(444): error: expected a “;”

/soft/packaging/spack-builds/linux-rhel7-power8le/gcc-4.8.5/gcc-8.4.0-n5lbxh6ybxdmqh346yxg3lpkpx3clfzm/include/c++/8.4.0/powerpc64le-unknown-linux-gnu/bits/c++config.h(242): error: expected a “;”

CUDA 10.1 doesn’t support GNU 8.x releases, so it may be a simple incompatibility problem. The “nullptr” error is most likely because C++11 isn’t enabled by the CUDA 10.1 tools but is on by default in GNU 8.4 (nullptr was introduced in C++11).

Are you able to use an earlier version of GNU, like 7.x, which CUDA 10.1 supports?
Or can you try using CUDA 11 (-gpu=cuda11.0)?

Best Regards,
Mat
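
To see which GNU version a given gcc reports before choosing between these two options, something like the following may help. It is only a sketch: gcc_major_of is a hypothetical helper name, and the gcc found on PATH is not necessarily the one recorded in localrc.

```shell
# gcc_major_of: hypothetical helper, extracts the major version from a
# version string such as "8.4.0" or "4.8.5".
gcc_major_of() { printf '%s\n' "$1" | cut -d. -f1; }

# Report the gcc found on PATH, if any.
if command -v gcc >/dev/null 2>&1; then
  ver=$(gcc -dumpversion)
  echo "gcc on PATH: $(command -v gcc) (version $ver, major $(gcc_major_of "$ver"))"
fi
```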

MatColgrove and bleback.

I have updated localrc manually to use GCC 8.4.0 and changed to CUDA 11.0, which greatly reduced the number of error messages, but…

See attached file for details.

I still am getting three error messages related to dcmplx2 conversion.

Any recommendations on how to proceed?

final_error.txt (35.5 KB)

Hi John,

This looks like a device code generation issue. I’ve sent it off to some of the folks here who work on VASP to see 1) if this is a known issue, and 2) if they can help work around the error.

I’ll update you once we know more.

Thanks,
Mat

Our VASP folks got back to me and noticed that you’re trying to use the nollvm (i.e. CUDA C) back-end, which has been deprecated. Try removing the flag “-ta=tesla:nollvm” and just use “-ta=tesla”.

Secondly, you’ve included device code generation for multiple targets, i.e., “-ta=tesla:cc30,cc35,cc60,cc70,cuda11.0”. While I don’t know if it’s the case here, older devices sometimes don’t support certain features, so that could be the cause. If just removing “nollvm” doesn’t work, try removing “cc30” and possibly “cc35” and “cc60”.

-Mat
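
The two suggestions above could be tried in order, roughly as follows. The variable names are made up for illustration, and the flag strings are built from the ones quoted in this thread; cc70 is the compute capability of the V100.

```shell
# Hypothetical variable names; use the chosen value wherever your VASP
# build sets its OpenACC flags.

# Step 1: drop nollvm, keep the original list of targets.
TA_NO_NOLLVM="-ta=tesla:cc30,cc35,cc60,cc70,cuda11.0"

# Step 2 (only if step 1 still fails): target just the V100 (cc70).
TA_V100_ONLY="-ta=tesla:cc70,cuda11.0"

echo "step 1: $TA_NO_NOLLVM"
echo "step 2: $TA_V100_ONLY"
```

Trying step 1 on its own first shows whether nollvm alone was responsible before the target list is narrowed.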