I am experimenting with basic OpenACC features in Fortran. I have prepared a very simple example that computes a 2D Gaussian surface using derived types and transfers data between the host and the device. The example is deliberately modular and spread across multiple files (in contrast to all the example codes I have seen so far). The source code and the Makefile are here: https://github.com/moravveji/OpenACC/tree/master/derived_types
The problem: the code compiles cleanly and runs. However, I never see the expected compiler feedback about data movement or the generation of device kernels. So the code appears to run only on the host, ignoring all !$acc directives. See below:
$> make clean; make
rm -f *.mod *.o *~ *.exe
pgfortran -c -o io.o io.f90
pgfortran -c -o vars.o vars.f90
pgfortran -c -o kern.o kern.f90
pgfortran -c -o main.o main.f90
pgfortran -acc -ta=tesla:cuda8.0,cc35 -o drv_types.exe io.o vars.o kern.o main.o -Minfo=all
./drv_types.exe
I have two guesses: (1) I am messing something up in my Makefile or in one of the Fortran modules (which I cannot readily spot), or (2) additional compiler flags are needed when OpenACC directives are spread across multiple modules/source files. Or perhaps the reason is something else entirely.
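One thing that stands out in the build log is that -acc and -ta appear only on the final link line, while the four `pgfortran -c` steps get no accelerator flags at all. Since the compiler treats !$acc lines as ordinary comments unless -acc is given when a source file is compiled, that may well be the cause. The following is a minimal dry-run sketch of what I would try, not a confirmed fix: it only prints the commands. The `ACCFLAGS` variable and the `build_cmds` function are names I made up for illustration, and I am assuming `-Minfo=accel` is the right way to request accelerator feedback.

```shell
#!/bin/sh
# Dry-run sketch: print the build commands with the accelerator flags
# applied to every compile step, not only to the link step.
# ACCFLAGS and build_cmds are illustrative names, not part of the repository.

FC=${FC:-pgfortran}
ACCFLAGS="-acc -ta=tesla:cuda8.0,cc35 -Minfo=accel"

build_cmds() {
    # Compile each module in dependency order, same order as the original log.
    for src in io vars kern main; do
        echo "$FC $ACCFLAGS -c -o $src.o $src.f90"
    done
    # Link with the same flags so the OpenACC runtime is pulled in.
    echo "$FC $ACCFLAGS -o drv_types.exe io.o vars.o kern.o main.o"
}

build_cmds
```

If the printed commands look right, running `build_cmds | sh` inside the script would execute them, assuming pgfortran is on the PATH. In the Makefile itself, the equivalent change would presumably be to add the same flags to the rule that compiles the .f90 files into .o files.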
I use PGI-17.4 with an NVIDIA K40c device on a Westmere node. The following environment variables are also set for extra feedback:
export ACC_DEVICE_TYPE=nvidia
export PGI_ACC_NOTIFY=1
export PGI_ACC_TIME=1
I would be glad if one of the accelerator black belts could point out how to fix this compilation issue.