How to create data on device

Hi

Could I have an example of the OpenACC data create clause?

I tried the attached program, but it fails with:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

Greetings
Benedikt

      PROGRAM TEST
      IMPLICIT NONE
      INTEGER , Allocatable ::FLX_U (:)     
      INTEGER I,S
!$acc data create( FLX_U (1000)) copy (S)
      WRITE(*,*) 'Init'
!$acc kernels
      DO i=1,1000
        FLX_U(i) = i
      END DO
!$acc end kernels
!$acc end data
      END

Hi Benedikt,

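The problem here is that FLX_U is never allocated on the host, and the device copy will have the same status as the host copy. The kernel therefore writes to an array that does not exist on the device, which is what triggers the illegal address error.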
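For comparison, here is a minimal sketch of the plain OpenACC route, assuming it is acceptable to keep a host copy of the array: FLX_U is allocated on the host before the data region, so the create clause has an allocated array to mirror on the device. The initialization of S and the second loop are additions for checking the result and are not part of the original program.

```fortran
      PROGRAM TEST
      IMPLICIT NONE
      INTEGER, ALLOCATABLE :: FLX_U(:)
      INTEGER I, S
      ! Allocate on the host first; create() mirrors this allocation on the device
      ALLOCATE(FLX_U(1000))
      S = 0
      ! create() allocates device memory only; no data is copied in or out
!$acc data create(FLX_U)
!$acc kernels
      DO I = 1, 1000
        FLX_U(I) = I
      END DO
!$acc end kernels
      ! Sum on the device so FLX_U never has to be copied back to the host
!$acc kernels loop reduction(+:S)
      DO I = 1, 1000
        S = S + FLX_U(I)
      END DO
!$acc end kernels loop
!$acc end data
      PRINT *, S
      DEALLOCATE(FLX_U)
      END
```

Note that this version still holds the full array in host memory, so it does not help if the goal is to save host memory. It only shows the create clause working once the array is allocated. Built with OpenACC enabled (for example with pgf90 -acc), it should print 500500, the sum of 1 through 1000.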

Are you trying to make it so FLX_U isn’t allocated on the host?

The best way to do this is to add the CUDA Fortran “device” attribute to the array. When it gets allocated, only the device array will be created. Below is an example. I also added some macros so that “device” is only added when OpenACC is enabled. That way your code will still be valid when OpenACC is not used.

Hope this helps,
Mat

% cat test.F90
#ifdef _OPENACC
#define DEVICE ,device
#else
#define DEVICE
#endif
      PROGRAM TEST
       IMPLICIT NONE
       INTEGER DEVICE, Allocatable ::FLX_U (:)
       INTEGER I,S
       allocate(FLX_U(1000))
       S=0
       WRITE(*,*) 'Init'
 !$acc kernels
       DO i=1,1000
         FLX_U(i) = i
       END DO
 !$acc end kernels
 !$acc kernels loop reduction(+:S)
       DO i=1,1000
         S=S+FLX_U(i)
       END DO
 !$acc end kernels
       print *, S
       deallocate(FLX_U)
       END
% pgf90 test.F90 -acc -Mcuda -Minfo=accel; a.out
test:
     14, Loop is parallelizable
         Accelerator kernel generated
         14, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     19, Loop is parallelizable
         Accelerator kernel generated
         19, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             Sum reduction generated for s
 Init
       500500

Yes, that was my intention: my program runs out of host memory, although I only need most of that memory on the device.

And yes, I also need to compile without OpenACC/CUDA.

Additionally, I’ll dig a little deeper into the C preprocessor for Fortran and into CUDA Fortran…

Your answer is helpful, thank you!
Benedikt