update directive - how to?

Hi Mat,

I’m having some problems with the update directive within a data region.

Here is a small code sample:

   int n = 1024;
   double* out = (double*) malloc(sizeof(double) *n);
   double* in = (double*) malloc(sizeof(double) *n);

   int i;
   for(i = 0 ; i< 1024; ++i)
      in[i] = 1.0;
   
#pragma acc data create(in[0:n],out[0:n])
   {   
#pragma update device(in[0:n]) // copy data from host to device
#pragma acc kernels present(in[0:n],out[0:n])
      for(i = 0 ; i< 1024; ++i)
         out[i] = in[i] + 1.0;
#pragma update host(out[0:n]) // copy data from device to host
   }   
   printf("%f %f \n",out[0],out[1]);

Of course this code does not do anything useful, but it illustrates my use case.

PGI 12.9 generates the following feedback:

     31, Generating create(out[0:n])
         Generating create(in[0:n])
     34, Generating present(out[0:n])
         Generating present(in[0:n])
         Generating compute capability 2.0 binary
     35, Loop is parallelizable
         Accelerator kernel generated
         35, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
             CC 2.0 : 12 registers; 0 shared, 52 constant, 0 local memory bytes

I expect the output to be “2.0 2.0”, but I get “0.0 0.0”.

Thanks.

Best,
Paul

Hi Paul,

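You forgot the “acc” before “update”. Without it, the compiler does not recognize the pragma and ignores it (note that your feedback output contains no “Generating update” lines), so nothing is copied between host and device.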

   int n = 1024;
   double* out = (double*) malloc(sizeof(double) *n);
   double* in = (double*) malloc(sizeof(double) *n);

   int i;
   for(i = 0 ; i< 1024; ++i)
      in[i] = 1.0;
   
#pragma acc data create(in[0:n],out[0:n])
   {   
#pragma acc update device(in[0:n]) // copy data from host to device
#pragma acc kernels present(in[0:n],out[0:n])
      for(i = 0 ; i< 1024; ++i)
         out[i] = in[i] + 1.0;
#pragma acc update host(out[0:n]) // copy data from device to host
   }   
   printf("%f %f \n",out[0],out[1]);

Hope this helps,
Mat

Indeed, I missed the “acc”. Thank you.

Could the compiler issue a warning when a pragma is not recognized?

However, my real application still does not work. The compiler reports the following:

         Accelerator clause: upper bound for dimension 1 of array 'array_in' is unknown
         Accelerator clause: upper bound for dimension 0 of array 'array_in' is unknown
         Generating update device(array_in[0:3][0:num_entries])

         Accelerator clause: upper bound for dimension 1 of array 'array_out' is unknown
         Accelerator clause: upper bound for dimension 0 of array 'array_out' is unknown
         Generating update host(array_out[0:3][0:num_entries])

This results in the following run-time error:

call to cuMemcpyDtoH returned error 1: Invalid value

My code looks something like the following:

 void some_function(my_struct_t* my_struct){
    float** array_in = my_struct->array_in;
    float** array_out = my_struct->array_out;
    #pragma acc update device(array_in[0:3][0:num_entries])
    //launch some kernel (similar to the example above)
    #pragma acc update host(array_out[0:3][0:num_entries])
}

 int main(){
    my_struct_t* my_struct;
    my_struct = my_struct_init(); //initializes and allocates the data
    float** array_in = my_struct->array_in;
    float** array_out = my_struct->array_out;

#pragma acc data create(array_out[0:3][0:num_entries], array_in[0:3][0:num_entries]) 
    some_function(my_struct);
 }

I guess this has something to do with the function call sitting between the acc data create and the update directives, but I’m not quite sure.
Is there a way to fix this?

Any help is much appreciated.

Best,
Paul
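
One possible way around the unknown-bound messages, offered only as a sketch and not confirmed anywhere in this thread, is to keep each 3 x num_entries array in a single contiguous allocation, so that the update directives see a plain one-dimensional range instead of a float**. The function names add_one_flat and run_flat and the variable flat_len are made up for illustration, and the "+ 1.0f" loop is just a stand-in for the real kernel. Element (r, c) would live at index r * num_entries + c.

```c
#include <stddef.h>

/* Hypothetical stand-in for some_function. in and out each point to ONE
 * contiguous block of 3 * num_entries floats, and both are assumed to be
 * present on the device already through an enclosing data region. */
void add_one_flat(float *in, float *out, int num_entries)
{
    int flat_len = 3 * num_entries;
    int i;

    /* Copy the whole flat input range from host to device. */
    #pragma acc update device(in[0:flat_len])

    /* Placeholder kernel: the real computation would go here. */
    #pragma acc kernels present(in[0:flat_len], out[0:flat_len])
    for (i = 0; i < flat_len; ++i)
        out[i] = in[i] + 1.0f;

    /* Copy the whole flat output range from device to host. */
    #pragma acc update host(out[0:flat_len])
}

/* Hypothetical driver: opens the data region and calls the function inside it. */
void run_flat(float *in, float *out, int num_entries)
{
    int flat_len = 3 * num_entries;

    #pragma acc data create(in[0:flat_len], out[0:flat_len])
    {
        add_one_flat(in, out, num_entries);
    }
}
```

Calling run_flat(in, out, num_entries) with two buffers of 3 * num_entries floats should leave in[i] + 1.0f in out[i]. A compiler without OpenACC support ignores the pragmas and simply runs the loop on the host, which makes the logic easy to check before moving to the GPU.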

Here is another, related problem I’m facing:

Is there an OpenACC equivalent to cudaMemcpy?
There is acc_malloc(…), which lets me allocate data on the GPU, and there is a deviceptr clause to pass this memory to an OpenACC region. But to solve the problem above, I need a device-to-host copy where I can specify the destination.

+Paul
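
For reference, later versions of the OpenACC runtime API (2.0 onward, so probably not available in PGI 12.9) add acc_memcpy_to_device and acc_memcpy_from_device, which take an explicit destination much like cudaMemcpy. The sketch below assumes such a compiler. The helper name add_one_with_explicit_copies is made up for illustration, and the host fallback under #else only exists so the snippet also builds without OpenACC.

```c
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: computes dst[i] = src[i] + 1.0 through a device
 * buffer, using explicit copies with a caller-chosen host destination.
 * Returns 0 on success, -1 if the device allocation fails. */
int add_one_with_explicit_copies(double *src, double *dst, int n)
{
    size_t bytes = (size_t) n * sizeof(double);
    int i;

#ifdef _OPENACC
    /* Allocate raw device memory (OpenACC runtime API). */
    double *d_buf = (double *) acc_malloc(bytes);
    if (d_buf == NULL)
        return -1;

    /* Host -> device, similar to cudaMemcpy(..., cudaMemcpyHostToDevice). */
    acc_memcpy_to_device(d_buf, src, bytes);

    /* deviceptr tells the compiler that d_buf is already a device address. */
    #pragma acc kernels deviceptr(d_buf)
    for (i = 0; i < n; ++i)
        d_buf[i] = d_buf[i] + 1.0;

    /* Device -> host, into whatever destination the caller passed in. */
    acc_memcpy_from_device(dst, d_buf, bytes);
    acc_free(d_buf);
#else
    /* Host-only fallback so the example still runs without OpenACC. */
    (void) bytes;
    for (i = 0; i < n; ++i)
        dst[i] = src[i] + 1.0;
#endif
    return 0;
}
```

With src filled with 1.0, a call such as add_one_with_explicit_copies(src, dst, n) should leave 2.0 in every element of dst, and dst can be any host buffer of at least n doubles.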