Is it possible that the compiler only issues a warning when a pragma is not recognized? In my real application, however, it is still not working, since the compiler complains about this:
Accelerator clause: upper bound for dimension 1 of array 'array_in' is unknown
Accelerator clause: upper bound for dimension 0 of array 'array_in' is unknown
Generating update device(array_in[0:3][0:num_entries])
Accelerator clause: upper bound for dimension 1 of array 'array_out' is unknown
Accelerator clause: upper bound for dimension 0 of array 'array_out' is unknown
Generating update host(array_out[0:3][0:num_entries])
and it results in the following run-time error:
call to cuMemcpyDtoH returned error 1: Invalid value
My code looks something like the following:
void some_function(my_struct_t* my_struct){
    float** array_in = my_struct->array_in;
    float** array_out = my_struct->array_out;
    int num_entries = my_struct->num_entries; //the array bounds are stored in the struct
    #pragma acc update device(array_in[0:3][0:num_entries])
    //launch some kernel (similar to the example above)
    #pragma acc update host(array_out[0:3][0:num_entries])
}
int main(){
    my_struct_t* my_struct;
    my_struct = my_struct_init(); //initializes and allocates the data
    float** array_in = my_struct->array_in;
    float** array_out = my_struct->array_out;
    int num_entries = my_struct->num_entries;
    #pragma acc data create(array_out[0:3][0:num_entries], array_in[0:3][0:num_entries])
    {
        some_function(my_struct);
    }
}
I guess this has something to do with the function call sitting between the acc data create region and the update directives, but I'm not quite sure. Is there a way to fix this?
Is there an OpenACC equivalent of cudaMemcpy?
There is acc_malloc(…), which lets me allocate data on the GPU, and there is the deviceptr clause to pass this memory to an OpenACC region, but in order to solve the problem above I need a copy from device to host where I can specify the destination.