Where to use this OpenACC pragma?

Hi! I have this code:

    #pragma acc host_data use_device(image, Cov)
    {
      cublasDgemm(handle_gemm,CUBLAS_OP_T, CUBLAS_OP_N, bands, bands, N, &alpha, image, N, image, N, &beta, Cov, bands);
    }

    cublasGetStream(handle_gemm, &stream);
    cudaStreamSynchronize(stream);
    
    #pragma acc parallel loop
    for(j=0; j<bands; j++)
        #pragma acc loop
        for(i=0; i<bands; i++)
            Corr[(i*bands)+j] = Cov[(i*bands)+j]+(meanSpect[i] * meanSpect[j]);

My question is about the “#pragma acc host_data use_device(…)” directive. I believe I have to use it around the first call to cublasDgemm, right? The function is called from the host, so I need to tell the compiler that those arguments are the copies in the device’s memory.

And do I also have to use it on the parallel loops underneath? Or is everything I put inside a “kernels” or “parallel” pragma going to use the variables in the device’s memory (if present)?

Note: Everything is contained in one common “acc data” region.

Thanks a lot!!

Within its code block, the “host_data” directive makes host code use the device addresses of the listed variables. It’s used primarily to pass device pointers to CUDA library routines, or to MPI calls so that CUDA-aware MPI can do direct device-to-device transfers. Here you’ll be passing the device pointers for “image” and “Cov” directly to cuBLAS.
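To see what “host_data” changes, here is a minimal sketch that only records the address the pointer evaluates to inside the region. The function name and arguments are made up for illustration, and no cuBLAS call is made so the snippet stays self-contained. If it is built without OpenACC enabled, the pragmas are ignored and you simply get the host address back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): returns the address that
   `buf` evaluates to inside a host_data region. */
uintptr_t address_seen_in_host_data(double *buf, size_t n)
{
    uintptr_t seen = 0;
    assert(buf != NULL && n > 0);

    /* Create a device copy of buf for the duration of this block. */
    #pragma acc data copyin(buf[0:n])
    {
        /* This block still runs on the host, but `buf` now refers to the
           device copy. That value is what a CUDA library routine expects. */
        #pragma acc host_data use_device(buf)
        {
            /* Only record the address; do not dereference it in host code. */
            seen = (uintptr_t)buf;
        }
    }
    return seen;
}
```

With an accelerator the returned value should normally differ from the host address of the array, which is the whole point: the library gets a pointer it can use on the device.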

And do I also have to use it on the parallel loops underneath? Or is everything I put inside a “kernels” or “parallel” pragma going to use the variables in the device’s memory (if present)?

Again, “host_data” makes the host code use the device pointer. The “parallel” loops are executed on the device, so there’s no need to use “host_data” here. Technically you could, but it makes things more complicated, since you then also need to add a “deviceptr(…vars…)” clause on the parallel loop so the compiler knows the pointers already hold device addresses and doesn’t try to map them again.
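For completeness, the more complicated variant could look roughly like the sketch below. The function and variable names are invented for illustration. It is shown only to make the “deviceptr” point concrete, not as something to prefer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: doubles every element of v. */
void scale_by_two_with_deviceptr(double *v, int n)
{
    assert(v != NULL && n > 0);

    #pragma acc data copy(v[0:n])
    {
        #pragma acc host_data use_device(v)
        {
            /* Inside host_data, v already holds the device address, so the
               compute construct is told to use it as-is via deviceptr. */
            #pragma acc parallel loop deviceptr(v)
            for (int i = 0; i < n; i++)
                v[i] *= 2.0;
        }
    }
}
```

Dropping both the “host_data” block and the “deviceptr” clause gives the same result with less to get wrong.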

By default, the compiler does the host-to-device lookup (if the data is present) upon entry to the parallel loop, which is why “host_data” isn’t needed.
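As a rough sketch of the simple case, the loop from the question can sit directly inside the data region and use the host variable names. The wrapper function, its signature, and the data clauses are assumptions made so the example is self-contained. The “present” clause is optional here and only states the expectation explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper around the loop from the question:
   Corr = Cov + meanSpect * meanSpect^T, all bands x bands. */
void add_mean_outer_product(double *Corr, const double *Cov,
                            const double *meanSpect, int bands)
{
    assert(Corr != NULL && Cov != NULL && meanSpect != NULL && bands > 0);

    /* Stand-in for the enclosing "acc data" region mentioned in the question. */
    #pragma acc data copyin(Cov[0:bands*bands], meanSpect[0:bands]) \
                     copyout(Corr[0:bands*bands])
    {
        /* No host_data needed: on entry the runtime finds the device copies
           that belong to these host pointers. */
        #pragma acc parallel loop collapse(2) \
                    present(Corr[0:bands*bands], Cov[0:bands*bands], meanSpect[0:bands])
        for (int i = 0; i < bands; i++)
            for (int j = 0; j < bands; j++)
                Corr[i * bands + j] = Cov[i * bands + j]
                                    + meanSpect[i] * meanSpect[j];
    }
}
```

In the original code the data region is already open around everything, so only the loop part applies there.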

-Mat