Hi! I have this code:
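// Cov = alpha * image^T * image + beta * Cov, computed by cuBLAS on the device
#pragma acc host_data use_device(image, Cov)
{
    cublasDgemm(handle_gemm, CUBLAS_OP_T, CUBLAS_OP_N, bands, bands, N,
                &alpha, image, N, image, N, &beta, Cov, bands);
}
// Wait for the cuBLAS stream to finish before the OpenACC loops read Cov
cublasGetStream(handle_gemm, &stream);
cudaStreamSynchronize(stream);

// Corr = Cov + meanSpect * meanSpect^T
#pragma acc parallel loop
for (j = 0; j < bands; j++) {
    #pragma acc loop
    for (i = 0; i < bands; i++)
        Corr[(i * bands) + j] = Cov[(i * bands) + j] + (meanSpect[i] * meanSpect[j]);
}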
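For reference, here is what the two steps compute, assuming image is an N × bands array stored column-major (which is what CUBLAS_OP_T with lda = N implies) and writing m for meanSpect. The first line is cuBLAS's definition of gemm, C = α op(A) op(B) + β C, with op(A) = imageᵀ and op(B) = image:

```latex
\begin{aligned}
\mathrm{Cov} &\leftarrow \alpha\,\mathrm{image}^{T}\,\mathrm{image} + \beta\,\mathrm{Cov},
  &&\mathrm{image}\in\mathbb{R}^{N\times \mathit{bands}},\quad \mathrm{Cov}\in\mathbb{R}^{\mathit{bands}\times \mathit{bands}}\\
\mathrm{Corr}_{ij} &= \mathrm{Cov}_{ij} + m_i\,m_j,
  &&\text{i.e. } \mathrm{Corr} = \mathrm{Cov} + m\,m^{T}
\end{aligned}
```

If beta = 0 (or Cov starts out symmetric), both Cov and m mᵀ are symmetric, so reading Cov with the row-major index i*bands+j gives the same values as the column-major layout cuBLAS writes.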
My question is about the “#pragma acc host_data use_device(…)” directive. I believe I need it around the cublasDgemm call, right? cublasDgemm is called from the host, so I have to tell the compiler to pass the device addresses of those arrays rather than the host addresses.
Do I also need it for the parallel loops that follow the cuBLAS call? Or will everything inside a “kernels” or “parallel” construct automatically use the device copies of the variables (if they are present)?
Note: all of this sits inside a single enclosing “acc data” region.
Thanks a lot!!