Hello,
I have a program that uses OpenACC to offload work to the GPU.
In my code, I wrote an OpenACC device function that is called inside a parallel loop:
#pragma acc parallel loop deviceptr(A, B, C)
{
func(A, B, C);
}
The data has already been copied to the device, and the three device pointers A, B, and C were obtained with the acc_deviceptr() runtime function.
For certain reasons, I would like to use OpenMP for multithreading on the host, with each thread launching its own OpenACC kernel:
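For reference, here is a minimal sketch of that setup. The names func_elem and run_with_deviceptr are hypothetical stand-ins (my real func is different), and the element-wise add is only a placeholder workload. The OpenACC runtime calls are guarded by _OPENACC so the same file also builds as plain host code, where the pragmas are simply ignored.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical stand-in for func(): handles one element, C[i] = A[i] + B[i]. */
#pragma acc routine seq
void func_elem(const float *A, const float *B, float *C, int i)
{
    C[i] = A[i] + B[i];
}

/* Copies hA/hB to the device, looks up the device addresses with
   acc_deviceptr(), runs the kernel via the deviceptr clause, and copies the
   result back into hC. Without OpenACC, A/B/C stay host pointers and the
   loop runs on the host. */
void run_with_deviceptr(float *hA, float *hB, float *hC, int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *A = hA, *B = hB, *C = hC;

#ifdef _OPENACC
    acc_copyin(hA, bytes);              /* allocate on device + copy host data */
    acc_copyin(hB, bytes);
    acc_create(hC, bytes);              /* allocate only, no copy */
    A = (float *)acc_deviceptr(hA);     /* device address of present data */
    B = (float *)acc_deviceptr(hB);
    C = (float *)acc_deviceptr(hC);
#endif
    assert(A != NULL && B != NULL && C != NULL);

    #pragma acc parallel loop deviceptr(A, B, C)
    for (int i = 0; i < n; ++i) {
        func_elem(A, B, C, i);
    }

#ifdef _OPENACC
    acc_copyout(hC, bytes);             /* copy result back and free device copy */
    acc_delete(hA, bytes);
    acc_delete(hB, bytes);
#endif
}
```

Calling run_with_deviceptr(a, b, c, n) on three host arrays of length n should leave a[i] + b[i] in c[i], whether or not the build has OpenACC enabled.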
#pragma omp parallel num_threads()
{
size_t tid = omp_get_thread_num();
#pragma omp for
for (int i = 0; i < n; ++i) {
#pragma acc parallel loop deviceptr(A, B, C)
{
func(A, B, C);
}
}
}
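To make the pattern above more concrete, here is a fuller sketch of what I have in mind. It assumes the compiler accepts OpenMP and OpenACC directives in the same build, which is exactly what I am unsure about. The function name launch_chunks, the thread count of 4, and the per-thread async queue (tid + 1) are my own illustrative choices, not something I have confirmed to work with PGI. A, B, and C are expected to be device pointers when OpenACC is enabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical driver: each OpenMP thread takes some of the nchunks chunks
   and launches one OpenACC kernel per chunk. With OpenACC enabled, A/B/C
   must be device pointers (e.g. obtained from acc_deviceptr()). Without
   OpenMP/OpenACC the pragmas are ignored and this is a plain serial loop. */
void launch_chunks(const float *A, const float *B, float *C,
                   int nchunks, int chunk)
{
    #pragma omp parallel num_threads(4)   /* 4 is an arbitrary choice */
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        (void)tid;  /* only referenced inside OpenACC clauses below */

        #pragma omp for
        for (int i = 0; i < nchunks; ++i) {
            const int lo = i * chunk;
            /* One async queue per host thread, so that kernels issued by
               different threads are not forced onto the same queue. */
            #pragma acc parallel loop deviceptr(A, B, C) async(tid + 1)
            for (int j = lo; j < lo + chunk; ++j) {
                C[j] = A[j] + B[j];       /* placeholder for func() */
            }
        }

        /* Each thread waits for the kernels it queued before leaving. */
        #pragma acc wait(tid + 1)
    }
}
```

The chunks write to disjoint ranges of C, so the threads do not need any extra synchronization among themselves beyond the per-queue wait.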
My question is: does PGI V17 support this kind of usage? If not, is there another solution, such as the following?
#pragma acc parallel loop private(i) // run on CPU
for(int i = 0; i<n; ++i)
{
#pragma acc loop deviceptr(A, B, C) // run on GPU
{
func(A, B, C);
}
}
Here all of the code is written with OpenACC directives, but the outer parallel region targets CPU multithreading, while the inner loop targets a GPU kernel.
Sincerely,
Tao