Hi,
I can specify a “gang vector” loop schedule for both loops of a nested loop when using the kernels construct:
#pragma acc kernels
#pragma acc loop gang vector
for (int j = 0; j < n; j++)
{
    #pragma acc loop gang vector
    for (int i = 0; i < m; i++) {...}
}
Then the compiler uses a 2-dimensional grid and 2-dimensional blocks (exactly what I want):
67, #pragma acc loop gang, vector(2) /* blockIdx.y threadIdx.y */
70, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
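For anyone who wants to reproduce this, here is a self-contained sketch of the kernels version. The real loop body is elided above, so the body here (a[j*m + i] = j*m + i on a row-major array) and the function name fill_kernels are hypothetical. Compiled with an OpenACC compiler and -Minfo=accel it should print schedule messages like the ones above; a plain C compiler ignores the pragmas and runs it serially.

```c
#include <stddef.h>

/* Hypothetical example: fill a row-major n x m array using the same
   nested "gang vector" schedule as in the kernels snippet above. */
void fill_kernels(float *restrict a, int n, int m)
{
    #pragma acc kernels copyout(a[0:n*m])
    {
        #pragma acc loop gang vector
        for (int j = 0; j < n; j++) {
            #pragma acc loop gang vector
            for (int i = 0; i < m; i++) {
                /* hypothetical loop body */
                a[j * m + i] = (float)(j * m + i);
            }
        }
    }
}
```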
HOWEVER, if I use the parallel construct instead of kernels, I get the following error message and the inner loop schedule is ignored:
PGC-S-0155-Nested loops cannot have the same worksharing type (file.c: 67)
[..]
67, #pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */
Why do I get this error when the same schedule apparently works nicely (and as expected) with the kernels construct?
How can I get 2-dimensional grids and 2-dimensional blocks with the parallel construct?
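For comparison, a sketch of two parallel-construct schedules that stay within the rule the error message names, by giving only one loop level the gang type. The function names and loop body are hypothetical. As far as I understand, both of these would be expected to map to a 1-dimensional grid, so they avoid the error but do not by themselves give the 2D grid/2D block layout that kernels produced.

```c
#include <stddef.h>

/* Variant A (hypothetical): gangs on the outer loop, vector lanes on the
   inner loop, so the two nested loops use different worksharing types. */
void fill_parallel(float *restrict a, int n, int m)
{
    #pragma acc parallel loop gang copyout(a[0:n*m])
    for (int j = 0; j < n; j++) {
        #pragma acc loop vector
        for (int i = 0; i < m; i++) {
            a[j * m + i] = (float)(j * m + i);
        }
    }
}

/* Variant B (hypothetical): collapse both tightly nested loops into one
   iteration space and apply "gang vector" to that single loop. */
void fill_parallel_collapse(float *restrict a, int n, int m)
{
    #pragma acc parallel loop gang vector collapse(2) copyout(a[0:n*m])
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            a[j * m + i] = (float)(j * m + i);
        }
    }
}
```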
Bye, Sandra