To my surprise, the following while loop executes successfully on the accelerator (serially, i.e. by a single worker):
#include <iostream>
int main(int argc, char const *argv[])
{
    int *ARRAY = new int [10];
    int i, sum;
    for (i = 0; i < 10; ++i)
    {
        ARRAY[i] = i+1;
    }
    #pragma acc enter data copyin(ARRAY[0:10]) create(sum,i)
    #pragma acc parallel present(ARRAY[0:10],sum) num_gangs(1) num_workers(1)
    {
        sum = 0;
        i = 0;
        while (sum < 46){
            sum += ARRAY[i];
            ++i;
        }
    }
    #pragma acc exit data delete(ARRAY[0:10]) copyout(sum)
    std::cout << sum << std::endl;
    return 0;
}
and returns 1+2+…+10 = 55:
$ pgcpp main.cpp
$ ./a.out
55
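The value 55 can be checked on the host: after nine elements the partial sum is 45, which is still below 46, so the tenth element is added and the loop stops at 55. Below is a minimal serial sketch of that loop. The helper name `sum_until` is hypothetical, and the bounds check on `i` is an addition that the device loop above does not have.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the original program): serial host
// version of the device loop. Adds elements until the running sum reaches
// `threshold` or the array is exhausted.
int sum_until(const int *data, std::size_t n, int threshold)
{
    assert(data != nullptr);
    int sum = 0;
    std::size_t i = 0;
    // The check on i is an addition; the original loop relies on the
    // threshold being reached before the end of the array.
    while (i < n && sum < threshold) {
        sum += data[i];
        ++i;
    }
    return sum;
}
```

Calling `sum_until` on the array 1..10 with a threshold of 46 mirrors the loop in the program above.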
However, when I try to add nested parallelism, which is legal according to the OpenACC 2.0 standard:
#include <iostream>
int main(int argc, char const *argv[])
{
    int i,sum;
    #pragma acc enter data create(i,sum)
    #pragma acc parallel present(i,sum) num_gangs(1) num_workers(1)
    {
        sum = 0;
        i = 0;
        while (i < 3){
            #pragma acc parallel loop reduction(+:sum)
            for (int j = 0; j < 10; ++j)
            {
                sum += (j+1);
            }
            ++i;
        }
    }
    #pragma acc exit data delete(i) copyout(sum)
    std::cout << sum << std::endl;
    return 0;
}
Compilation fails:
$ pgcpp -acc main.cpp
PGCC-S-0155-Illegal context for kernels (main.cpp: 14)
PGCC/x86 Linux 14.9-0: compilation completed with severe errors
The reported line is the while loop statement. I am aware that the parallel for is executed three times redundantly on the device; this example is only meant to illustrate the problem I have with another, more interesting piece of code. Is nested parallelism not supported by the PGI 14.9 compiler, or am I writing illegal code?
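For comparison, here is a sketch of a host-driven variant, assuming the outer while loop does not itself have to run on the device. The while loop stays on the host and only the inner loop is offloaded, so no nested parallel region is involved. The function name `host_driven_sum` is hypothetical, and this is an untested illustration rather than a confirmed fix for PGI 14.9. The expected result for three repetitions is 3 × 55 = 165.

```cpp
#include <cassert>

// Hypothetical host-driven variant: the outer while loop runs on the host
// and each iteration launches one parallel loop on the device.
// When built without OpenACC support, the pragma is ignored and the inner
// loop runs serially with the same result.
int host_driven_sum(int repeats)
{
    assert(repeats >= 0);
    int sum = 0;
    int i = 0;
    while (i < repeats) {
        // Each launch adds 1 + 2 + ... + 10 = 55 to sum.
        #pragma acc parallel loop reduction(+:sum)
        for (int j = 0; j < 10; ++j) {
            sum += (j + 1);
        }
        ++i;
    }
    return sum;
}
```

This costs one kernel launch per outer iteration, which is why it does not replace true nested parallelism when the outer loop must stay on the device.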