While loop and nested parallelism

To my surprise, the following while loop executes successfully on the accelerator (serially, i.e. by a single worker):

#include <iostream>

int main(int argc, char const *argv[])
{
	int *ARRAY = new int [10];
	int i, sum;

	for (i = 0; i < 10; ++i)
	{
		ARRAY[i] = i+1;
	}

	#pragma acc enter data copyin(ARRAY[0:10]) create(sum,i)

	#pragma acc parallel present(ARRAY[0:10],sum) num_gangs(1) num_workers(1)
	{
		sum = 0;
		i = 0;

		while (sum < 46){
			sum += ARRAY[i];
			++i;
		}
	}


	#pragma acc exit data delete(ARRAY[0:10]) copyout(sum)

	std::cout << sum << std::endl;

	return 0;
}

and returns 1+2+…+10 = 55:

$ pgcpp main.cpp
$ ./a.out
55

However, if I try to add nested parallelism, which is legal according to the OpenACC 2.0 standard:

#include <iostream>

int main(int argc, char const *argv[])
{
	int i,sum;

	#pragma acc enter data create(i,sum)

	#pragma acc parallel present(i,sum) num_gangs(1) num_workers(1)
	{
		sum = 0;
		i = 0;

		while (i < 3){

			#pragma acc parallel loop reduction(+:sum)
			for (int j = 0; j < 10; ++j)
			{
				sum += (j+1);
			}

			++i;
		}
	}


	#pragma acc exit data delete(i) copyout(sum)

	std::cout << sum << std::endl;

	return 0;
}

Compilation fails:

$ pgcpp -acc main.cpp
PGCC-S-0155-Illegal context for kernels  (main.cpp: 14)
PGCC/x86 Linux 14.9-0: compilation completed with severe errors

Line 14 is the while loop statement. I am aware that the parallel for is executed three times redundantly on the device; this example only illustrates the problem I have with another, more interesting piece of code. Is nested parallelism not supported in the PGI 14.9 compiler, or am I writing illegal code?

Hi LO_UZH,

Is nested parallelism not supported in the PGI 14.9 compiler, or am I writing illegal code?

Not supported yet. It is one of the few OpenACC 2.0 features we have left to implement. You can see which features are supported, and in which release, at: PGI Compilers with OpenACC | PGI
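
Until nested parallelism is available, one possible restructuring is to keep the while loop on the host and offload only the inner for loop on each pass. The following is a minimal sketch under that assumption, not tested against PGI 14.9; the function name host_driven_sum is hypothetical and chosen only for illustration. It relies on the reduction clause to combine the partial sums with the host value of sum at the end of each parallel region.

```cpp
// Sketch only: the while loop runs on the host, and each iteration
// launches one (non-nested) parallel region for the inner loop.
// host_driven_sum is a hypothetical name, not part of any library.
int host_driven_sum()
{
	int sum = 0;
	int i = 0;

	while (i < 3) {
		// The reduction result is added to the current value of sum
		// and is available on the host when the region ends.
		#pragma acc parallel loop reduction(+:sum)
		for (int j = 0; j < 10; ++j)
		{
			sum += (j+1);
		}

		++i;
	}

	return sum;
}
```

Each pass adds 1+2+…+10 = 55, so three passes give 165, whether or not the pragma is honoured. The cost of this layout is one kernel launch per while iteration, so it may not suit code where the outer loop has to stay on the device.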

  • Mat