ICE with OpenMP map clause

Hi,

I’m encountering an ICE with the latest HPC SDK for a simple Saxpy example:

#include <stdlib.h>

int main (int argc, char *argv[]) {
  size_t N = 1024*1024;
  double k = 1.2345;

  double *a = (double*)malloc(N*sizeof(double));
  double *b = (double*)malloc(N*sizeof(double));
  double *c = (double*)malloc(N*sizeof(double));

  for (size_t i = 0; i < N; i++) {
     a[i] = 0.5 * (double)i;
     b[i] = 0.75 * (double)i;
  }

#pragma omp target map (to: a[0:N], b[0:N])
#pragma omp target teams distribute parallel for
  for (size_t i = 0; i < N; i++) {
     c[i] = k * a[i] + b[i];
  }
  free(a);
  free(b);
  free(c);
  return 0;
}

(I am not entirely sure whether that combination of pragmas is valid. Maybe it isn’t, and that’s why this hasn’t been found until now?)

Compiling with -mp=gpu yields

NVC++-F-0000-Internal compiler error. child tinfo should have been created at outlining function for host    1327  (test2.c: 16)
NVC++/x86-64 Linux 24.5-1: compilation aborted

The program compiles fine when I change the outer directive to “target enter data”.
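That is, roughly the following variant (a sketch; note that c is still left unmapped here):

#pragma omp target enter data map(to: a[0:N], b[0:N])
#pragma omp target teams distribute parallel for
  for (size_t i = 0; i < N; i++) {
     c[i] = k * a[i] + b[i];
  }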

The compiler version that I use is 24.5-1, but I have observed the ICE with 24.3 and 23.5, too.

Regards,
Christian

Hi Christian,

The problem here is with nested target compute regions. I assume you meant to have the outer target be a data region, i.e. “#pragma omp target data map (to: a[0:N], b[0:N])”.
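That is, something along these lines (a sketch; I’ve also mapped c back, which the original code would need in order to retrieve the results):

#pragma omp target data map(to: a[0:N], b[0:N]) map(from: c[0:N])
{
#pragma omp target teams distribute parallel for
   for (size_t i = 0; i < N; i++) {
      c[i] = k * a[i] + b[i];
   }
}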

I can’t find anything in the standard that states whether compute regions can be nested, which would make the behavior undefined. Granted, I could be missing it, but even if it were allowed, this code is asking to launch a compute kernel from within another compute kernel (aka dynamic parallelism).

OpenACC specifically allows dynamic parallelism, but we chose not to support it given that, other than a few toy examples, we’ve not been able to find any real use cases for it.

Was it your intent to use dynamic parallelism?

Of course, the compiler shouldn’t ICE, so I’ve filed a problem report, TPR#35881, and asked engineering to detect this case and emit a proper error.

Besides changing the outer directive to “target data”, you can instead remove the inner “target” to get it to work:

#pragma omp target map (to: a[0:N], b[0:N])
#pragma omp teams distribute parallel for
  for (size_t i = 0; i < N; i++) {
     c[i] = k * a[i] + b[i];
  }
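Note that since “c” isn’t in the map clause, it’s treated as a zero-length array section inside the target region. On systems with separate host and device memory you’ll likely also want to map it to get the results back, e.g.:

#pragma omp target map (to: a[0:N], b[0:N]) map(from: c[0:N])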

-Mat

Hi Mat,

thank you for the detailed answer and for reporting the issue. There’s no specific intent behind this code; I was just playing around with different combinations of pragmas when I encountered the ICE.

I agree with your understanding. The map clause on its own is tied to a specific target region, whereas target data and target enter/exit data are data-management directives in their own right, which is why the code works with them.
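For completeness, my understanding of the full unstructured pattern would be something like this (untested sketch):

#pragma omp target enter data map(to: a[0:N], b[0:N]) map(alloc: c[0:N])
#pragma omp target teams distribute parallel for
  for (size_t i = 0; i < N; i++) {
     c[i] = k * a[i] + b[i];
  }
#pragma omp target exit data map(from: c[0:N]) map(release: a[0:N], b[0:N])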

Regards,
Christian