"Live-out" from reduction loop

Hi,

I’m having problems getting two variables from a min-reduction out to the host - could somebody tell me how to solve the issue?

I have an array of structs (in the example they contain just an int, to keep it simple) which I copy to the accelerator, where I run a min reduction over the contained values. Afterwards, on the host side, I’d like to receive both the minimum value and the struct itself without having to copy the whole input array out again!

In a minimal example, this compiles and prints “5” as expected:

#include <stdio.h>

struct mystruct {
  int val;
};

int main() {
  struct mystruct test[100];
  struct mystruct result;
  struct mystruct *res = &result;
  int minval = 100;

  //initialize test with 5,6,7,...
  for(int i=0; i<100; ++i) test[i].val = i+5;

  #pragma acc parallel loop copy(test[0:100]), copyout(res[0:1]), reduction(min:minval)
  for(int i=0; i<100; ++i) {
    if(test[i].val < minval) {
      minval = test[i].val;
      //res = test + i;
    }
  }

  printf("%i\n", minval);
  //printf("%i\n", res->val);

  return 0;
}

Compiler output:

main:
     15, Generating copy(test[:])
         Generating copyout(res[:1])
         Accelerator kernel generated
         16, #pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */
         17, Min reduction generated for minval
     15, Generating NVIDIA code

But if I uncomment the two commented-out lines, it doesn’t even compile anymore:

PGC-S-0155-Cannot determine bounds for array res (reduc_test.c: 15)
main:
     15, Generating copy(test[:])
         Accelerator kernel generated
         16, #pragma acc loop gang /* blockIdx.x */
         Scalar last value needed after loop for 'minval' at line 23
         Scalar last value needed after loop for 'res' at line 24
         Accelerator restriction: scalar variable live-out from loop: res
         Accelerator restriction: scalar variable live-out from loop: minval
PGC/x86-64 Linux 14.3-0: compilation completed with severe errors

I understand that this cannot work in a “normal” parallel loop, because all threads would write to res at the same time. But a reduction must internally do something like a tree reduction, so in the end there should be one last comparison that sets both minval and res to their final values.

Since my approach does not work, is there another (better) solution? I can work around the issue by copying out just the index (which, contrary to my assumption, did work, probably because it’s a scalar?) and accessing the element in the original array on the host. But since my real program changes the structs inside the kernel, I would then need to copy the whole array back to the host (with copy instead of copyin, as in the example), which I don’t want for performance reasons.

Thanks in advance,
Marius

Hi Marius,

There isn’t a way to do what you want. First, you can’t change what “res” points to, since you’d lose the association with the host pointer. You could have “res” point to the first element of “test”, copyin test, then put “res” in a present clause to allow “res” to be modified. However, you really need “res” to be in a min reduction, and pointers can’t be used in reductions.

The only way I could think of that comes close to what you want is to add a second loop that finds the index of the minimum, then use that index to grab the result. There may be a better way, but that’s what comes to mind.
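One possible single-pass alternative, given here only as a sketch and not something tested with PGI 14.3: since pointers can't take part in a reduction but integers can, the value and the index can be folded into one 64-bit scalar and min-reduced together. The helper names argmin_key, key_value and key_index are made up for this illustration, and the packing assumes every val is non-negative (true for both i+5 and rand()).

```c
#include <limits.h>

struct mystruct {
  int val;
};

/* Hypothetical helper: returns a packed key for the smallest element.
   The value sits in the upper 32 bits and the index in the lower 32,
   so comparing two keys compares the values first and, on a tie,
   prefers the lower index.
   Assumption: all val fields are non-negative. */
static long long argmin_key(const struct mystruct *test, int n) {
  long long best = LLONG_MAX;
  #pragma acc parallel loop copyin(test[0:n]) reduction(min:best)
  for (int i = 0; i < n; ++i) {
    long long key = ((long long)test[i].val << 32) | (long long)i;
    if (key < best) best = key;
  }
  return best;
}

/* Host-side decoders for the packed key. */
static int key_value(long long key) { return (int)(key >> 32); }
static int key_index(long long key) { return (int)(key & 0xFFFFFFFFLL); }
```

On the host, key_value(k) then gives the minimum and key_index(k) its position in the array. If n is 0 the function returns LLONG_MAX, which does not decode to a valid index.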

Hope this helps,
Mat

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 64000

struct mystruct {
  int val;
};

int main() {
  struct mystruct test[N];
  struct mystruct result;
  struct mystruct *res = &result;
  int minval = RAND_MAX;
  int minidx = N+1;
  srand(time(0));
  for(int i=0; i<N; ++i) test[i].val = rand();

  #pragma acc data copyin(test[0:N])
  {
    // Pass 1: min reduction over the values
    #pragma acc parallel loop reduction(min:minval)
    for(int i=0; i<N; ++i) {
      if(test[i].val < minval) {
        minval = test[i].val;
      }
    }
    // Pass 2: min reduction over the indices whose value equals minval
    #pragma acc parallel loop reduction(min:minidx)
    for(int i=0; i<N; ++i) {
      if(test[i].val == minval) {
        if (minidx > i) {
          minidx = i;
        }
      }
    }
  }
  // Back on the host: point res at the winning element of the host array
  res = &test[minidx];
  printf("Minval=%i\n", minval);
  printf("Minidx %i\n", minidx);
  printf("Res: %i\n", res->val);

  return 0;
}
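If the structs are modified on the device, the host copy of test is stale after the data region, so the index alone is not enough. A possible way around copying the whole array back is an update directive on a one-element subarray. The sketch below assumes an OpenACC 2.0 compiler (older ones spell the clause host instead of self); the function name min_element_to_host and the "+= 1" kernel are stand-ins invented for this example.

```c
#include <limits.h>

struct mystruct {
  int val;
};

/* Hypothetical helper: finds the minimum on the device and copies only
   that one element back. Returns its index, or -1 if n is 0. */
static int min_element_to_host(struct mystruct *test, int n,
                               struct mystruct *out) {
  int minval = INT_MAX;
  int minidx = n;

  #pragma acc data copyin(test[0:n])
  {
    /* Stand-in for a kernel that changes the structs on the device. */
    #pragma acc parallel loop
    for (int i = 0; i < n; ++i) test[i].val += 1;

    /* Pass 1: smallest value. */
    #pragma acc parallel loop reduction(min:minval)
    for (int i = 0; i < n; ++i) {
      if (test[i].val < minval) minval = test[i].val;
    }

    /* Pass 2: smallest index holding that value. */
    #pragma acc parallel loop reduction(min:minidx)
    for (int i = 0; i < n; ++i) {
      if (test[i].val == minval && i < minidx) minidx = i;
    }

    /* Refresh just the winning element of the host array. */
    if (minidx < n) {
      #pragma acc update self(test[minidx:1])
    }
  }

  if (minidx >= n) return -1;
  *out = test[minidx];
  return minidx;
}
```

After the call, only test[minidx] on the host reflects the device-side changes; the rest of the host array still holds the values from before the data region.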