Hi,
I’m having problems getting two results of a min reduction back to the host - could somebody tell me how to solve the issue?
I have an array of structs (in the example each one contains just an int, to keep it simple) which I copy to the accelerator and then run a minimum reduction over the contained values. Afterwards, on the host side, I’d like to receive both the minimum value and the struct it belongs to, without having to copy the whole input array out again!
In a minimal example, this compiles and prints “5” as expected:
#include <stdio.h>
struct mystruct {
    int val;
};
int main() {
    struct mystruct test[100];
    struct mystruct result;
    struct mystruct *res = &result;
    int minval = 100;
    // initialize test with 5, 6, 7, ...
    for(int i=0; i<100; ++i) test[i].val = i+5;
    #pragma acc parallel loop copy(test[0:100]), copyout(res[0:1]), reduction(min:minval)
    for(int i=0; i<100; ++i) {
        if(test[i].val < minval) {
            minval = test[i].val;
            //res = test + i;
        }
    }
    printf("%i\n", minval);
    //printf("%i\n", res->val);
    return 0;
}
Compiler output:
main:
15, Generating copy(test[:])
Generating copyout(res[:1])
Accelerator kernel generated
16, #pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */
17, Min reduction generated for minval
15, Generating NVIDIA code
But if I uncomment the two commented-out lines, it no longer compiles:
PGC-S-0155-Cannot determine bounds for array res (reduc_test.c: 15)
main:
15, Generating copy(test[:])
Accelerator kernel generated
16, #pragma acc loop gang /* blockIdx.x */
Scalar last value needed after loop for 'minval' at line 23
Scalar last value needed after loop for 'res' at line 24
Accelerator restriction: scalar variable live-out from loop: res
Accelerator restriction: scalar variable live-out from loop: minval
PGC/x86-64 Linux 14.3-0: compilation completed with severe errors
I understand that this cannot work in a “normal” parallel loop, because all threads would write to res at the same time. In a reduction, however, there must be something like a tree reduction internally, so there should be one final comparison that could set both minval and res to their final values.
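To illustrate the kind of result I am after, here is a rough sketch that avoids the live-out variable by splitting the work into two plain reductions: one over the values, and one over the indices whose value equals the minimum. The function name argmin_two_pass and the tie-breaking rule (lowest index wins) are my own assumptions for this sketch, and I have not verified how a particular compiler version handles it.

```c
#include <assert.h>
#include <limits.h>

struct mystruct {
    int val;
};

/* Hypothetical helper (not part of the original program): returns the index
   of the element with the smallest val in test[0..n-1]. On ties, the lowest
   index is returned. */
int argmin_two_pass(const struct mystruct *test, int n) {
    assert(n > 0);
    int minval = INT_MAX;
    int minidx = n;   /* sentinel: larger than any valid index */

    #pragma acc data copyin(test[0:n])
    {
        /* Pass 1: ordinary min reduction over the values. */
        #pragma acc parallel loop reduction(min:minval)
        for (int i = 0; i < n; ++i) {
            if (test[i].val < minval) minval = test[i].val;
        }

        /* Pass 2: min reduction over the indices that hold minval. */
        #pragma acc parallel loop reduction(min:minidx)
        for (int i = 0; i < n; ++i) {
            if (test[i].val == minval && i < minidx) minidx = i;
        }
    }
    return minidx;
}
```

Both loops only write to their own reduction variable, so nothing else has to be live after the loop. The price is a second pass over the data, which stays on the device between the two loops.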
Since my approach does not work, is there another (better) solution? I can work around the issue by copying out just the index of the minimum element (which, contrary to my expectation, did work, probably because it’s a scalar?) and accessing the element in the original array on the host. But since my real program modifies the structs inside the kernel, I would then need to copy the whole array back to the host (as in the example, which uses copy instead of copyin), and I want to avoid that for performance reasons.
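One direction that might avoid the full copy is to keep the array in a data region and transfer only the single element at the minimum index with an update directive. The following is only a sketch under assumptions: the function name min_after_update is made up, the doubling loop is a stand-in for my real kernel, and I am assuming that update host accepts a one-element subarray of a struct array.

```c
#include <assert.h>
#include <limits.h>

struct mystruct {
    int val;
};

/* Hypothetical helper (not part of the original program). Doubles every val
   on the device as a stand-in for real kernel work, then returns the element
   with the smallest val. With OpenACC enabled, only that one element is
   copied back; the rest of the host array keeps its old contents. */
struct mystruct min_after_update(struct mystruct *test, int n) {
    assert(n > 0);
    int minval = INT_MAX;
    int minidx = n;   /* sentinel: larger than any valid index */

    #pragma acc data copyin(test[0:n])
    {
        /* Stand-in for the kernel that modifies the structs. */
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) test[i].val *= 2;

        /* Minimum value of the modified data. */
        #pragma acc parallel loop reduction(min:minval)
        for (int i = 0; i < n; ++i) {
            if (test[i].val < minval) minval = test[i].val;
        }

        /* Lowest index that holds the minimum value. */
        #pragma acc parallel loop reduction(min:minidx)
        for (int i = 0; i < n; ++i) {
            if (test[i].val == minval && i < minidx) minidx = i;
        }

        /* Transfer a single element instead of the whole array. */
        #pragma acc update host(test[minidx:1])
    }
    return test[minidx];
}
```

Since the data clause is copyin, nothing is copied back at the end of the region, so the update is the only device-to-host transfer of array data.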
Thanks in advance,
Marius