Hi,
I’m having trouble with a certain type of reduction in a nested parallel loop. In the real code, the inner loop needs a reduction that finds the index of the minimal element of an array. Since I found out that this needs two reductions (one to find the minimal value and one to find an index holding that value), I tried it without nesting, which works perfectly. But once I wrap the two reductions and my actual worker code in another parallel loop, it no longer works.
Minimal example:
```c
#include <stdio.h>

int main() {
    int arr[100];
    int minvalues[10];
    int minindexes[10];
    for(int i=0; i<100; ++i) arr[i] = (i+1) % 22 + 12;
    #pragma acc data copyin(arr[0:100]), copyout(minvalues[0:10]), copyout(minindexes[0:10])
    {
        #pragma acc parallel loop
        for(int j=0; j<10; ++j) {
            int minidx = 100;
            int minval = 100;
            // usually here is the real worker code which changes the array
            // not important for the example to not work
            #pragma acc loop reduction(min:minval)
            for(int i=0; i<10; ++i) {
                if(arr[j*10 + i] < minval) {
                    minval = arr[i];
                }
            }
            minvalues[j] = minval;
            #pragma acc loop reduction(min:minidx)
            for(int i=0; i<10; ++i) {
                if(arr[j*10 + i] == minval) {
                    minidx = i;
                }
            }
            minindexes[j] = minidx;
        }
    } //end data
    puts("Minindexes:");
    for(int i=0; i<10; ++i) {
        printf("%i, ", minindexes[i]);
    }
    puts("\n");
    puts("Minvalues:");
    for(int i=0; i<10; ++i) {
        printf("%i, ", minvalues[i]);
    }
    puts("\n");
    return 0;
}
```
For an example array like this one (index:value) …
0: 13, 1: 14, 2: 15, 3: 16, 4: 17, 5: 18, 6: 19, 7: 20, 8: 21, 9: 22,
10: 23, 11: 24, 12: 25, 13: 26, 14: 27, 15: 28, 16: 29, 17: 30, 18: 31,
19: 32, 20: 33, 21: 12, 22: 13, 23: 14, 24: 15, 25: 16, 26: 17, 27: 18,
28: 19, 29: 20, 30: 21, 31: 22, 32: 23, 33: 24, 34: 25, 35: 26, 36: 27,
37: 28, 38: 29, 39: 30, 40: 31, 41: 32, 42: 33, 43: 12, 44: 13, 45: 14,
46: 15, 47: 16, 48: 17, 49: 18, 50: 19, 51: 20, 52: 21, 53: 22, 54: 23,
55: 24, 56: 25, 57: 26, 58: 27, 59: 28, 60: 29, 61: 30, 62: 31, 63: 32,
64: 33, 65: 12, 66: 13, 67: 14, 68: 15, 69: 16, 70: 17, 71: 18, 72: 19,
73: 20, 74: 21, 75: 22, 76: 23, 77: 24, 78: 25, 79: 26, 80: 27, 81: 28,
82: 29, 83: 30, 84: 31, 85: 32, 86: 33, 87: 12, 88: 13, 89: 14, 90: 15,
91: 16, 92: 17, 93: 18, 94: 19, 95: 20, 96: 21, 97: 22, 98: 23, 99: 24,
… there is the following output:
Minindexes:
0, 100, 2, 100, 4, 100, 6, 100, 8, 100,
Minvalues:
13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
It seems as if the value reduction is executed only in the first iteration of the outer loop and not in all of them, because it finds 13 instead of 12 (12 does not occur among the first 10 elements). Based on that value, the second reduction works properly and produces correct indexes (or no index, i.e. 100, in blocks that contain no 13).
This looks like a bug to me. Or am I doing something wrong?
Thanks,
Marius