# Problem with min-index reduction in nested loop

Hi,

I’m having trouble with a particular kind of reduction in a nested parallel loop. In the real code, the inner loop needs a reduction that finds the index of the minimal element of an array. Since I found out that this takes 2 reductions (one to find the minimal value and one to find an index holding that value), I first tried it without nesting, which works perfectly. But once I wrapped another parallel loop around the 2 reductions and my actual worker code, it stopped working.

Minimal example:

``````
#include <stdio.h>

int main() {
    int arr[100];
    int minvalues[10];
    int minindexes[10];

    for(int i=0; i<100; ++i) arr[i] = (i+1) % 22 + 12;

    #pragma acc data copyin(arr[0:100]), copyout(minvalues[0:10]), copyout(minindexes[0:10])
    {
        #pragma acc parallel loop
        for(int j=0; j<10; ++j) {
            int minidx = 100;
            int minval = 100;

            // usually here is the real worker code which changes the array
            // not important for the example to not work

            #pragma acc loop reduction(min:minval)
            for(int i=0; i<10; ++i) {
                if(arr[j*10 + i] < minval) {
                    minval = arr[i];
                }
            }
            minvalues[j] = minval;

            #pragma acc loop reduction(min:minidx)
            for(int i=0; i<10; ++i) {
                if(arr[j*10 + i] == minval) {
                    minidx = i;
                }
            }
            minindexes[j] = minidx;
        }
    } //end data

    puts("Minindexes:");
    for(int i=0; i<10; ++i) {
        printf("%i, ", minindexes[i]);
    }
    puts("\n");

    puts("Minvalues:");
    for(int i=0; i<10; ++i) {
        printf("%i, ", minvalues[i]);
    }
    puts("\n");

    return 0;
}
``````

For an example array like this one (index:value) …

``````
0: 13, 1: 14, 2: 15, 3: 16, 4: 17, 5: 18, 6: 19, 7: 20, 8: 21, 9: 22,
10: 23, 11: 24, 12: 25, 13: 26, 14: 27, 15: 28, 16: 29, 17: 30, 18: 31,
19: 32, 20: 33, 21: 12, 22: 13, 23: 14, 24: 15, 25: 16, 26: 17, 27: 18,
28: 19, 29: 20, 30: 21, 31: 22, 32: 23, 33: 24, 34: 25, 35: 26, 36: 27,
37: 28, 38: 29, 39: 30, 40: 31, 41: 32, 42: 33, 43: 12, 44: 13, 45: 14,
46: 15, 47: 16, 48: 17, 49: 18, 50: 19, 51: 20, 52: 21, 53: 22, 54: 23,
55: 24, 56: 25, 57: 26, 58: 27, 59: 28, 60: 29, 61: 30, 62: 31, 63: 32,
64: 33, 65: 12, 66: 13, 67: 14, 68: 15, 69: 16, 70: 17, 71: 18, 72: 19,
73: 20, 74: 21, 75: 22, 76: 23, 77: 24, 78: 25, 79: 26, 80: 27, 81: 28,
82: 29, 83: 30, 84: 31, 85: 32, 86: 33, 87: 12, 88: 13, 89: 14, 90: 15,
91: 16, 92: 17, 93: 18, 94: 19, 95: 20, 96: 21, 97: 22, 98: 23, 99: 24,
``````

… there is the following output:

``````
Minindexes:
0, 100, 2, 100, 4, 100, 6, 100, 8, 100,

Minvalues:
13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
``````

It looks as if the value reduction is only executed in the first iteration of the outer loop and not in all of them, because it finds a 13 and not a 12 (which does not occur in the first 10 numbers). Based on that value, the second reduction works properly and produces correct indexes, or no index (=100) for blocks that contain no 13.

This looks like a bug to me. Or am I doing something wrong?

Thanks,
Marius

Hi Marius,

Shouldn’t the program be using “j*10+i” instead of “i” for the index to the found values?

- Mat

``````
% diff -u min_org.c min.c
--- min_org.c   2014-10-23 14:34:34.085307435 -0700
+++ min.c       2014-10-23 14:34:21.588311572 -0700
@@ -20,7 +20,7 @@
             #pragma acc loop reduction(min:minval)
             for(int i=0; i<10; ++i) {
                 if(arr[j*10 + i] < minval) {
-                    minval = arr[i];
+                    minval = arr[j*10+i];
                 }
             }
             minvalues[j] = minval;
@@ -28,7 +28,7 @@
             #pragma acc loop reduction(min:minidx)
             for(int i=0; i<10; ++i) {
                 if(arr[j*10 + i] == minval) {
-                    minidx = i;
+                    minidx = j*10 + i;
                 }
             }
             minindexes[j] = minidx;
% pgcc -acc -Minfo=accel min.c
main:
     10, Generating copyin(arr[:])
         Generating copyout(minvalues[:])
         Generating copyout(minindexes[:])
     12, Accelerator kernel generated
         13, #pragma acc loop gang /* blockIdx.x */
         21, #pragma acc loop vector(256) /* threadIdx.x */
             Min reduction generated for minval
         29, #pragma acc loop vector(256) /* threadIdx.x */
             Min reduction generated for minidx
     12, Generating Tesla code
     21, Loop is parallelizable
     29, Loop is parallelizable
% a.out
Minindexes:
0, 10, 21, 30, 43, 50, 65, 70, 87, 90,

Minvalues:
13, 23, 12, 21, 12, 19, 12, 17, 12, 15,
``````
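
Putting both changes from the diff together, the corrected kernel could be sketched as a standalone function. The name `block_min_index` and its parameters are assumptions for illustration and do not appear in the thread. The `INT_MAX` seed and the extra `k < minidx` test are also additions, the latter so that the loop picks the lowest matching index even when a compiler ignores the OpenACC pragmas and runs it serially.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not from the thread): for each of nblocks blocks of
   block_len consecutive elements, store the smallest value and the global
   index of its first occurrence. */
void block_min_index(const int *arr, int nblocks, int block_len,
                     int *minvalues, int *minindexes)
{
    assert(nblocks > 0 && block_len > 0);
    int n = nblocks * block_len;

    #pragma acc data copyin(arr[0:n]) copyout(minvalues[0:nblocks], minindexes[0:nblocks])
    {
        #pragma acc parallel loop
        for (int j = 0; j < nblocks; ++j) {
            int minval = INT_MAX;  /* identity for a min reduction */
            int minidx = n;        /* sentinel: larger than any valid index */

            /* First reduction: smallest value in block j.
               Note the read uses the global index, not just i. */
            #pragma acc loop reduction(min:minval)
            for (int i = 0; i < block_len; ++i) {
                int v = arr[j * block_len + i];
                if (v < minval) minval = v;
            }
            minvalues[j] = minval;

            /* Second reduction: lowest global index holding that value. */
            #pragma acc loop reduction(min:minidx)
            for (int i = 0; i < block_len; ++i) {
                int k = j * block_len + i;
                if (arr[k] == minval && k < minidx) minidx = k;
            }
            minindexes[j] = minidx;
        }
    }
}
```

Called with the array from the question (`arr[i] = (i+1) % 22 + 12`, 10 blocks of 10 elements), this should give the same min values and indexes as the output above.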