Making "atomic capture" work per vector loop, not for the whole kernel

Hello,

What should I write so that “atomic capture” works only within the vector loop, not across the whole kernel?
version 1: “firstprivate/private” and “x = 0” give an error
version 2: “int x = 0;” gives the same error
version 3: “if” should work, but I don’t know what to write

Now the results are
arr[0] = 0
arr[1] = 1
arr[2] = 2
arr[3] = 3
arr[4] = 4
arr[5] = 5

but I expect
arr[0] = 0
arr[1] = 1
arr[2] = 2
arr[3] = 0
arr[4] = 1
arr[5] = 2

Thank you

//pgcc 1b.c -acc -Mconcur -Minfo=accel -Minline -Msmartalloc -Msafeptr -ta=tesla:cc50,time

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <openacc.h>

int main(void) {

    int x = 0;
    int a = 3, b = 2;
    int *restrict arr = (int *) calloc(a*b, sizeof(int));

     #pragma acc data copyout(arr[0:a*b])
     {
        #pragma acc parallel loop worker //firstprivate(x)
        for (int j = 0; j < b; j++) {

            //x = 0;
            //int x = 0;
            //if
                #pragma acc loop vector
                for (int i = 0; i < a; i++) {
                    int v;
                    #pragma acc atomic capture
                    v = x++;
                    arr[v] = v;
                }
        }
    }

    for (int i = 0; i < a * b; i++)
        printf("arr[%d] = %d\n", i, arr[i]);
    printf("\n");

    free(arr);
}

Hi tasica,

You’re encountering a limitation of NVIDIA devices: atomic operations can only be performed on variables stored in global or shared memory. Local variables can’t be used in atomics.

This means that for your code, if you make the outer loop “gang” without “worker” or “vector”, then all variables declared at that loop level will be shared by the vector lanes of each gang. Below is an example of the fix. I also fixed your indexing of “arr”: since x now restarts at 0 for each outer iteration, “v” only takes the values 0, 1 and 2, so using it as the index would never write indices 3-5 and you’d get garbage there.

% cat tascia.c

//pgcc 1b.c -acc -Mconcur -Minfo=accel -Minline -Msmartalloc -Msafeptr -ta=tesla:cc50,time

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <openacc.h>

int main(void) {

    //int x = 0;
    int a = 3, b = 2;
    int *restrict arr = (int *) calloc(a*b, sizeof(int));

     #pragma acc data copyout(arr[0:a*b])
     {
        #pragma acc parallel loop gang
        for (int j = 0; j < b; j++) {
            int x=0;
                #pragma acc loop vector
                for (int i = 0; i < a; i++) {
                    int v;
                    #pragma acc atomic capture
                    v = x++;
                    arr[j*a+i] = v;
                }
        }
    }

    for (int i = 0; i < a * b; i++)
        printf("arr[%d] = %d\n", i, arr[i]);
    printf("\n");

    free(arr);
}


% pgcc -ta=tesla:cc60 -Minfo=accel tascia.c -V18.4
main:
     15, Generating copyout(arr[:b*a])
     17, Accelerator kernel generated
         Generating Tesla code
         18, #pragma acc loop gang /* blockIdx.x */
         21, #pragma acc loop vector(128) /* threadIdx.x */
     21, Loop is parallelizable
% a.out
arr[0] = 0
arr[1] = 1
arr[2] = 2
arr[3] = 0
arr[4] = 1
arr[5] = 2

Hope this helps,
Mat

Unfortunately, it does not work on my computer; it gives this error (line 23 is the atomic capture):

main:
     14, Generating copyout(arr[:b*a])
     16, Accelerator kernel generated
         Generating Tesla code
         17, #pragma acc loop gang /* blockIdx.x */
         20, #pragma acc loop vector(128) /* threadIdx.x */
     20, Loop is parallelizable
nvvmCompileProgram error: 9.
Error: 1b.c(23) Error: unsupported operation

PGC-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (1b.c: 1)
PGC/x86-64 Linux 17.10-0: compilation aborted

I thought the PGI compiler was installed incorrectly, so I reinstalled it: just Fedora and the PGI compiler, without the CUDA toolkit.
I also tried Ubuntu with the PGI compiler, with the same results.
Maybe I am missing something in the installation: perhaps some environment variables must be set, perhaps the Community Edition is missing something, or perhaps my configuration is not right (Quadro M1000M and GTX 1080Ti).
I have no idea.

Hi Mat,

I just wanted to let you know … with 18.4 it works smooth like butter :-)
Hurraaaay!

Thank you very much!

Yes, sorry about that. 17.10 had some bugs with atomic capture on shared variables that were fixed in 18.1.