cache directive not resulting in shared memory use

jmaxg3 · December 18, 2014, 8:02pm

I’m experimenting with OpenACC’s cache directive using PGI 14.10. I’ve got a simple example based on the one in the slides at [1]:

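#include <stdlib.h>

int main(int argc, char **argv) {
    int N = 1024;
    int *restrict x = (int *)malloc(sizeof(int) * N);
    int *restrict y = (int *)malloc(sizeof(int) * N);

    /* Initialize the inputs so the kernel does not read uninitialized memory. */
    for (int i = 0; i < N; i++) {
        x[i] = i;
        y[i] = 0;
    }

    #pragma acc parallel loop copy(x[0:N], y[0:N])
    for (int i = 1; i < N - 1; i++) {
        /* Subarrays are [start:length]: cache x[i] and x[i+1], the elements read below. */
        #pragma acc cache(x[i:2])
        y[i] = (x[i] + x[i + 1]) / 2.0;
    }

    free(x);
    free(y);
    return 0;
}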

Compiling with: pgc++ -acc -Minfo -ta=nvidia,cc3x foo.c

When I run this under nvprof with --metrics shared_load_transactions,shared_store_transactions it reports no loads or stores. When I look at the generated CUDA, it also shows that no shared variables are being used. Can anyone explain why the cache directive would appear to be having no effect in this example?

The -Minfo output is below.

[1] http://www.pgroup.com/lit/presentations/cea-3.pdf

main:
      6, Generating copy(x[:N])
         Generating copy(y[:N])
         Accelerator kernel generated
          9, #pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */
      6, Generating Tesla code
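As a rough picture of what the cache directive is meant to do, the following plain-C model stages, for each block of 256 loop iterations, the part of x that the block reads into a small local array standing in for CUDA __shared__ memory, and then computes only from that copy. It is a sketch under stated assumptions, not the code PGI generates: the tile size of 256 is taken from the vector(256) in the -Minfo output, and the function name average_tiled is hypothetical. Because the loop body reads x[i] and x[i+1], each tile needs one extra halo element past its last iteration.

```c
#include <assert.h>

/* Assumed tile size, matching vector(256) in the -Minfo output. */
#define TILE 256

/* Hypothetical plain-C model of the cache directive: each pass of the
 * outer loop plays the role of one gang. It copies the slice of x that
 * its iterations read (x[i] and x[i+1]) into a local buffer that stands
 * in for __shared__ memory, then reads x only through that buffer. */
void average_tiled(const int *x, int *y, int n) {
    int tile[TILE + 1]; /* TILE elements plus one halo element for x[i+1] */

    for (int base = 1; base < n - 1; base += TILE) {
        int end = base + TILE; /* exclusive end of this gang's iterations */
        if (end > n - 1)
            end = n - 1;
        assert(end - base <= TILE);

        /* Stage x[base .. end] inclusive. On a GPU the threads of the gang
         * would do this load cooperatively, followed by a barrier. */
        for (int k = base; k <= end; k++)
            tile[k - base] = x[k];

        /* Same arithmetic as the OpenACC loop, but reading the staged copy. */
        for (int i = base; i < end; i++)
            y[i] = (tile[i - base] + tile[i - base + 1]) / 2.0;
    }
}
```

Calling average_tiled(x, y, N) on the same input as the OpenACC loop should produce the same y[1..N-2]. The point of the staging step is that neighbouring iterations share elements of x, so each value is fetched from global memory once per tile instead of twice.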

MatColgrove · December 18, 2014, 9:45pm

Hi Max,

The cache directive was essentially disabled in the 14.x compilers. We were having performance issues with it, so we decided to rework it. The PGI 2015 compilers will contain the new and improved version of cache.

I tested your example and can verify that shared memory is being used when compiled with our pre-release 2015 compiler.

Best Regards,
Mat

jmaxg3 · December 18, 2014, 11:11pm

Thanks for the clarification Mat!
