Using the cache directive

mmikalsen · June 9, 2013, 4:00pm

Hi

I have a SOR Poisson solver kernel that I want to try the cache directive on. I am, however, a little unclear on how and where to use it, and was wondering if anyone could give me some suggestions.

I am using the deviceptr clause because the data is allocated with acc_malloc. The mask value represents obstacles in the volume.

I am also wondering what the optimal gang/vector configuration would be when working on a 3D grid such as 128x32x128 (x,y,z). By default, the compiler partitions the two outer loops (y,z) across gangs and the inner loop (x) across vectors.

Thanks for any suggestions.

#define I(X,Y,Z) ((X) + (Z)*dim.x + (Y)*dim.z*dim.x) 

void solve_poisson(float *p, float *p0, float *b, int *obs, float *poisson_tab, dim_3 dim, float w) {
#pragma acc kernels deviceptr(p,p0,obs,b,poisson_tab) copyin(w)   
{     
    #pragma acc loop independent 
    for(int y = 1; y < dim.y-1; ++y) { 
        #pragma acc loop independent
        for(int z = 1; z < dim.z-1; ++z) { 
            #pragma acc loop independent 
            for(int x = 1; x < dim.x-1; ++x) {

                int mask = obs[I(x,y,z)] & 127;

                if ((~mask & 126) && (mask & VOX_SELF) == 0) {
                    float res = 0.0f;
                    res += p0[I(x-1,y,z)] * (float)((mask & VOX_LEFT) == 0);
                    res += p0[I(x+1,y,z)] * (float)((mask & VOX_RIGHT) == 0);
                    res += p0[I(x,y-1,z)] * (float)((mask & VOX_BELOW) == 0);
                    res += p0[I(x,y+1,z)] * (float)((mask & VOX_ABOVE) == 0);
                    res += p0[I(x,y,z-1)] * (float)((mask & VOX_UP) == 0);
                    res += p0[I(x,y,z+1)] * (float)((mask & VOX_DOWN) == 0);
                    res -= b[I(x,y,z)];
                    res *= poisson_tab[mask>>1];
                    res *= w;
                    res += p0[I(x,y,z)]*(1.0f - w);
                    p[I(x,y,z)] = res;
                }
                else {
                    p[I(x,y,z)] = 0.0f;
                }
            }
        }
    }
}
}

MatColgrove · June 10, 2013, 4:34pm

Hi mmikalsen,

For the “cache” directive, the syntax would be something like:

            #pragma acc loop independent 
            for(int x = 1; x < dim.x-1; ++x) { 
            #pragma acc cache(p0[x:128])

Unfortunately, it can be tricky to use, and the compiler doesn’t always accept it, depending on the array and its layout. We’re working on it and should have better support in the future. That said, the compiler typically does a good job of utilizing shared memory, so the cache directive often isn’t needed.

More often than not, the compiler does find the optimal schedule, and it may have done so here. However, I would have thought the default would be gang, gang vector, vector. Either way, you can override it and experiment:

    #pragma acc loop gang independent 
    for(int y = 1; y < dim.y-1; ++y) { 
        #pragma acc loop gang vector independent 
        for(int z = 1; z < dim.z-1; ++z) { 
            #pragma acc loop vector independent 
            for(int x = 1; x < dim.x-1; ++x) {

Hope this helps,
Mat
