[Help] Using OpenACC with mean filter

Hi all,
I'm currently trying to add OpenACC to my code, but it only gives about a 2x speed-up. This is my code:

void meanFilter(unsigned char *A, int w, int h, int ch, int step)
{
    unsigned char *Anew = new unsigned char[w * h * ch];

    int n = w;
    int m = h;

    memset(Anew, 0, w * h * ch * sizeof(A[0]));

    printf("Image size: %d x %d x %d\n", w, h, ch);

    StartTimer();
#pragma acc data //copy(A[:n*m]) //copyin(Anew[:n*m])
    {
#pragma acc loop collapse(2) //copy(A[:n*m]) copyin(Anew[:n*m])
        for (int j = 1; j < n - 1; j++)
        {
            for (int i = 1; i < m - 1; i ++)
            {
                Anew[j * m + i] = 0.25f * (A[j * m + i + 1] + A[j * m + i - 1] + A[(j - 1) * m + i] + A[(j + 1) * m + i]);
            }
        }

#pragma acc loop //copyin(A[0:n*m]) copy(Anew[0:n*m])
        for (int i = 0; i < n * m; i++)
        {
                A[i] = Anew[i];
        }
    }

    double runtime = GetTimer();

    printf(" total: %f ms\n", runtime);
}

What is the problem with my code?

Hi DanhNam,

You're using a loop directive outside of a compute region (parallel or kernels), hence the loops won't be offloaded. I'm a bit surprised you saw any speed-up, since the code isn't actually being run on the GPU.

Try something like the following. Be sure to view the compiler feedback messages (-Minfo=accel) to see what the compiler is doing.

% cat test.cpp
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void meanFilter(unsigned char *A, int w, int h, int ch, int step)
{
    unsigned char *Anew = new unsigned char[w * h * ch];

    int n = w;
    int m = h;

    memset(Anew, 0, w * h * ch * sizeof(A[0]));

    printf("Image size: %d x %d x %d\n", w, h, ch);

//    StartTimer();
#pragma acc data copy(A[:n*m]) copyin(Anew[:n*m])
    {
#pragma acc parallel loop collapse(2)
        for (int j = 1; j < n - 1; j++)
        {
            for (int i = 1; i < m - 1; i ++)
            {
                Anew[j * m + i] = 0.25f * (A[j * m + i + 1] + A[j * m + i - 1] + A[(j - 1) * m + i] + A[(j + 1) * m + i]);
            }
        }

#pragma acc parallel loop
        for (int i = 0; i < n * m; i++)
        {
                A[i] = Anew[i];
        }
    }

    delete[] Anew; // release the host scratch buffer

//    double runtime = GetTimer();

//    printf(" total: %f ms\n", runtime);
}
% pgc++ -c test.cpp -ta=tesla -Minfo=accel
meanFilter(unsigned char *, int, int, int, int):
     18, Generating copy(A[:m*n])
         Generating copyin(Anew[:m*n])
         Generating Tesla code
         20, #pragma acc loop gang, vector(128) collapse(2) /* blockIdx.x threadIdx.x */
         22,   /* blockIdx.x threadIdx.x collapsed */
     26, Generating Tesla code
         29, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
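
For reference, here is a minimal self-contained sketch of the same pattern that also builds with a plain C++ compiler (the pragmas are then ignored and the loops run serially on the host). The name meanFilter4, the single-channel row-major layout (h rows, w columns), and leaving the border pixels unchanged are my assumptions for illustration, not something taken from your code. Since Anew is only scratch space here, I use create instead of copyin, and copy back only the interior that was actually written.

```cpp
#include <cassert>

// Sketch: 4-neighbour mean filter on a single-channel, row-major image
// (h rows, w columns). Border pixels are left unchanged (assumption).
void meanFilter4(unsigned char *A, int w, int h)
{
    assert(A != nullptr && w > 0 && h > 0);

    const int size = w * h;
    unsigned char *Anew = new unsigned char[size];

    // A is copied to the device and back; Anew is scratch space, so
    // create() allocates it on the device without any transfer.
#pragma acc data copy(A[0:size]) create(Anew[0:size])
    {
        // "parallel" opens the compute region; "loop" alone would not offload.
#pragma acc parallel loop collapse(2)
        for (int row = 1; row < h - 1; row++) {
            for (int col = 1; col < w - 1; col++) {
                int sum = A[row * w + col - 1] + A[row * w + col + 1]
                        + A[(row - 1) * w + col] + A[(row + 1) * w + col];
                Anew[row * w + col] = (unsigned char)(sum / 4);
            }
        }

        // Copy back only the interior, since only the interior was written.
#pragma acc parallel loop collapse(2)
        for (int row = 1; row < h - 1; row++) {
            for (int col = 1; col < w - 1; col++) {
                A[row * w + col] = Anew[row * w + col];
            }
        }
    }

    delete[] Anew;
}
```

Calling meanFilter4(img, w, h) on a w x h buffer filters it in place. The integer sum / 4 truncates the same way as casting 0.25f * sum back to unsigned char for these non-negative values, and it keeps the arithmetic in int so the four-pixel sum cannot overflow a byte.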

-Mat

Hi mkcolg,

Thank you so much!
I got it.