OpenACC: no parallelisation with -ta=multicore

Hi,

I am new to OpenACC and am trying to get multicore parallelisation running. At the moment it gives no runtime improvement compared to a working OpenMP implementation of the same program.

The loop I want to parallelize looks like this:

#pragma acc parallel loop
for (int y = 0; y < yDim; y++)
{
    for (int x = 0; x < xDim; x++)
    {
        int color = determineColor(x, y, points, numPoints, max_res);
        imageBuffer[x][y] = color;
    }
}

I am using the pgcc compiler with the following settings:

pgcc -Minfo -ta=multicore

I know that the inner loop probably won't be parallelized, but I was expecting at least some improvement from the outer loop. So far, no improvement is measurable compared to either a sequential run of the program or a parallel run with OpenMP.

Hi Blob,

Can you please provide a minimal reproducing example? Unfortunately there’s not enough information here for us to help.

-Mat

voronoi.c (3.2 KB) main.c (3.2 KB)

Hey, these two files should get the program running and give a general idea of how it looks. Basically, it generates Voronoi diagrams. The files are from a slightly older version, but the problem remains the same.

Might just be a binding issue. When I

  • fix the header file,
  • un-comment out the OpenMP pragma,
  • compile twice, building one OpenMP and one OpenACC binary,
  • set OMP_NUM_THREADS=20 and ACC_NUM_CORES=20, so only a single socket is used
  • use “taskset -c 0-19 a.out” to bind,

and then run each binary, the times are roughly the same and consistent between runs. At least when I run, going cross-socket (i.e. 40 cores) leads to severe run-to-run variation, presumably due to memory being allocated on a single NUMA node. Unfortunately, numactl isn’t installed on the system I’m using; otherwise I’d try interleaving the memory to see if that helped the run-to-run variance.
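
For the two-binary comparison described above, one way to build both versions from a single source is to guard the pragmas with the standard _OPENACC and _OPENMP macros. This is only a minimal sketch, not the code from the attached files: fill_image, nearest_site, the flat row-major image array and the separate px/py coordinate arrays are hypothetical stand-ins for determineColor and imageBuffer, and it assumes the multicore target, so no data clauses are given.

```c
/* Hypothetical stand-in for determineColor(): returns the index of the
   site (px[i], py[i]) closest to pixel (x, y). */
static int nearest_site(int x, int y, const int *px, const int *py,
                        int num_points)
{
    int best = 0;
    long best_d2 = -1;
    for (int i = 0; i < num_points; i++) {
        long dx = x - px[i];
        long dy = y - py[i];
        long d2 = dx * dx + dy * dy;   /* squared distance is enough */
        if (best_d2 < 0 || d2 < best_d2) {
            best_d2 = d2;
            best = i;
        }
    }
    return best;
}

/* Fills a row-major image of x_dim * y_dim pixels.
   Built with OpenACC enabled (e.g. -ta=multicore), the acc pragma is used;
   built with OpenMP enabled (e.g. -mp with pgcc), the omp pragma is used;
   built with neither, the loop simply runs sequentially. */
void fill_image(int *image, int x_dim, int y_dim,
                const int *px, const int *py, int num_points)
{
#if defined(_OPENACC)
#pragma acc parallel loop
#elif defined(_OPENMP)
#pragma omp parallel for
#endif
    for (int y = 0; y < y_dim; y++) {
        for (int x = 0; x < x_dim; x++) {
            image[y * x_dim + x] = nearest_site(x, y, px, py, num_points);
        }
    }
}
```

Because only one of the two pragmas is ever active, both binaries run exactly the same loop body, which keeps the timing comparison fair.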

-Mat

Thanks for the answer. I am working on a Windows machine, so I can't use taskset. I know this might sound really dumb, but where do I set ACC_NUM_CORES in the code? I tried calling acc_set_num_cores(int) in the first line of my main function, but it changed nothing in the runtime. If you could give me a code example of how it should look, that would help me a lot.

Apologies if I wasn’t clear. ACC_NUM_CORES and OMP_NUM_THREADS are environment variables, so they would be set in your Windows command shell via “set ACC_NUM_CORES=20”, or, if you’re using the bash shell, via “export ACC_NUM_CORES=20”.
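
Since the variables are set in the shell and not in the code, it can help to confirm that the program actually sees them. The following is a minimal sketch using only standard C; parse_core_count and report_env_cores are hypothetical helper names, and the helpers only report what is in the environment. They do not change how many cores the OpenACC or OpenMP runtime uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a core-count string. Returns the value if it is a positive
   integer, or 0 if the string is missing or malformed. */
int parse_core_count(const char *value)
{
    if (value == NULL || *value == '\0')
        return 0;
    char *end = NULL;
    long n = strtol(value, &end, 10);
    if (*end != '\0' || n <= 0 || n > 1000000)
        return 0;
    return (int)n;
}

/* Print the named environment variable as this process sees it and
   return its parsed value (0 if unset or invalid). */
int report_env_cores(const char *name)
{
    const char *value = getenv(name);
    printf("%s=%s\n", name, value ? value : "<unset>");
    return parse_core_count(value);
}
```

Calling report_env_cores("ACC_NUM_CORES") and report_env_cores("OMP_NUM_THREADS") at the start of main prints the values inherited from the shell. If either shows as unset, the variable was probably set in a different shell session than the one launching the program.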