OpenACC: no parallelisation with -ta=multicore

Hi,

I am new to OpenACC and I am trying to get multicore parallelisation running. Currently it gives no runtime improvement over a working OpenMP implementation of the same program.

The loop I want to parallelize looks like this:

#pragma acc parallel loop
for (int y = 0; y < yDim; y++)
{
    for (int x = 0; x < xDim; x++)
    {
        int color = determineColor(x, y, points, numPoints, max_res);
        imageBuffer[x][y] = color;
    }
}

I am using the pgcc compiler with the following settings:

pgcc -Minfo -ta=multicore

I know that the inner loop probably won't be parallelized, but I was expecting at least some improvement from the outer loop. So far, however, no improvement over a sequential run of the program or over the OpenMP version has been measurable.
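
For reference, the OpenMP version parallelizes the same outer loop; it looks essentially like this (a sketch, the real code may differ slightly):

/* OpenMP counterpart: distribute the rows (outer loop) across host threads */
#pragma omp parallel for
for (int y = 0; y < yDim; y++)
{
    for (int x = 0; x < xDim; x++)
    {
        int color = determineColor(x, y, points, numPoints, max_res);
        imageBuffer[x][y] = color;
    }
}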


Hi Blob,

Can you please provide a minimal reproducing example? Unfortunately there's not enough information here for us to help.

-Mat

voronoi.c (3.2 KB) main.c (3.2 KB)

Hey, these two files should get the program running and give a general idea of what it looks like. Basically it generates Voronoi diagrams. They are a slightly older version, but the problem remains the same.

Might just be a binding issue. When I

  • fix the header file,
  • uncomment the OpenMP pragma,
  • compile twice, building OpenMP and OpenACC binaries (see the sketch below),
  • set OMP_NUM_THREADS=20 and ACC_NUM_CORES=20 so only a single socket is used,
  • use “taskset -c 0-19 a.out” to bind,

then run each binary, the times are roughly the same and consistent between runs. At least when I run it, going cross-socket (i.e. 40 cores) leads to severe run-to-run variation, presumably due to memory being allocated on a single NUMA node. Unfortunately, numactl isn't installed on the system I'm using, otherwise I'd try interleaving the memory to see if that helps the run-to-run variance.
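
For concreteness, here is a minimal sketch of the "compile twice" step, assuming one source file carries both pragmas (the attached files may simply comment one of them out instead; the output names are just examples):

/* One source, two builds: _OPENACC is predefined when OpenACC is enabled,
   so the preprocessor picks the matching pragma for each binary.
     OpenMP binary:            pgcc -mp -o voronoi_omp main.c voronoi.c
     OpenACC multicore binary: pgcc -Minfo -ta=multicore -o voronoi_acc main.c voronoi.c */
#ifdef _OPENACC
#pragma acc parallel loop
#else
#pragma omp parallel for
#endif
for (int y = 0; y < yDim; y++)
    for (int x = 0; x < xDim; x++)
        imageBuffer[x][y] = determineColor(x, y, points, numPoints, max_res);

Either way, both binaries then time the identical loop nest, so the comparison is apples-to-apples.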

-Mat

Thanks for the answer. I am working on a Windows machine, so I can't use taskset. I know this might sound really dumb, but where do I set ACC_NUM_CORES in the code? I tried calling acc_set_num_cores(int) in the first line of my main function, but it changed nothing in the runtime. If you could give a code example of how it should look, that would help me a lot.


Apologies if I wasn't clear. ACC_NUM_CORES and OMP_NUM_THREADS are environment variables, so they would be set in your Windows command shell via “set ACC_NUM_CORES=20”, or, if you're using a bash shell, via “export ACC_NUM_CORES=20”.
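
If you want to confirm the variables are actually visible to the program, a quick sanity check like this can be dropped into main (plain C; nothing here is OpenACC-specific):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Print what the runtime will see; both variables must be set in the
       shell (e.g. "set ACC_NUM_CORES=20") before the program is launched. */
    const char *acc_cores   = getenv("ACC_NUM_CORES");
    const char *omp_threads = getenv("OMP_NUM_THREADS");
    printf("ACC_NUM_CORES   = %s\n", acc_cores   ? acc_cores   : "(not set)");
    printf("OMP_NUM_THREADS = %s\n", omp_threads ? omp_threads : "(not set)");
    /* ... rest of the program ... */
    return 0;
}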

If I am running on a single GPU, how do I set the number of threads (or parallel processes)?
Is it ACC_NUM_CORES or OMP_NUM_THREADS?
Thanks, Giles.

These only set the number of host threads to use.

The number of threads on the GPU is the product of the number of gangs, workers, and vector lanes. You could set this explicitly via the “num_gangs”, “num_workers”, and “vector_length” clauses, but I wouldn't recommend it. While these clauses can be useful under certain circumstances, in general it's best to let the compiler use as many threads as possible depending on the parallel loop trip counts and the target architecture.
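
For completeness, a sketch of what explicitly sizing the launch could look like on a loop nest like the one above; the numbers are purely illustrative, and as said, letting the compiler decide is usually the better default:

/* Explicit launch sizing (illustrative values only):
   256 gangs, 128-wide vectors; num_workers(...) could be added the same way. */
#pragma acc parallel loop gang num_gangs(256) vector_length(128)
for (int y = 0; y < yDim; y++)
{
    #pragma acc loop vector
    for (int x = 0; x < xDim; x++)
    {
        int color = determineColor(x, y, points, numPoints, max_res);
        imageBuffer[x][y] = color;
    }
}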