private OpenACC clause on loop, kernels, and parallel constr

Youngsung · December 4, 2012, 6:04pm

Hi,

After finding that a private clause on a loop construct caused a performance penalty, I have a question about how PGI interprets the private clause.

According to the OpenACC v1.0 standard, the private clause is allowed on the parallel and loop constructs, but not on the kernels construct. If a private clause is on a loop construct, the variables in it are supposed to be created for every iteration. My question concerns using “kernels” with “private”: if I want to explicitly declare a list of variables as private within a gang, but do not want them created for every iteration, what is the correct way to use these constructs and the clause with the kernels construct?

Thanks,

Youngsung

MatColgrove · December 4, 2012, 6:51pm

Hi Youngsung,

Do you mean that you want to create a variable that is private to a gang but shared amongst the vectors in a gang?

!$acc kernels
!$acc loop gang private(A)
do i=1, N
!$acc loop vector
  do j=1,M
 ...

Here A is private to each iteration of the “i” loop, but shared amongst the iterations of the “j” loop (i.e. the vectors).
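As a rough, self-contained sketch of that pattern (the program name, array names, sizes, and the arithmetic are all made up for illustration and are not from the original code): the scratch array "a" is listed in the private clause on the gang loop, so each “i” iteration gets its own copy, and both inner vector loops work on that same copy. Whether a given compiler version accepts exactly this schedule is an assumption on my part. Without OpenACC enabled, the directives are just comments and the program runs serially.

```fortran
program gang_private_sketch
  implicit none
  integer, parameter :: n = 64, m = 32   ! hypothetical sizes
  real :: a(m)                           ! scratch array: one copy per gang iteration
  real :: b(m, n), c(m, n)
  integer :: i, j

  ! Fill the input with small integer values so the check below is exact.
  do i = 1, n
     do j = 1, m
        b(j, i) = real(i + j)
     end do
  end do

  !$acc kernels copyin(b) copyout(c)
  !$acc loop gang private(a)
  do i = 1, n
     ! First vector loop: every vector lane writes one element of the gang's copy of a.
     !$acc loop vector
     do j = 1, m
        a(j) = 2.0 * b(j, i)
     end do
     ! Second vector loop: lanes read elements written by other lanes of the same gang.
     !$acc loop vector
     do j = 1, m
        c(j, i) = a(m + 1 - j)
     end do
  end do
  !$acc end kernels

  ! Host-side check against the same expression computed directly.
  do i = 1, n
     do j = 1, m
        if (c(j, i) /= 2.0 * b(m + 1 - j, i)) error stop "mismatch"
     end do
  end do
  print *, "ok"
end program gang_private_sketch
```

The second inner loop reads elements written by different “j” iterations of the first one, which only works because "a" is shared across the vectors of a gang.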

Mat

Youngsung · December 4, 2012, 7:24pm

Hi Mat,

Thanks for your kind explanation. It is good to know that putting the private clause on the gang loop lets the vectors share the variables.

However, my situation is a bit more complicated. Please see my code below:

1 !$acc kernels
2 !$acc loop gang(ngangs) vector(neblk)
3 do ie=1,nelem
4 !$acc loop vector(npts) private(s1,s2,i,j,k,l)
5 do ii=1,npts
6 … computation using private variables and others

On line #4, I put the private clause, and it caused a performance penalty.
On line #2, I have gang as well as vector. When I moved the private clause from line #4 to line #2, I saw an approx. 10% performance improvement, but the computed result differed from the previous one.

Actually, when I completely deleted the private clause from the source code, I got the same result along with a 2X speed-up. So I am still confused about how PGI handles the private clause.

Thanks,

Youngsung

MatColgrove · December 4, 2012, 7:42pm

private(s1,s2,i,j,k,l)

These all look like scalars? By default scalars are made local to the generated kernel. This makes them private and has the added benefit that these variables are more likely to be put into a register.

When you add a scalar to a private clause, you are creating an array of these scalars in global memory, where each loop iteration (gang or vector) has its own element. Since the variable is now in global memory, your code slows down.

I’ve talked with our compiler engineers about this and they agree that we need to rework this implementation. Essentially we should ignore scalars in a private clause when they are placed on a vector-only loop and instead always make them local to the kernel. For a private on a gang loop, we should be using shared memory instead of global.

We’ll probably make this change once the proposed OpenACC 2.0 “default(none)” clause is implemented. Until then, the recommendation is not to put scalars in private clauses unless absolutely necessary.
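To make the recommendation concrete, here is a minimal sketch that follows the shape of your loop nest but is otherwise invented (the arrays "x" and "y", the sizes, and the arithmetic are placeholders, not your actual computation). The scalars s1 and s2 are assigned before they are read in every iteration and are deliberately left out of any private clause, so the compiler is free to make them local to the kernel.

```fortran
program scalar_default_sketch
  implicit none
  integer, parameter :: nelem = 128, npts = 16   ! hypothetical sizes
  real :: x(npts, nelem), y(npts, nelem)
  real :: s1, s2          ! scalar temporaries, intentionally NOT in a private clause
  integer :: ie, ii

  ! Small integer inputs keep the floating-point check below exact.
  do ie = 1, nelem
     do ii = 1, npts
        x(ii, ie) = real(ii + ie)
     end do
  end do

  !$acc kernels copyin(x) copyout(y)
  !$acc loop gang
  do ie = 1, nelem
     !$acc loop vector
     do ii = 1, npts
        ! s1 and s2 are written before they are read in each iteration,
        ! so no value needs to be carried between iterations.
        s1 = x(ii, ie) + 1.0
        s2 = s1 * s1
        y(ii, ie) = s1 + s2
     end do
  end do
  !$acc end kernels

  ! Host-side check of the result.
  do ie = 1, nelem
     do ii = 1, npts
        if (y(ii, ie) /= (x(ii, ie) + 1.0) + (x(ii, ie) + 1.0)**2) error stop "mismatch"
     end do
  end do
  print *, "ok"
end program scalar_default_sketch
```

Adding private(s1,s2) to the inner loop directive should give the same answer, just with the slower global-memory storage described above.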

Hope this helps,
Mat

Youngsung · December 4, 2012, 7:53pm

I’ve got a clear idea now of how it works!!! Thanks a lot, Mat.

dcwarren · February 1, 2013, 2:58pm

If I try this with my code, I get complaints about live-out induction variables. Making the induction variable private to the loop stops the compiler from moaning, but apparently incurs a performance penalty. How can I inform the compiler that the variable “i” in a loop really doesn’t need to be remembered for the next loop over “i”?

MatColgrove · February 1, 2013, 5:13pm

You need to look at how the induction variables are being declared and used. The clear case is when they are used on the right hand side after the compute region. Less clear cases are if they are declared global, used as arguments to a subroutine, or have some other static storage. Sometimes branches can also cause this.

The quick method is to set the variables to some value immediately after the loop where they are used (e.g. “i=1”), or to use different induction variables. Otherwise, this is a case where the “private” clause may be necessary.
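A small sketch of the quick method, with invented names and sizes (the array "a", the extra index "k", and the program itself are only for illustration). In a program this simple the compiler can probably see for itself that “i” is not live-out, so treat it as showing the pattern rather than as a reproducer of the message.

```fortran
program live_out_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical size
  real :: a(n)
  integer :: i, k

  !$acc kernels copyout(a)
  do i = 1, n
     a(i) = 0.5 * real(i)
  end do
  !$acc end kernels

  ! Overwrite i right after the loop, so the value the loop left in i
  ! is never read by later code.
  i = 1
  if (a(i) /= 0.5) error stop "unexpected first element"

  ! Alternative from the post: use a different induction variable (k) afterwards.
  do k = 1, n
     if (a(k) /= 0.5 * real(k)) error stop "mismatch"
  end do
  print *, "ok"
end program live_out_sketch
```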

Mat
