scalars, parallel construct and kernel construct

I have a general question about how scalars are treated by default in a parallel construct or a kernels construct.

The OpenACC 1.0 standard says that a scalar is treated as private if it doesn’t appear in a data clause for the construct or any enclosing data construct and is not live-in or live-out.

According to the definition of PRIVATE in a parallel construct, a copy of the scalar will be created for each gang. Which means the copy of the scalar for a particular gang will be shared by the threads in that gang, right?

There is no definition of PRIVATE for the kernels construct, but there is one for the loop construct, which says a copy of the scalar will be created for each iteration of the associated loop or loops. Since a kernels construct always needs loops to be useful, I take this as the explanation of how PRIVATE is defined in a kernels construct. Right?

My last question: if I have a parallel construct combined with a loop construct, what happens to a scalar that doesn’t appear in a data clause for the construct or any enclosing data construct and is not live-in or live-out?

Thanks,

Ping

Hi Ping,

According to the definition of PRIVATE in a parallel construct, a copy of the scalar will be created for each gang. Which means the copy of the scalar for a particular gang will be shared by the threads in that gang, right?

Correct. When a “private” clause is used on a “parallel” directive, the scalar is private to each gang but shared among the workers and vector lanes within that gang.
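As a rough sketch of what that looks like in C (the function and variable names here are made up for illustration, not taken from the spec): "pivot" is listed in private() on the parallel directive, so each gang gets its own copy, and the vector lanes inside the gang all read that one copy.

```c
#include <assert.h>

/* Hypothetical example: scale every row of an n x m matrix by the
   first element of that row. */
void scale_rows(int n, int m, const float *a, float *out)
{
    assert(n >= 0 && m >= 0);
    float pivot;

    /* private(pivot) on "parallel": one copy of pivot per gang. */
    #pragma acc parallel copyin(a[0:n*m]) copyout(out[0:n*m]) private(pivot)
    {
        #pragma acc loop gang
        for (int i = 0; i < n; ++i) {
            pivot = a[i * m];               /* written once per row by the gang */
            #pragma acc loop vector
            for (int j = 0; j < m; ++j) {
                /* every vector lane in the gang reads the gang's copy */
                out[i * m + j] = a[i * m + j] * pivot;
            }
        }
    }
}
```

Without an OpenACC compiler the pragmas are ignored and the loops simply run serially, which is handy for checking the logic.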

There is no definition of PRIVATE for the kernels construct, but there is one for the loop construct, which says a copy of the scalar will be created for each iteration of the associated loop or loops. Since a kernels construct always needs loops to be useful, I take this as the explanation of how PRIVATE is defined in a kernels construct. Right?

The “loop” directive can be used inside both “parallel” and “kernels” regions, and its “private” clause works the same way in both.
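To illustrate, here is a minimal sketch with the same loop written once inside a “kernels” region and once inside a “parallel” region. The function names and the "tmp" scalar are my own placeholders. In both cases private(tmp) sits on the “loop” directive, so tmp is private to the loop iterations rather than shared across a gang.

```c
#include <assert.h>

/* Hypothetical example: y[i] += x[i]^2, using a scalar temporary. */
void add_squares_kernels(int n, const float *x, float *y)
{
    assert(n >= 0);
    float tmp;

    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
        /* "independent" tells the compiler the iterations do not conflict */
        #pragma acc loop independent private(tmp)
        for (int i = 0; i < n; ++i) {
            tmp = x[i] * x[i];
            y[i] += tmp;
        }
    }
}

void add_squares_parallel(int n, const float *x, float *y)
{
    assert(n >= 0);
    float tmp;

    #pragma acc parallel copyin(x[0:n]) copy(y[0:n])
    {
        /* same loop directive and private clause as in the kernels version */
        #pragma acc loop private(tmp)
        for (int i = 0; i < n; ++i) {
            tmp = x[i] * x[i];
            y[i] += tmp;
        }
    }
}
```

I kept “parallel” and “loop” as separate directives here so it is unambiguous that the private clause belongs to the loop.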

My last question: if I have a parallel construct combined with a loop construct, what happens to a scalar that doesn’t appear in a data clause for the construct or any enclosing data construct and is not live-in or live-out?

It becomes either a local variable within the generated kernel or an argument to the kernel. Either way, it is effectively private to each vector lane (i.e. thread). Keeping it private also increases the likelihood that the variable can be held in a register for faster access.
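A small sketch of that case, again with hypothetical names: "t" is declared outside the compute region, appears in no data clause, is assigned before it is read in every iteration (not live-in), and is never read after the loop (not live-out). Under those assumptions the compiler is free to treat it like a per-thread local.

```c
#include <assert.h>

/* Hypothetical example: c[i] = (a[i] + b[i]) / 2 with a scalar temporary. */
void average(int n, const float *a, const float *b, float *c)
{
    assert(n >= 0);
    float t;    /* no data clause mentions t */

    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; ++i) {
        t = a[i] + b[i];    /* always written before it is read */
        c[i] = 0.5f * t;
    }
    /* t is not used after the loop, so its final value is never needed */
}
```

If t were read after the loop, or read before being written inside it, it would be live-out or live-in and the default treatment would no longer apply.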

Hope this helps,
Mat