implicit copy of scalar variables

Hi,

I have been chasing wrong results compared with the CPU version of my code, and it turned out to be an issue with a scalar variable that I assumed would be copied to the device whenever the kernel is entered (the kernel is generated with the OpenACC kernels construct).
I have a for loop decorated with kernels and loop independent directives within a member function. Within the for loop I need to access a scalar (int) variable, which is a data member of the same class.
The data member gets initialized in the class’ constructor, i.e. set to 0, and is later changed to another value in some setup member function. This setup member function gets called before the first launch of the kernel (the kernel gets launched many times).
To my surprise, the value of the scalar data member is always 0 in the kernel and not the one that got set in the setup member function.
Is this the expected behavior according to the standard or the PGI implementation?
Is there a difference between scalar data members and scalar local variables when it comes to the behavior of implicit/automatic copying in? I have read about the handling of global variables using the declare statement and understand that those might get copied to the device upon acc initialization and that an update directive is needed if a later assigned value is needed.

I am a bit concerned about this automatic copy behavior because it seems to work as expected most of the time, but this is the second time I am tracing wrong results due to an issue with the automatic copying of scalar variables. It seems that I haven’t fully understood what is happening behind the scenes.

Could you please shed some light on this issue?

Thanks,
LS
PS1: pcopyin(this[0:1]) is used at several locations that are passed before the kernel launch and also before the setup member function. So maybe the scalar member gets copied to the device at the first instance of such a pcopyin statement?
PS2: I have also searched in the forum but haven’t found a recent post explaining it in detail. There seemed to be some confusion about the private clause, but no mention of local scalars vs. scalar members.

> Is there a difference between scalar data members and scalar local variables when it comes to the behavior of implicit/automatic copying in?

Yes. A scalar data member is part of an aggregate type, so it follows the data semantics of the enclosing object rather than those of a standalone scalar.

In this case, when you update the class member you need to update the device copy as well via an “update” directive.
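A minimal sketch of that pattern, for illustration only. The class `Scaler`, its member `factor_` and the helper `run_scaler_example` are hypothetical names, not taken from the original code. It also assumes the compiler accepts a member name in an update directive; if it does not, `update device(this[0:1])` should be an alternative for a class without pointer members.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class, for illustration only.
class Scaler {
public:
    Scaler() : factor_(0) {
        // Shallow copy of the object to the device; factor_ is still 0 here.
        #pragma acc enter data copyin(this[0:1])
    }
    ~Scaler() {
        #pragma acc exit data delete(this[0:1])
    }
    void setup(int factor) {
        // This changes the host copy only.
        factor_ = factor;
        // Push the new value to the device copy of the object.
        // Without this line the kernel below would still see 0 on the device.
        #pragma acc update device(factor_)
    }
    void apply(int* data, int n) const {
        assert(n >= 0);
        // factor_ is read through the device copy of *this.
        #pragma acc kernels loop independent copy(data[0:n]) present(this[0:1])
        for (int i = 0; i < n; ++i) {
            data[i] *= factor_;
        }
    }
private:
    int factor_;
};

// Hypothetical driver: returns data[0] after setup(3) and one kernel launch.
int run_scaler_example() {
    std::vector<int> data(4, 2);
    Scaler s;
    s.setup(3);
    s.apply(data.data(), static_cast<int>(data.size()));
    return data[0];
}
```

When compiled without OpenACC support the directives are ignored and the code runs on the host, which gives the reference result to compare the device run against.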

> PS1: pcopyin(this[0:1]) is used at several locations that are passed before the kernel launch and also before the setup member function. So maybe the scalar member gets copied to the device at the first instance of such a pcopyin statement?

A copy or copyin (if the object is not already present) or an update will perform a shallow copy of the data members. So whatever values the data members hold on the host at that point get copied to the device as well. Note that if you have any pointer data members, the host pointers get copied over, so order matters.
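To make the ordering point concrete, here is a hedged sketch of a class with one pointer member. `Field`, `n_`, `v_` and `run_field_example` are hypothetical names chosen for the example. The assumed order is: copy the object first, then the array it points to, so that the device copy of the pointer ends up referring to device memory.

```cpp
#include <cassert>

// Hypothetical class owning a host array and its device mirror.
class Field {
public:
    explicit Field(int n) : n_(n), v_(new double[n]) {
        assert(n > 0);
        for (int i = 0; i < n_; ++i) v_[i] = 1.0;
        // 1) Shallow copy of the object: n_ and the *host* address held in v_.
        #pragma acc enter data copyin(this[0:1])
        // 2) Copy the array. The parent object is already present, so the
        //    device copy of v_ can be pointed at the device array.
        #pragma acc enter data copyin(v_[0:n_])
    }
    ~Field() {
        // Release in reverse order: array first, then the object.
        #pragma acc exit data delete(v_[0:n_])
        #pragma acc exit data delete(this[0:1])
        delete[] v_;
    }
    Field(const Field&) = delete;
    Field& operator=(const Field&) = delete;

    void add(double x) {
        #pragma acc parallel loop present(this[0:1], v_[0:n_])
        for (int i = 0; i < n_; ++i) v_[i] += x;
    }
    double first() {
        // Bring the first element back to the host before reading it.
        #pragma acc update host(v_[0:1])
        return v_[0];
    }
private:
    int n_;
    double* v_;
};

// Hypothetical driver: 1.0 + 2.5 on every element, returns element 0.
double run_field_example() {
    Field f(8);
    f.add(2.5);
    return f.first();
}
```

Following the note above, a later shallow update of the whole object would copy the host address in `v_` over the device pointer again, which is why updating individual members is usually the safer choice for classes like this.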

Does this help clarify things?

- Mat

Hi Mat,

Thanks for the explanation. So do I understand correctly that local scalar variables are always copied in, i.e. updated, automatically at every kernel launch, as opposed to scalar data members, which get copied over to the device at the first encounter of a pcopyin() as a result of the shallow copy of the data? So if any of those scalar member variables are changed later on the host and I want those changes to be reflected on the device, I will have to manage the updating manually using update directives.

Please confirm whether my understanding is correct.

Thanks,
LS

> So do I understand correctly that local scalar variables are always copied in, i.e. updated, automatically

It’s not that they are copied in; it’s that they are firstprivate by default, meaning that each thread will have its own private variable, which gets initialized to the value of the original variable.
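A small sketch of what this means in practice. The names `offset` and `run_local_scalar_example` are hypothetical. The example uses a parallel loop rather than kernels so that the firstprivate clause can be written out explicitly; the clause only spells out the default described above.

```cpp
#include <vector>

// Hypothetical driver: launches the same loop twice with a different value of
// a local scalar and returns the sum of out[0] over both launches.
int run_local_scalar_example() {
    const int n = 4;
    std::vector<int> out(n, 0);
    int* p = out.data();
    int offset = 0;  // local scalar, starts at 0 like the member in the question
    int sum = 0;
    for (int launch = 0; launch < 2; ++launch) {
        // Changed on the host before each launch.
        offset = 10 * (launch + 1);
        // Each thread gets its own copy of offset, initialized from the
        // current host value, so no update directive is needed.
        #pragma acc parallel loop copyout(p[0:n]) firstprivate(offset)
        for (int i = 0; i < n; ++i) {
            p[i] = offset + i;
        }
        sum += p[0];  // 10 on the first launch, 20 on the second
    }
    return sum;
}
```

Contrast this with a data member: there the value lives in the device copy of the object, so a host-side change is not seen until the object or the member is updated.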

There are exceptions to this:

If a scalar is passed by reference to a subroutine within a compute region, the compiler must assume that the variable may become aliased and privatizing could lead to incorrect results. Although unlikely, the compiler must assume the worst case. Here, the compiler will make the scalar shared across all threads and in some cases this may prevent parallelization of the loop. If this occurs, either pass the variable by value instead, or add the variable to a “private” clause.

If the scalar has global storage, then again the compiler must assume the variable may be aliased and the variable will be shared.
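The two workarounds for the pass-by-reference case can be sketched as follows. The routines `shifted` and `square_into` and the two driver functions are hypothetical and only serve the illustration.

```cpp
#include <vector>

// Hypothetical device routine taking the scalar by value, so nothing inside
// the compute region can alias the caller's variable.
#pragma acc routine seq
inline int shifted(int i, int shift) { return i + shift; }

// Hypothetical device routine that writes its result through a reference.
#pragma acc routine seq
inline void square_into(int x, int& result) { result = x * x; }

// Workaround 1: pass by value. shift keeps the default scalar handling.
int run_by_value_example() {
    const int n = 4;
    std::vector<int> out(n, 0);
    int* p = out.data();
    int shift = 5;
    #pragma acc parallel loop copyout(p[0:n])
    for (int i = 0; i < n; ++i) {
        p[i] = shifted(i, shift);
    }
    return p[3];  // 3 + 5
}

// Workaround 2: keep the reference, but state explicitly that tmp is private.
int run_private_example() {
    const int n = 4;
    std::vector<int> out(n, 0);
    int* p = out.data();
    int tmp = 0;
    #pragma acc parallel loop copyout(p[0:n]) private(tmp)
    for (int i = 0; i < n; ++i) {
        square_into(i, tmp);  // tmp is written before it is read
        p[i] = tmp;
    }
    return p[3];  // 3 * 3
}
```

The private clause is only safe here because `tmp` is assigned in every iteration before it is used; a scalar that carries a value into the loop would need firstprivate or pass-by-value instead.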

> So if any of those scalar member variables are changed later on the host and I want those to be reflected on the device, I will have to manage the updating manually using update directives.

Yes, you must manage the data synchronization of aggregate types yourself. Eventually, when deep copy/update is supported in the OpenACC standard, you will be able to use class variables directly in data clauses and update directives. But for now, you’ll need to have your classes manage their own data.

Hope this helps,
Mat

Thanks, got it!
LS