Device memory control

Q1.
How do I define shared variables that should stay in global memory? I don’t want them to be automatically copied to shared memory.

Q2.
Does OpenACC support worker-level privatization of scalars and arrays?
That is, can I make a scalar or an array shared among vectors in a single worker?
I saw examples of privatization for gang and vector, but not for worker.

If so, where does the variable stay? Does it go to global memory or shared memory?

In the case of scalars, I saw from past topics that scalar variables are allocated in registers for each vector by default. However, I have many scalar variables that are shared within a worker or gang, and I don’t want them to use up registers.

Q3.
Is there any possibility of a slow-down if the cache directive is used frequently?

Q1. How do I define shared variables that should stay in global memory? I don’t want them to be automatically copied to shared memory.

If the variable is in a data clause, then it’s shared and lives in global memory (unless it’s read-only, in which case the compiler may put it into texture memory).

Arrays are shared by default; scalars are private. So to make a scalar shared, add it to a data clause. If you are assigning to shared scalars, be sure to use atomic operations to avoid race conditions.
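As a minimal sketch of the advice above, written in C (the Fortran directives use the !$acc sentinel with the same clauses): the hypothetical function count_above lists the scalar count in a copy clause, so one shared copy is used on the device, and every update to it is guarded with an atomic. The function name, arguments, and threshold test are illustrative assumptions, not something from this thread.

```c
/* Counts the elements of x that exceed threshold.
   Hypothetical example: 'count' appears in a data clause (copy), so it is
   shared by all threads rather than privatized per thread. */
int count_above(const float *x, int n, float threshold)
{
    int count = 0;

    #pragma acc parallel loop copyin(x[0:n]) copy(count)
    for (int i = 0; i < n; ++i) {
        if (x[i] > threshold) {
            /* Every thread writes the same shared scalar, so the
               update has to be atomic to avoid a race condition. */
            #pragma acc atomic update
            count += 1;
        }
    }
    return count;
}
```

For a plain sum like this one, a reduction(+:count) clause would usually be the better choice; the atomic form is shown only because it matches the shared-scalar case described above.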

Q2. Does OpenACC support worker-level privatization of scalars and arrays?
That is, can I make a scalar or an array shared among vectors in a single worker? I saw examples of privatization for gang and vector, but not for worker.

Yes. The “private” clause takes its scope from the schedule of the loop it’s applied to. So private on a gang loop is private to each gang, private on a worker loop is private to each worker, and private on a vector loop is private to each vector.
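A rough sketch of worker-level privatization, assuming C and a gang/worker/vector loop nest: the scratch array tmp is named in a private clause on the worker loop, so each worker gets its own copy and the vector lanes inside that worker share it. The function reverse_scaled_rows, the sizes NI, NJ, NK, and the array names are all hypothetical and chosen only for illustration.

```c
enum { NI = 4, NJ = 8, NK = 16 };   /* hypothetical problem sizes */

/* For every (i, j) row of length NK: double the values, then write the
   row out in reverse order, using a per-worker scratch buffer. */
void reverse_scaled_rows(const float *in, float *out)
{
    float tmp[NK];   /* scratch buffer, privatized per worker below */

    #pragma acc parallel loop gang copyin(in[0:NI*NJ*NK]) copyout(out[0:NI*NJ*NK])
    for (int i = 0; i < NI; ++i) {
        /* private on the worker loop: one tmp per worker, shared by
           the vector lanes that belong to that worker. */
        #pragma acc loop worker private(tmp)
        for (int j = 0; j < NJ; ++j) {
            int base = (i * NJ + j) * NK;

            #pragma acc loop vector
            for (int k = 0; k < NK; ++k) {
                tmp[k] = 2.0f * in[base + k];
            }

            /* Each lane reads an element written by a different lane. */
            #pragma acc loop vector
            for (int k = 0; k < NK; ++k) {
                out[base + k] = tmp[NK - 1 - k];
            }
        }
    }
}
```

Moving the private(tmp) clause to the gang loop or to a vector loop would change the scope in the same way the reply describes. Where tmp ends up in memory is left to the compiler.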

If so, where does the variable stay? Does it go to global memory or shared memory?

By default it will go into global memory. In some cases, mainly when the array size is known and small enough to fit, shared memory may be used.

Q3. Is there any possibility of a slow-down if the cache directive is used frequently?

Yes, since setting up the shared memory requires a call to syncthreads. Many syncthreads calls can slow down your code.
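For reference, a minimal sketch of what a single cache directive looks like in C, assuming a three-point stencil: the directive sits at the top of the loop body and names the window of a that each iteration reads. The function smooth3 and its arguments are hypothetical, and whether the compiler actually stages the window in shared memory is its own decision.

```c
/* Three-point average over the interior of a; b[0] and b[n-1] are left
   untouched. Hypothetical example of one cache directive per loop. */
void smooth3(const float *a, float *b, int n)
{
    #pragma acc parallel loop copyin(a[0:n]) copy(b[0:n])
    for (int i = 1; i < n - 1; ++i) {
        /* Request that the window a[i-1], a[i], a[i+1] be kept in
           fast memory for this iteration. */
        #pragma acc cache(a[i-1:3])
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0f;
    }
}
```

Given the synchronization cost mentioned above, it seems reasonable to use the directive only on arrays that are reused heavily and to avoid scattering it across many small loops.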

  • Mat

Does “add it to a data clause” mean !$ACC DATA PRESENT(scalar)?