The PGI Accelerator Model’s “mirror” clause mirrors the allocation status of a host array onto a device array: when the host array is allocated, a corresponding device array is also allocated in the device’s global memory. However, no data movement is performed, so the “update” clause is needed to synchronize the host and device copies.
Do you think data access will be slow when the compute region accesses that array's memory?
This depends on how the data is accessed. If the threads in a warp access a contiguous block of memory, the accesses are coalesced and performance is fine. If the accessed data is scattered, this leads to memory divergence and slower performance.
Using the hardware’s local memory can help, but it is less necessary now that the last two generations of NVIDIA hardware have automatic caching support.