memcpys and other collected profile data are invalid

Hello Mat, I am profiling my C++ program with pgprof, and at the end of the run I get this message:
The start and end timestamps on 310 kernels, memcpys and other collected profile data are invalid. Those profiling records have been dropped and will not be displayed in the timeline.
And when I use the kernel memory function, I get "insufficient kernel memory data":
The data needed to perform memory bandwidth analysis for the kernel could not be collected.
The same is true when using kernel performance. No results are produced.

pgc++ 18.10-0 64-bit target on x86-64 Linux -tp haswell
PGI Compilers and Tools
Copyright © 2018, NVIDIA CORPORATION. All rights reserved

NVIDIA-SMI 410.48, Driver Version: 410.48

CUDA 10.0, x86 Linux

I also have a small question: how do I use shared memory in OpenACC? Is there a link or example for reference? I have several pieces of data that are reused frequently, and I want to improve the efficiency of the program by storing them in shared memory.

Hi wanghr323,

I’m not sure about the profiler issue, but I will ask our profiler folks once they get into the office. It could be a mismatch between the CUDA version used to build and the version used to run, given that 18.10 defaults to CUDA 9.2 or to whatever the environment variable CUDA_HOME is set to. Try adding the flag “-ta=tesla:cuda10.0” when you compile.

“There is also a small problem, how to use shared memory in OpenACC?”

The best way is to add “private” arrays to a gang loop. If the gang-private array has a size known at compile time, the compiler will attempt to put the array in shared memory. If successful, you’ll see a corresponding -Minfo message.
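As a rough illustration of the gang-private approach, here is a minimal sketch. The function name smooth_rows, the TILE size, and the circular 3-point stencil are all made up for the example; the only points taken from the advice above are that the array has a compile-time size and is listed in a "private" clause on the gang loop. Whether the array actually lands in shared memory is up to the compiler.

```cpp
#include <cassert>

// Hypothetical tile width; the key point is that it is a compile-time constant.
constexpr int TILE = 32;

// For each row of a rows x TILE matrix, stage the row in a gang-private
// buffer, then read each staged element three times (circular 3-point sum).
void smooth_rows(const float* in, float* out, int rows) {
    assert(rows >= 0);
    float tile[TILE];  // fixed size, made private to each gang below

    #pragma acc parallel loop gang private(tile) \
        copyin(in[0:rows*TILE]) copyout(out[0:rows*TILE])
    for (int r = 0; r < rows; ++r) {
        // Stage the row once; every later read hits the private copy.
        #pragma acc loop vector
        for (int j = 0; j < TILE; ++j) {
            tile[j] = in[r * TILE + j];
        }

        // Reuse the staged data: each element is read by three iterations.
        #pragma acc loop vector
        for (int j = 0; j < TILE; ++j) {
            int left  = (j + TILE - 1) % TILE;
            int right = (j + 1) % TILE;
            out[r * TILE + j] = tile[left] + tile[j] + tile[right];
        }
    }
}
```

Compiling with something like "pgc++ -ta=tesla:cuda10.0 -Minfo=accel" should print the accelerator feedback, which is where to look for a message about the private array. Without OpenACC enabled, the pragmas are ignored and the code runs serially with the same result.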

There’s also the OpenACC “cache” directive, but this directive has been particularly difficult to implement. While we do support it, it can be tricky to use, so it’s better to use gang-private arrays if possible.
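For comparison, this is roughly what the “cache” directive looks like in use. It is only a sketch: the function name smooth3 and the stencil are hypothetical, and the directive is a hint, so there is no guarantee the compiler will place the requested section in shared memory.

```cpp
#include <cassert>

// 3-point average over the interior of an array. The cache directive asks
// the compiler to keep the three elements around index i in fast memory.
void smooth3(const float* in, float* out, int n) {
    assert(n >= 3);

    #pragma acc parallel loop copyin(in[0:n]) copy(out[0:n])
    for (int i = 1; i < n - 1; ++i) {
        // Array section syntax is [start:length]: elements i-1, i, i+1.
        #pragma acc cache(in[i-1:3])
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
}
```

The directive sits at the top of the loop body and names the section of the array that each iteration reads. As with the private array, the -Minfo output is the place to check what the compiler did with it.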