Is that all?
Yes, that’s it!
When you enable this option, the compiler replaces all of your malloc/new/allocate calls with "managed" versions. We use a managed pool allocator, so the code isn't calling cudaMallocManaged directly, but the memory will be managed.
And should I strip the copyin/copy/copyout clauses from my "#pragma acc" directives?
No need. The compiler runtime checks whether each variable is managed. If it is, the data clause is essentially ignored.
nvprof still shows Host->Device and Device->Host data copies with identical timings (for both the copies and the kernel).
Without specifics, it's difficult to say exactly why this is happening. However, keep in mind that managed memory is currently only available for dynamically allocated data. So if your code uses fixed-size arrays or objects, those still need to be managed manually.
Also, make sure that you link with "-ta=tesla:managed" as well. Otherwise the runtime check for managed memory isn't used.
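Concretely, the flag has to appear on both the compile and the link lines (the source file name here is just a placeholder):

```shell
# Compile step: enables the managed allocator substitution.
pgcc -ta=tesla:managed -c saxpy.c
# Link step: the same flag is needed here too, or the runtime
# check for managed pointers is not linked in.
pgcc -ta=tesla:managed saxpy.o -o saxpy
```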
For managed memory, the profiler should show a row with the relative "heat" of page migrations between the host and device. It won't show the individual data copies the way it does with data regions.
Or could it be that nvprof somehow disables unified memory?
It's enabled by default, but it is possible to disable it when you create a session. You also need a device that supports unified memory.