Hi iomagkanaris,
I don’t believe we claim interoperability between OpenACC and OpenMP Target on GPUs, but the two models do share some of the runtime, in particular data management, so this aspect should work ok. We haven’t thoroughly tested it, though, so there may be issues we’re unaware of. Normally I’d recommend sticking to one model or the other, but it sounds like you want to port an existing code from OpenACC to OpenMP and do it incrementally.
- Is there really only one copy done from the Host to the Device?
It appears so. The models share the same runtime data management so the device copy of “x” would be visible when the compiler does the present check upon entering the compute region.
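To make that concrete, here is a minimal sketch of the pattern, not your actual code. It assumes the data is put on the device with an OpenACC "enter data" directive (rather than your acc_copyin call) and the function name acc_data_omp_compute is made up for illustration. It only behaves as described when both models are enabled for the GPU (with our compilers that would be -acc -mp=gpu), and since we don't claim interoperability, treat it as something to try rather than a guarantee.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: fills x[i] = i, doubles it in an OpenMP target
   region while an OpenACC data region holds the device copy, and
   returns the sum of the result. */
double acc_data_omp_compute(int n) {
    double *x = malloc((size_t)n * sizeof *x);
    assert(x != NULL);
    for (int i = 0; i < n; ++i) x[i] = (double)i;

    /* OpenACC creates the device copy: the one host-to-device transfer. */
    #pragma acc enter data copyin(x[0:n])

    /* With a shared present table, this map should find x already on
       the device, so only the present check happens and nothing is
       copied on entry or exit. */
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i) x[i] *= 2.0;

    /* OpenACC copies the result back: the one device-to-host transfer. */
    #pragma acc exit data copyout(x[0:n])

    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += x[i];
    free(x);
    return sum;
}
```

Calling acc_data_omp_compute(1000) and profiling the run is a quick way to check that only one HtoD and one DtoH copy of "x" show up.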
- Where is the Unified Memory CPU page fault coming from?
Sorry, no idea. I don’t see it when I profile the code, but I’m using Nsight Systems, which doesn’t have the print-gpu-trace option. Possibly an artifact of the profiling?
- Is there again really only one HtoD copy of the x array?
Since x and y both point to the same device memory, the present check will pass in the same device pointer for both. The copies only occur when you call “acc_copyin” and “exit data copyout”.
- Does OpenMP figure out automatically that the pointer x is associated to the x_dev pointer and is already present in the GPU memory using the OpenACC present table?
They share the same present table, so it should work as expected in this case.
- Have I understood correctly that this is the proper usage and benefit of omp_target_associate_ptr, meaning that it’s used to associate another pointer to the same data existing on the GPU? It also seems to me that this is not needed for the x array pointer. Am I right?
I wouldn’t necessarily recommend mapping two host pointers to the same device address in the same kernel, as you do here, since it has the potential to introduce bugs, but it is a use case. The typical use case is to re-use device memory, i.e. create some device memory, map it to some host pointer, use it in a kernel, then map it to a different host pointer for another kernel thus re-using the device memory.
No, it’s not needed for “x” since this is already implicitly mapped as part of the acc_copyin call.
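For reference, the re-use case I described might look something like the sketch below. The function name reuse_device_buffer and the arrays a and b are made up for illustration and this is pure OpenMP, with no OpenACC involved. Since an associated pointer is not copied by a map clause, the data movement is done with explicit "target update" directives. The #ifdef guards are only there so the sketch still builds and runs on the host when OpenMP or a device isn't available.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: scales two host arrays by 10, one after the
   other, through a single device buffer that is associated with each
   host pointer in turn. Returns the sum over both arrays afterwards. */
double reuse_device_buffer(int n) {
    size_t bytes = (size_t)n * sizeof(double);
    double *a = malloc(bytes);
    double *b = malloc(bytes);
    assert(a != NULL && b != NULL);
    for (int i = 0; i < n; ++i) { a[i] = 1.0; b[i] = 2.0; }

#ifdef _OPENMP
    /* One device allocation, shared by both arrays; stays NULL when
       no device is available and everything runs on the host. */
    void *buf = NULL;
    int dev = omp_get_default_device();
    if (omp_get_num_devices() > 0) buf = omp_target_alloc(bytes, dev);
#endif

    double *hosts[2] = { a, b };
    for (int k = 0; k < 2; ++k) {
        double *h = hosts[k];
#ifdef _OPENMP
        /* Map the existing device buffer to the current host pointer. */
        if (buf != NULL) {
            int rc = omp_target_associate_ptr(h, buf, bytes, 0, dev);
            assert(rc == 0);
        }
#endif
        /* The map below is only a present check for associated data,
           so the transfers are requested explicitly. */
        #pragma omp target update to(h[0:n])
        #pragma omp target teams distribute parallel for map(tofrom: h[0:n])
        for (int i = 0; i < n; ++i) h[i] *= 10.0;
        #pragma omp target update from(h[0:n])
#ifdef _OPENMP
        /* Release the association so the buffer can be re-used. */
        if (buf != NULL) omp_target_disassociate_ptr(h, dev);
#endif
    }
#ifdef _OPENMP
    if (buf != NULL) omp_target_free(buf, dev);
#endif

    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += a[i] + b[i];
    free(a);
    free(b);
    return sum;
}
```

Only one host pointer is associated with the buffer at any time here, which avoids the aliasing problem you can get when two host pointers map to the same device address inside one kernel.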
-Mat