OpenMP target data: use_device_ptr vs use_device_addr

Hello,

I have a question when translating OpenACC’s acc host_data use_device(arr), where arr is a Fortran allocatable array, to OpenMP. What’s the difference between omp target data use_device_ptr and use_device_addr? I’ve read the specification but don’t get it. Any advice would be appreciated!

Thanks,
Victor

Hi Victor,

I had to ask this too a few weeks ago since the description in the standard isn’t obvious.

use_device_ptr has been in the OpenMP standard longer. It is designed to support systems without a Unified Virtual Address Space. In principle, the pointer returned from use_device_ptr could be a host handle. The use_device_addr is a newer feature that assumes that the system has a Unified Virtual Address Space. For portability there is a requires unified_address to ensure that use_device_addr can safely be used.

However, given that the compiler can implicitly handle unified memory, the NVHPC compilers treat them the same.

-Mat

Hi Mat,

Thanks for your explanation! Good to know that they are treated equally by NVHPC.

Victor

Hi,

Just to chime in - it seems they are not treated equally after all, at least when using “nounified,nomanaged” or “mem:separate”.

In that case, when I use “addr” I get incorrect results, but using “ptr” works correctly.

– Ron

Hi @MatColgrove

I’m trying to port an OpenACC application to OpenMP. For now I’m testing how to implement some of the concepts I need, and I’m stuck on using MPI between GPUs. Since the array tab is offloaded to the GPU, I’m replacing the OpenACC implementation:

      !$acc host_data  use_device(tab)
      call MPI_SEND(tab,N*N,MPI_INTEGER,0,111,MPI_COMM_WORLD,ierror)
      !$acc end host_data

with OpenMP code:

      !$omp target data use_device_addr(tab)
      call MPI_SEND(tab,N*N,MPI_INTEGER,0,111,MPI_COMM_WORLD,ierror)
      !$omp end target data

But the OpenMP code fails with wrong results as soon as I have more than 4 GPUs (it works fine with 2 to 4 GPUs). The OpenACC code works (tested from 2 to 8 GPUs).

I think this is related to the !$omp requires unified_address directive discussed here, but inserting this directive at the beginning of the code does not help, and I do not clearly understand how to use it in a large Fortran program.
Unfortunately there is no example provided in https://www.openmp.org/wp-content/uploads/openmp-examples-5.2.2-final.pdf.

Do you have a pointer to detailed documentation or a small Fortran example? I’m testing with nvhpc/24.3 and nvhpc/24.7.

Thanks

Patrick

Hi Patrick,

With my code, I got an error message with use_device_addr but correct results with use_device_ptr. This is similar to Ron’s comment above. (I used 24.9).

So I think it’s worth trying use_device_ptr if you haven’t.

Victor

Thank you Victor, switching to use_device_ptr solves the problem. I thought this clause was only for pointer, allocatable, or target arrays, but it also works with arrays whose size is set at compile time (at least for my small test case).
Patrick
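
For readers who want something to compile, the workaround described above could look roughly like the sketch below. It is not Patrick's actual test program: the program name, the size N = 4, the fill pattern, and the choice of rank 1 as sender are all assumptions made for illustration, and it presumes a GPU-aware MPI library. Only the MPI_SEND arguments and the tag 111 come from the snippet earlier in the thread.

```fortran
program mpi_device_send
  use mpi
  implicit none
  integer, parameter :: N = 4          ! hypothetical size, not from the thread
  integer :: tab(N, N)                 ! size fixed at compile time
  integer :: rank, nprocs, ierror, i, j
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierror)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)

  tab = -1

  ! Keep tab on the device for the whole exchange.
  !$omp target data map(tofrom: tab)

  if (rank == 1) then
    ! Fill the device copy only, so a send from the host copy
    ! would be visible as wrong data (-1) on the receiver.
    !$omp target teams distribute parallel do collapse(2)
    do j = 1, N
      do i = 1, N
        tab(i, j) = i + N*(j - 1)
      end do
    end do

    ! Workaround reported in this thread for NVHPC 24.x.
    ! Later posts note that use_device_ptr on a non-C-pointer item
    ! is deprecated; use_device_addr(tab) is the intended form once
    ! the compiler honors it.
    !$omp target data use_device_ptr(tab)
    call MPI_SEND(tab, N*N, MPI_INTEGER, 0, 111, MPI_COMM_WORLD, ierror)
    !$omp end target data
  else if (rank == 0 .and. nprocs > 1) then
    !$omp target data use_device_ptr(tab)
    call MPI_RECV(tab, N*N, MPI_INTEGER, 1, 111, MPI_COMM_WORLD, status, ierror)
    !$omp end target data
  end if

  ! Leaving the outer region copies the device data back to the host.
  !$omp end target data

  if (rank == 0 .and. nprocs > 1) print *, 'tab(N,N) on rank 0 =', tab(N, N)

  call MPI_FINALIZE(ierror)
end program mpi_device_send
```

Run with at least two ranks. If the device address really reaches MPI, rank 0 should see the values written on rank 1's GPU rather than the initial -1.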

Hello all!

I’m back with this OpenMP problem with the use_device_addr clause. Some compilers have started to warn that using use_device_ptr on an allocated Fortran array is deprecated and will soon be forbidden, so I looked again at use_device_addr, which should be used instead.

It works with most compilers, but not with NVIDIA’s. The small attached Fortran test case just calls a C function to print the address of an allocated array on the host and on the GPU when called inside a use_device_addr/use_device_ptr region. It shows this information in the main program, then in a subroutine that receives the offloaded array. It prints the host address first and then the device address.

With use_device_ptr (as set up in the provided test case) it works.
With use_device_addr it silently fails with the NVIDIA compilers, always showing the host address (there are 2 occurrences to change from use_device_ptr to use_device_addr in main.f90).

Checked with version 24.11 of nvfortran. I think this is a compiler bug.

TEST_CASE_ADDR-Nvidia.tar.gz (976 Bytes)

Patrick

Thanks for the follow-up Patrick.

Personally I wasn’t seeing the point of having both “ptr” and “addr”. Per the standard, the primary difference is “ptr” is for C pointers and “addr” is for a “data entity”. Technically different things, but effectively the same as they both just return device addresses. The standard folks might have just been cleaning up as “address” is a general term while “ptr” has specific meanings.

Nonetheless, I tested the code against our development Flang-based compiler (which will eventually replace nvfortran), and it gives a warning that “use_device_ptr” should only be used with C pointers. So it looks like we’re eventually going to start enforcing the distinction, and I’ll need to have our Fortran programmers start using “use_device_addr” (or start adding C_LOC to “ptr”).

Engineering did a bit of digging, and it turns out the back-end Fortran compiler is accidentally dropping “use_device_addr”, making it a no-op. I added a report, TPR #36867, and will see if we can get it fixed.
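
To make the two options above concrete, here is a minimal sketch of both forms side by side: use_device_addr applied to the array itself, and use_device_ptr applied to a type(c_ptr) obtained with C_LOC. It is not the attached test case; the program and variable names and the array size are made up for illustration, and the addresses are printed from Fortran rather than through a C helper.

```fortran
program device_addr_vs_ptr
  use, intrinsic :: iso_c_binding, only: c_ptr, c_loc, c_intptr_t
  implicit none
  integer, allocatable, target :: arr(:)   ! TARGET is required by c_loc
  type(c_ptr) :: p
  integer(c_intptr_t) :: host_addr, addr_form, ptr_form

  allocate(arr(16))                        ! hypothetical size
  arr = 0

  ! Address of the host copy, taken outside any device-address region.
  host_addr = transfer(c_loc(arr), host_addr)

  ! Map arr so that a corresponding device copy exists.
  !$omp target data map(tofrom: arr)

  ! Form 1: use_device_addr on the data entity. Inside the region,
  ! references to arr are meant to resolve to the device address.
  !$omp target data use_device_addr(arr)
  addr_form = transfer(c_loc(arr), addr_form)
  !$omp end target data

  ! Form 2: use_device_ptr on a C pointer. p holds the host address
  ! on entry and is meant to hold the device address inside the region.
  p = c_loc(arr)
  !$omp target data use_device_ptr(p)
  ptr_form = transfer(p, ptr_form)
  !$omp end target data

  !$omp end target data

  print '(a,z16.16)', 'host address:             ', host_addr
  print '(a,z16.16)', 'use_device_addr(arr):     ', addr_form
  print '(a,z16.16)', 'use_device_ptr(c_loc):    ', ptr_form

  deallocate(arr)
end program device_addr_vs_ptr
```

On an offload build with separate host and device memory, the last two lines would be expected to agree with each other and differ from the host address. With the no-op behavior described above, the use_device_addr line would instead repeat the host address. Built without OpenMP, the directives are plain comments and all three values are the same.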
