Hi all,
My question is about your experience offloading Fortran (nvfortran) pointers with OpenMP. I’m having many difficulties porting a Fortran code to GPU with OpenMP offloading, and I would be grateful if you could clarify the following concept/strategy, or provide an example or a pointer to documentation to read.
My code is based on user-defined types managed in chained (linked) lists. Most of the types look like this simplified one:
type mytype
  integer :: dim
  integer, pointer :: tab(:)
  type(mytype), pointer :: next
end type mytype
but types can also contain embedded chained lists.
My OpenMP strategy was to offload only the allocated tab attribute, and I need to access these tab attributes through a pointer in loops.
integer, pointer :: ptr(:)
ptr=> current%tab
I offload data with a
!$omp target enter data map(to:ptr)
to put them on the GPU and run kernels in a loop (each kernel works on only one tab attribute; the list itself is never handled on the GPU).
But this raises frequent memory errors on the GPU, so I think this is the wrong approach, and I do not understand how to do this properly with OpenMP.
While I’m not 100% sure, I think this will only put the ptr itself on the device, not the data it points to. Try something like:
!$omp target enter data map(to:ptr(:))
Hi Mat,
Yes, I’m not very sure of the syntax either, and I have tried so many approaches… My main misunderstanding concerns using a pointer to go through the list. Example:
I have an array ARR(:) of these types (simpler to explain), and I use a local pointer to access the tab attribute of each element in ARR:
do i = 1, nelements
  ptr => ARR(i)%tab   ! select the data we need
  !$omp target teams distribute parallel do
  do j = 1, ARR(i)%n
    ptr(j) = 1        ! index with the inner loop variable j
  end do
end do
My local ptr descriptor (which may be a local variable in a function) is not on the GPU; only the array ARR(i)%tab(:) has been offloaded previously. I do not understand what happens in this situation: in some cases it works and in others it does not, and I do not understand why.
OK, since “tab” is already on the device, we want to “attach” it, i.e. copy “ptr” to the device and then set its device-side address to the device address of “tab”. In OpenACC I’d use the “attach” data clause, though I’m not sure about OpenMP. I think you should be able to do:
do i = 1, nelements
  ptr => ARR(i)%tab   ! select the data we need
  !$omp target teams distribute parallel do map(to:ptr(:))
  do j = 1, ARR(i)%n
    ptr(j) = 1        ! index with the inner loop variable j
  end do
end do
What I expect to happen is that at runtime, “ptr” will be present on the device given it has the same host address as “tab”. Then the correct device pointer will be passed into the generated CUDA kernel.
If this doesn’t work, then let’s look at how you’re doing the deep copy of “ARR” and “ARR(n)%tab”.
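To make the suggestion above concrete, here is a minimal sketch of the whole flow: copy each tab to the device once, run one kernel per element through a local pointer, then copy the results back. The module and routine names (tab_attach_sketch, push_tabs, fill_tabs, pull_tabs) and the simplified type are hypothetical, not taken from the real code, and the sketch has not been validated against any particular compiler version. It only relies on the data being found on the device by its host address when “ptr(:)” appears in the map clause.

```fortran
module tab_attach_sketch
  implicit none
  private
  public :: mytype, push_tabs, fill_tabs, pull_tabs

  ! Hypothetical simplified type: one pointer array per element.
  type :: mytype
    integer :: n = 0
    integer, pointer :: tab(:) => null()
  end type mytype

contains

  ! Copy every tab array to the device once, before any kernel runs.
  subroutine push_tabs(arr)
    type(mytype), intent(inout) :: arr(:)
    integer :: i
    do i = 1, size(arr)
      call push_i1(arr(i)%tab)
    end do
  end subroutine push_tabs

  ! Non-pointer dummy: only the array data is named in the map clause.
  subroutine push_i1(a)
    integer, contiguous, intent(in) :: a(:)
    !$omp target enter data map(to: a)
  end subroutine push_i1

  ! One kernel per element; ptr is a host-side selector for the data.
  subroutine fill_tabs(arr, val)
    type(mytype), intent(inout) :: arr(:)
    integer, intent(in) :: val
    integer, pointer :: ptr(:)
    integer :: i, j, n

    do i = 1, size(arr)
      ptr => arr(i)%tab
      n = arr(i)%n          ! local scalar keeps arr itself out of the kernel
      ! The data is already present, so this map should not copy it again.
      !$omp target teams distribute parallel do map(to: ptr(:))
      do j = 1, n
        ptr(j) = val
      end do
    end do
  end subroutine fill_tabs

  ! Copy every tab array back to the host and release the device copy.
  subroutine pull_tabs(arr)
    type(mytype), intent(inout) :: arr(:)
    integer :: i
    do i = 1, size(arr)
      call pull_i1(arr(i)%tab)
    end do
  end subroutine pull_tabs

  subroutine pull_i1(a)
    integer, contiguous, intent(inout) :: a(:)
    !$omp target exit data map(from: a)
  end subroutine pull_i1

end module tab_attach_sketch
```

A caller would allocate each tab, then call push_tabs, fill_tabs and pull_tabs in that order. The loop bound is copied into the local scalar n on purpose: referencing ARR(i)%n inside the construct would pull the whole array of types into the target region, which is a separate mapping question from the one discussed here.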
Hi Mat,
Data offloading is centralised in a module with routines for 1D arrays, 2D arrays, 3D arrays …. and I have implemented directive-based offload:
subroutine apply_create_i1val_on_gpu(i1val)
integer, contiguous :: i1val(:)
….
!$omp target enter data map(alloc:i1val(1:nx))
…
Or API-based offload:
subroutine apply_create_i1val_on_gpu(i1val)
integer, contiguous, target :: i1val(:)   ! c_loc requires a TARGET (or POINTER) argument
….
i1val_cptr = omp_target_alloc(num_bytes,omp_get_default_device())
err = omp_target_associate_ptr(c_loc(i1val(1)), i1val_cptr, num_bytes, offset,omp_get_default_device())
….
And I call these subroutines with an actual argument declared as:
integer, pointer :: arg(:)
from the code, since Fortran allows this.
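As an illustration of this calling convention, the sketch below walks a chained list on the host and passes each tab pointer to routines whose dummy argument is a plain contiguous assumed-shape array, so neither the mapping routines nor the kernel ever see a pointer descriptor. All names (list_offload_sketch, node, prepend, map_list, scale_list, unmap_list) are hypothetical, the kernel body is a placeholder, and it assumes every tab was allocated contiguously so that no temporary copy is made at the call. It is a sketch under those assumptions, not a validated implementation.

```fortran
module list_offload_sketch
  implicit none
  private
  public :: node, prepend, map_list, scale_list, unmap_list

  ! Hypothetical node type mirroring the simplified type in the question.
  type :: node
    integer :: dim = 0
    integer, pointer :: tab(:) => null()
    type(node), pointer :: next => null()
  end type node

contains

  ! Host only: add a node with n elements, all set to 1, at the list head.
  subroutine prepend(head, n)
    type(node), pointer, intent(inout) :: head
    integer, intent(in) :: n
    type(node), pointer :: item
    allocate(item)
    item%dim = n
    allocate(item%tab(n))
    item%tab = 1
    item%next => head
    head => item
  end subroutine prepend

  ! Host-side traversal; each tab is passed as an ordinary array.
  subroutine map_list(head)
    type(node), pointer, intent(in) :: head
    type(node), pointer :: cur
    cur => head
    do while (associated(cur))
      call create_on_gpu(cur%tab)
      cur => cur%next
    end do
  end subroutine map_list

  subroutine scale_list(head, factor)
    type(node), pointer, intent(in) :: head
    integer, intent(in) :: factor
    type(node), pointer :: cur
    cur => head
    do while (associated(cur))
      call scale_on_gpu(cur%tab, factor)
      cur => cur%next
    end do
  end subroutine scale_list

  subroutine unmap_list(head)
    type(node), pointer, intent(in) :: head
    type(node), pointer :: cur
    cur => head
    do while (associated(cur))
      call delete_on_gpu(cur%tab)
      cur => cur%next
    end do
  end subroutine unmap_list

  subroutine create_on_gpu(a)
    integer, contiguous, intent(in) :: a(:)
    !$omp target enter data map(to: a)
  end subroutine create_on_gpu

  ! Placeholder kernel: works on exactly one tab array.
  subroutine scale_on_gpu(a, factor)
    integer, contiguous, intent(inout) :: a(:)
    integer, intent(in) :: factor
    integer :: j, n
    n = size(a)
    ! If a is already present, tofrom should not trigger a new copy.
    !$omp target teams distribute parallel do map(tofrom: a)
    do j = 1, n
      a(j) = factor * a(j)
    end do
  end subroutine scale_on_gpu

  subroutine delete_on_gpu(a)
    integer, contiguous, intent(inout) :: a(:)
    !$omp target exit data map(from: a)
  end subroutine delete_on_gpu

end module list_offload_sketch
```

The list is built with repeated calls to prepend on a null-initialised head pointer, then processed with map_list, scale_list and unmap_list. Because the kernel also works when the data was not mapped beforehand (the tofrom map then copies in and out), a missing map_list call shows up as a slowdown rather than a crash, which can help when narrowing down where a mapping goes wrong.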
I have several small test cases that validate these routines, and they work, but my large (and more complex) code does not. With my approach I have several OpenMP offloading difficulties:
- most of the examples I can find for managing user-defined types with pointer/allocatable attributes rely on
!$omp declare mapper….
directives, but compilers do not implement this at this time (I’m using nvhpc/25.5, maybe I should update)
- I am trying to write portable code with OpenMP offloading for the Nvidia compilers, but also for the GNU and Cray compilers, which in my view are less advanced than Nvidia’s.
- My knowledge of OpenMP offloading is still limited: without a usable template for this approach, my progress is mainly based on tests. I know that with OpenMP offloading pointers and allocatables are not handled identically, and that understanding descriptor behavior is also important, but I cannot find user-level reading that would let me understand all of this in depth and make the right choices.
And ChatGPT pointed me to this discussion when I asked for help :-)