Segmentation fault due to broken OpenACC implicit detach behaviour

The following program produces a segmentation fault unless the nested pointer is detached explicitly. It was compiled with `nvfortran -acc=gpu detach.f90`. This bug was introduced somewhere between versions 21.1 and 21.7 of the Nvidia HPC SDK.

```fortran
program detach

  implicit none

  type :: derived_struct
    integer, dimension(:), allocatable :: array
  end type derived_struct

  type(derived_struct) :: struct

  allocate(struct%array(1))

  struct%array(1) = 1

  ! Copy the parent first, then the member; copying in the member
  ! attaches its device pointer inside the device copy of struct.
  !$acc enter data copyin(struct)
  !$acc enter data copyin(struct%array)

  !$acc kernels
  struct%array(1) = struct%array(1) + 777
  !$acc end kernels

  !!$acc exit data detach(struct%array)  !uncomment me for expected behaviour
  ! Per OpenACC 2.6+, this copyout should also detach struct%array implicitly.
  !$acc exit data copyout(struct%array)
  !$acc exit data copyout(struct)

  write(*,*) struct%array(1)

end program detach
```

Thank you 🙂

Hi nmnobre,

The issue here is with the “copyout” of “struct”. Copyout performs a shallow copy of the structure, which includes the device pointer of “array”. Hence, when “struct%array” is later accessed on the host, the code dereferences a device pointer.

The correct method here would be to replace “copyout(struct)” with “delete(struct)”. If the structure includes data members that you do want copied back, these need to be copied out individually (as you do with “array”).
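A minimal sketch of that suggestion, assuming an OpenACC 2.6+ compiler. The module name `detach_workaround` and the subroutine `add_on_device` are hypothetical, introduced only for illustration. The member that changed is copied back on its own, and the parent is then released with `delete`, so the host copy of `struct` (including the descriptor for `array`) is never overwritten by a shallow copy from the device.

```fortran
! Hypothetical module illustrating the delete(struct) workaround.
module detach_workaround
  implicit none

  type :: derived_struct
    integer, dimension(:), allocatable :: array
  end type derived_struct

contains

  ! Hypothetical helper: adds increment to struct%array(1) on the device.
  subroutine add_on_device(struct, increment)
    type(derived_struct), intent(inout) :: struct
    integer, intent(in) :: increment

    ! Parent first, then the member (which is attached to the parent).
    !$acc enter data copyin(struct)
    !$acc enter data copyin(struct%array)

    !$acc kernels
    struct%array(1) = struct%array(1) + increment
    !$acc end kernels

    ! Copy back only the member's data; this also detaches it.
    !$acc exit data copyout(struct%array)
    ! Free the device copy of the parent without copying it back,
    ! so the host descriptor for array is left untouched.
    !$acc exit data delete(struct)
  end subroutine add_on_device

end module detach_workaround
```

A caller would allocate and initialize `struct%array`, call `add_on_device(struct, 777)`, and then read `struct%array(1)` on the host exactly as in the original program.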

-Mat

My understanding is that copyout(struct%array) should implicitly detach the pointer, so that copyout(struct) copies back a valid host pointer. This used to be the expected behaviour (see Deep Copy Support in OpenACC | PGI), and it seems to work with the nvfortran included in version 21.1 of the SDK.

OK, I was incorrect here. I forgot that the behavior changed in the 2.6 spec, where detach is supposed to reset the device copy of the pointer to the host pointer before any data is copied out. I’ll investigate further. In this particular case, I’d still recommend simply deleting “struct”, since the detach and copyout operations just add unnecessary overhead.

Relevant section of the OpenACC 3.1 spec:

… counter for the pointer var is decremented. If the attachment counter is decreased to zero, the pointer is detached by initiating an update for the pointer var in device memory to have the same value as the corresponding pointer in local memory. The update may complete asynchronously, depending on other clauses on the directive. The pointer update must precede any data copies due to copyout actions that are performed for the same directive.


I talked with Michael, and the change in behavior (it looks like it started in the 21.5 release) is not expected. Hence, I’ve filed a problem report, TPR #31499, and sent it to engineering for review.

-Mat

Hi Mat,

I do agree that, in this particular example, the delete clause would be more efficient. I was just trying to work out a minimal example to demonstrate the bug, not trying to say you should normally use code like this. 🙂

Thank you for the swift replies; I really appreciate it.
Ah! We only have a couple of releases on our machine. I knew the change happened at some point in 2021, but I don’t have access to all the releases to bisect properly. 😆
Thanks again for the quick turnaround; hopefully it’ll be an easy fix.

-Nuno

Hi Mat,

This seems to have been fixed in version 22.5. Can you please confirm that’s indeed the case? 🎉

-Nuno

I looked at the TPR and, indeed, it was fixed at the end of April, so it made it into the 22.5 release. However, they forgot to assign it to me so that I could post a notification here. Sorry about that.

