I am fairly new to the world of CUDA Fortran, and I’ve stumbled upon something I can’t quite wrap my head around. I am working strictly with compute capability 6.0.
The situation: I have a host-side derived type containing an allocatable device array. After allocation, however, only certain operations seem to work on such “nested” arrays unless they are wrapped in procedure calls. I tried to construct a minimal example for this issue (it’s on pastebin for formatting, but I can also repost it here if need be):
The driver program copies a host array to both an “explicit” device array and a component of the derived type (also supposedly a device array). Then it copies both device arrays back to host arrays, so host-to-device and device-to-host transfers are covered for both kinds of device array. This works just fine, and the writes “b” and “c” give the correct output.
Things start to break down when the program attempts a device-to-device copy involving a “nested” reference to the device array within the derived type (d_hst%arr = arr_d, which is rejected with “More than one device-resident object in assignment”). Strangely enough, it does work if the copy is done inside a subroutine instead (here: the “copy_device_array” call).
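For reference, here is a stripped-down sketch of the pattern described above. It is a reconstruction, not the pastebin code itself: the module name nested_device_demo, the type name host_container, the array size n and the host array names a, b, c are placeholders I am assuming for illustration; only d_hst%arr, arr_d and copy_device_array are taken from the description.

```fortran
module nested_device_demo
  use cudafor
  implicit none

  ! Host-side derived type with an allocatable device component
  ! (type name is a placeholder)
  type :: host_container
    real, allocatable, device :: arr(:)
  end type host_container

contains

  ! Inside the subroutine both arguments are plain device dummy arrays,
  ! so the device-to-device assignment is accepted.
  subroutine copy_device_array(dst, src)
    real, device :: dst(:)
    real, device :: src(:)
    dst = src
  end subroutine copy_device_array

end module nested_device_demo

program driver
  use cudafor
  use nested_device_demo
  implicit none

  integer, parameter :: n = 4            ! assumed size
  real :: a(n), b(n), c(n)               ! host arrays
  real, allocatable, device :: arr_d(:)  ! "explicit" device array
  type(host_container) :: d_hst          ! host variable, device component
  integer :: i

  a = [(real(i), i = 1, n)]
  allocate(arr_d(n))
  allocate(d_hst%arr(n))

  ! Host-to-device copies: both work
  arr_d = a
  d_hst%arr = a

  ! Device-to-host copies: both work
  b = arr_d
  c = d_hst%arr
  write(*,*) 'b', b
  write(*,*) 'c', c

  ! Direct device-to-device copy into the nested component is rejected with
  ! "More than one device-resident object in assignment":
  ! d_hst%arr = arr_d

  ! The same copy wrapped in a subroutine call is accepted
  call copy_device_array(d_hst%arr, arr_d)

  deallocate(arr_d, d_hst%arr)
end program driver
```

The only difference between the rejected statement and the accepted one is where the assignment is written: in the driver the left-hand side is a component reference, while inside copy_device_array it is an ordinary device dummy argument.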
My question is: why does device-to-device copying work within a subroutine call but not “directly” in this particular case? I fear I might be misunderstanding something on a fundamental level, but I’d be more than happy to learn what is going on.
Thank you in advance.
Edit: Changed line references to explicit statements