I am trying to use acc_malloc() in a Fortran code to avoid host allocations for scratch variables on the GPU. So far my attempts have failed. Here is my latest attempt:
    real(ESMF_KIND_R8), allocatable :: copyArray(:,:,:)
    type(c_devptr) :: dev_copyArray
    ...
    dev_copyArray = acc_malloc(size)
    call acc_map_data(copyArray, dev_copyArray, size)
    ...
    !$acc data present(copyArray)
    !$acc kernels
    ...
    !$acc end kernels
    !$acc end data
This fails with:
    Failing in Thread:1
    call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
However, when I explicitly allocate copyArray on the host side, the same code works, but of course then it isn’t any different than using the create data clause.
Is it possible to avoid host allocation for GPU scratch arrays?