Using unified memory prefetching in an OpenACC Fortran code

Hi,

I have an OpenACC-enabled Fortran application that uses unified memory. How do I enable prefetching (akin to calling cudaMemPrefetchAsync())?

regards,
Naga

Hi Naga,

You can use the CUDA Fortran interfaces to call “cudaMemPrefetchAsync” directly by using the “cudafor” module and compiling with “-Mcuda”.

From the CUDA Fortran Programming Guide: http://www.pgroup.com/resources/docs/17.10/x86/cuda-fortran-prog-guide/index.htm

4.8.37. cudaMemPrefetchAsync

integer function cudaMemPrefetchAsync(devptr, count, device, stream)

cudaMemPrefetchAsync prefetches memory to the specified destination device. devptr may be any managed memory scalar or array, of a supported type specified in Device Code Intrinsic Datatypes. The count is in terms of elements. Alternatively, devptr may be of TYPE(C_DEVPTR), in which case the count is in terms of bytes.

The device argument specifies the destination device. The stream argument specifies which stream to enqueue the prefetch operation on.

Passing in cudaCpuDeviceId for the device, which is defined as a parameter in the cudafor module, will prefetch the data to CPU memory.

To get the CUDA device pointer, call the CUDA routine from within an OpenACC “host_data” region. Something like:

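integer(acc_handle_kind) :: stream
integer :: istat
! get the CUDA stream for the OpenACC async queue
stream = acc_get_cuda_stream(asyncQueueNumber)
...

!$acc host_data use_device(Arr)
! cudaMemPrefetchAsync is a function, so assign its return status;
! for a managed array the count is in elements, not bytes
istat = cudaMemPrefetchAsync(Arr, ArrSizeInElements, deviceNumber, stream)
!$acc end host_data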
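
A slightly fuller sketch of the same pattern, in case it is useful. It treats cudaMemPrefetchAsync as a function (it returns an integer status), passes the count in elements as described in the guide excerpt above, and prefetches back to the host with cudaCpuDeviceId afterwards. The array name, problem size, queue number, device number 0, and the build line in the comment are all illustrative assumptions, not taken from your code, and I have not run this exact program.

```fortran
! Sketch only: names, sizes, queue and device numbers are assumptions.
! Assumed build line: pgfortran -acc -ta=tesla:managed -Mcuda prefetch_sketch.f90
program prefetch_sketch
  use cudafor
  use openacc
  implicit none
  integer, parameter :: n = 1024*1024   ! hypothetical element count
  integer, parameter :: queue = 1       ! hypothetical OpenACC async queue
  real(8), allocatable :: a(:)          ! managed via -ta=tesla:managed
  integer(acc_handle_kind) :: stream
  integer :: istat, i

  allocate(a(n))
  a = 1.0d0

  ! CUDA stream that backs the OpenACC async queue
  stream = acc_get_cuda_stream(queue)

  ! Prefetch to device 0; the count is in elements for a managed array
  !$acc host_data use_device(a)
  istat = cudaMemPrefetchAsync(a, n, 0, stream)
  !$acc end host_data
  if (istat /= cudaSuccess) then
     print *, 'prefetch to device failed: ', trim(cudaGetErrorString(istat))
  end if

  ! Kernel on the same queue, so it is ordered after the prefetch
  !$acc parallel loop async(queue)
  do i = 1, n
     a(i) = 2.0d0 * a(i)
  end do

  ! Prefetch the result back to CPU memory on the same stream
  !$acc host_data use_device(a)
  istat = cudaMemPrefetchAsync(a, n, cudaCpuDeviceId, stream)
  !$acc end host_data
  if (istat /= cudaSuccess) then
     print *, 'prefetch to host failed: ', trim(cudaGetErrorString(istat))
  end if

  ! Wait for everything enqueued on the queue before touching a on the host
  !$acc wait(queue)
  print *, 'a(1) = ', a(1)

  deallocate(a)
end program prefetch_sketch
```

Putting the prefetch and the kernel on the same stream keeps them ordered without an extra synchronization; checking istat is optional but makes a failed prefetch visible.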

Hope this helps,
Mat

Thanks Mat,

I get the following error when I compile with -Mcuda:

pgf95 -acc -c -Minfo=accel -ta=tesla:cc60 -ta=tesla:managed -Mcuda -lm PSMHD3-cufftacc.f95

PGF90-S-0084-Illegal use of symbol cudamemprefetchasyncr8 - attempt to CALL a FUNCTION (PSMHD3-cufftacc.f95: 1276)

I am calling it in the code as below:
prefetchsize = (Nx + 2)*Ny*Nz
!$acc host_data use_device(rho_ux)
call cudaMemPrefetchAsync(rho_ux,prefetchsize,0,stream)
!$acc end host_data

regards,
Naga

Hi Mat,

I changed the code as follows:
prefetchsize = (Nx + 2)*Ny*Nz
!$acc host_data use_device(rho_ux)
i = cudaMemPrefetchAsync(rho_ux,prefetchsize,0,stream)
!$acc end host_data

Now it compiles.

regards,
Naga