Is cudaMemsetAsync going to be supported in a future release?
It would be very helpful for setting things up before a kernel call, because zeroing a device array, for example, can often be launched ahead of other (CPU-side) initialization work. Likewise, it would be very nice to have a simple method for async copies of constants.
Related to this issue: Have you considered extending the simple syntax
a_dev = a_host
for copying host variable a_host to device variable a_dev to allow for async copying? Maybe one could use a directive like
!$cuf async(stream) a_dev = a_host
Just a thought. It would clean up a lot of code, make it more readable, and make it much easier to extend a program to support async memory transfers.
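For comparison, here is a minimal sketch of the explicit route that such a directive would replace. It assumes a cudafor module that provides cuda_stream_kind and the generic cudaMemcpyAsync interface (count given in elements, not bytes), and a host array declared pinned so the copy can actually run asynchronously. The program name, array size and array type are made up for illustration.

```fortran
program async_copy_sketch
  use cudafor
  implicit none
  integer, parameter :: n = 1024           ! illustrative size
  real, allocatable, pinned :: a_host(:)   ! pinned host memory (assumed needed for a truly async copy)
  real, allocatable, device :: a_dev(:)
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat

  allocate(a_host(n), a_dev(n))
  a_host = 1.0

  istat = cudaStreamCreate(stream)

  ! Explicit counterpart of "a_dev = a_host", queued on the stream.
  ! The count is in array elements.
  istat = cudaMemcpyAsync(a_dev, a_host, n, stream)

  ! ... CPU-side initialization could go here while the copy is in flight ...

  ! Wait for the copy before a_dev is used outside this stream.
  istat = cudaStreamSynchronize(stream)

  istat = cudaStreamDestroy(stream)
  deallocate(a_host, a_dev)
end program async_copy_sketch
```

Compared with the one-line assignment, the stream handling and the explicit element count are the parts a directive could hide.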
For the time being, I have found a way to call the CUDA API directly by interfacing to the C routine. This seems to work:
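interface
   function cudaMemsetAsync(arr, value, bytes, stream) bind(c, name='cudaMemsetAsync')
      use iso_c_binding
      use cudafor
      ! Note: cudaStream_t is a pointer type on the C side, so passing the
      ! stream as c_int is an assumption that may not hold on every platform.
      integer(c_int), value    :: value, stream
      integer(c_size_t), value :: bytes
      type(c_devptr), value    :: arr
      integer(c_int)           :: cudaMemsetAsync   ! function result (cudaError_t)
   end function cudaMemsetAsync
end interface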
In my code I use it this way:
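use iso_c_binding
use cudafor
integer :: n, ierr, stream
integer, allocatable, device :: i(:)
type(c_devptr) :: i_ptr
integer(c_size_t) :: nbytes
...
allocate(i(n))
nbytes = n*4            ! assuming 4-byte default integers
i_ptr = c_devloc(i)     ! device address of the array
ierr = cudaMemsetAsync(i_ptr, 0, nbytes, stream)   ! pass the c_devptr, not the array itself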
but it would be easier if it were included in cudafor.