Is cudaMemsetAsync going to be supported in a future release?
It would be very helpful for setting things up before a kernel call because, for example, zeroing a device array can often be started before other (CPU-side) initialization work. Likewise, it would be very nice to have a simple method for asynchronous copying of constants.
Related to this issue: Have you considered extending the simple syntax
a_dev = a_host
for copying host variable a_host to device variable a_dev, so that it also allows async copying? Maybe one could use a directive like
!$cuf async(stream)
a_dev = a_host
Just a thought. It would clean up a lot of code, make it more readable, and make it much easier to extend a program to support async memory transfers.
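For comparison, here is a minimal sketch of what the explicit asynchronous copy looks like without such a directive. It assumes the cudafor module in the installed release provides cudaMemcpyAsync, cudaStreamCreate, cudaStreamSynchronize and cuda_stream_kind, and that the host array is pinned. The program name and array size are made up for illustration.

```fortran
program async_copy_sketch
  use cudafor
  implicit none
  integer, parameter :: n = 1024            ! illustrative size
  real, allocatable, pinned :: a_host(:)    ! page-locked host memory, needed for a truly async copy
  real, allocatable, device :: a_dev(:)
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat

  allocate(a_host(n), a_dev(n))
  a_host = 1.0
  istat = cudaStreamCreate(stream)

  ! explicit equivalent of "a_dev = a_host" on a stream;
  ! for typed arrays the count is given in elements, not bytes
  istat = cudaMemcpyAsync(a_dev, a_host, n, cudaMemcpyHostToDevice, stream)

  ! ... independent CPU-side work can go here ...

  ! wait for the copy before the data is used on the device
  istat = cudaStreamSynchronize(stream)
  istat = cudaStreamDestroy(stream)
  deallocate(a_host, a_dev)
end program async_copy_sketch
```

This is the bookkeeping (stream handle, direction constant, explicit count) that a directive form of the assignment would hide.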
For the time being, I have found a way to call the CUDA API directly by interfacing to the C routine. This seems to work:
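interface
function cudaMemsetAsync(arr, value, bytes, stream) bind(c,name='cudaMemsetAsync')
use iso_c_binding
use cudafor
! the function result cannot carry the VALUE attribute, so declare it separately
integer(c_int) :: cudaMemsetAsync
type(c_devptr), value :: arr
integer(c_int), value :: value
integer(c_size_t), value :: bytes
! cudaStream_t is a pointer-sized handle; assumes cudafor provides cuda_stream_kind
integer(kind=cuda_stream_kind), value :: stream
end function cudaMemsetAsync
end interface
In my code I use it this way:
use iso_c_binding
use cudafor
integer :: n, ierr
integer(kind=cuda_stream_kind) :: stream
integer, allocatable,device :: i(:)
type(c_devptr) :: i_ptr
integer(c_size_t) :: nbytes
...
allocate(i(n))
nbytes = n*4 ! 4 bytes per default integer
i_ptr = c_devloc(i)
! pass the device pointer, not the array itself, to match the interface
ierr = cudaMemsetAsync(i_ptr, 0, nbytes, stream)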
but it would be easier if it were included in cudafor.
best,
Troels