cudaMemsetAsync and easier syntax for async copying

Is cudaMemsetAsync going to be supported in a future release?

It would be very helpful for setting things up before a kernel call: for example, zeroing a device array can often overlap with other (CPU-side) initialization. Likewise, a simple way to copy constants asynchronously would be very nice.

Related to this issue: have you considered extending the simple syntax

a_dev = a_host

for copying the host variable a_host to the device variable a_dev so that it also allows asynchronous copying? Maybe one could use a directive like

!$cuf async(stream)
a_dev = a_host

Just a thought. It would clean up a lot of code, make it more readable, and make it much easier to extend a program to support async memory transfers.
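For comparison, here is a minimal sketch of the explicit route that such a directive would replace. It assumes that cudaMemcpyAsync and the stream routines from the cudafor module are available in your compiler version; the program name, array size, and variable names are illustrative only.

```fortran
program async_copy_sketch
  use cudafor
  implicit none
  integer, parameter :: n = 1024                 ! illustrative array size
  real, allocatable, pinned :: a_host(:)         ! pinned host memory, needed for the copy to be truly asynchronous
  real, allocatable, device :: a_dev(:)
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat

  allocate(a_host(n), a_dev(n))
  a_host = 1.0
  istat = cudaStreamCreate(stream)

  ! Explicit asynchronous counterpart of "a_dev = a_host".
  ! In CUDA Fortran the count is given in elements, not bytes.
  istat = cudaMemcpyAsync(a_dev, a_host, n, stream)

  ! CPU-side initialization placed here can overlap with the transfer.

  ! Wait for the copy to finish before a_dev is used.
  istat = cudaStreamSynchronize(stream)
  istat = cudaStreamDestroy(stream)
  deallocate(a_host, a_dev)
end program async_copy_sketch
```

The stream handle, the pinned attribute, and the explicit synchronization are the bookkeeping that a single directive line would hide.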

For the time being, I have found a way to call the CUDA API directly by interfacing to the C routine. This seems to work:

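interface
  function cudaMemsetAsync(arr, value, bytes, stream) bind(c,name='cudaMemsetAsync')
    use iso_c_binding
    use cudafor
    ! The result is declared separately: the VALUE attribute applies only to dummy arguments
    integer(c_int) :: cudaMemsetAsync
    type(c_devptr),    value :: arr     ! device pointer to the memory to set
    integer(c_int),    value :: value   ! value written to each byte
    integer(c_size_t), value :: bytes   ! number of bytes to set
    ! NOTE: cudaStream_t is a pointer in the C API, so a 32-bit integer may be too narrow on 64-bit platforms
    integer(c_int),    value :: stream
  end function cudaMemsetAsync
end interface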

In my code I use it this way:

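  use iso_c_binding
  use cudafor                      ! provides c_devptr and c_devloc
  integer :: n, ierr, stream
  integer, allocatable, device :: i(:)
  type(c_devptr) :: i_ptr
  integer(c_size_t) :: nbytes
  ...
  allocate(i(n))
  nbytes = n*4                     ! size in bytes, assuming 4-byte default integers
  i_ptr = c_devloc(i)              ! C device pointer to the device array
  ierr = cudaMemsetAsync(i_ptr, 0, nbytes, stream)   ! pass the device pointer, not the array itself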

but it would be easier if it were included in cudafor.

best,

Troels

Hi Troels,

Because cudaMemset only takes 32-bit values, we decided to write our own implementation. However, we didn’t add cudaMemsetAsync. I asked our engineering manager, who said that we probably can’t do anything in the short term, but we will see what we can do.

- Mat

Hi Mat,

Thanks for the fast reply. I will keep my CUDA-C interfacing for the time being then, and dream about !$cuf async’s in a distant future :-)

Troels