Is cudaMemsetAsync available?

Can I use cudaMemsetAsync? If yes, what is the right way to use it?

When I have this:

istat=cudaMemsetAsync(fxpep_d,0.0d0,natom,stream4)

I get a compilation error:

PGF90-S-0038-Symbol, cudamemsetasync, has not been explicitly declared (test.f90)

Thanks,

Lam

Hi Lam,

Can I use cudaMemsetAsync?

Yes, but it’s not one we added to the CUDA Fortran interface module. Instead, you need to add your own explicit interface:

interface
  function cudaMemsetAsync(arr, value, bytes, stream) bind(c,name='cudaMemsetAsync')
    use iso_c_binding
    use cudafor
    integer(c_int),    value :: value, stream, cudaMemsetAsync
    integer(c_size_t), value :: bytes
    type(c_devptr),    value :: arr
  end function cudaMemsetAsync
end interface

Because the CUDA C cudaMemset only supports 32-bit values, we wrote our own version that also handles 64-bit values. We did not, however, write an async counterpart.

  • Mat

Thanks Mat,

But how do I add this interface block to my program? I’ve never used an interface before.

Lam

Hi Lam,

You’d just copy and paste this code into the declaration section of the subroutine that calls cudaMemsetAsync, or put it in a module.

For details, I’d suggest getting a book on Fortran programming, in particular one that covers F90 and F2003.

  • Mat

Hi Mat,

Thanks for the tip.

Because the arrays I want to set are of type double precision, I modified your code as follows:

interface
  function cudaMemsetAsync(arr, val, bytes, stream) bind(c,name='cudaMemsetAsync')
    use iso_c_binding
    use cudafor
    integer(c_int),value :: stream, cudaMemsetAsync
    real(c_double),value::val
    integer(c_size_t), value :: bytes
    type(c_devptr), value :: arr
  end function cudaMemsetAsync
end interface

And here is how I try to set the values of the arrays:

         istat=cudaMemsetAsync(fxpep_d,0.0d0,natom,stream4)

Now I get compilation errors for arguments 1, 3, and 4, but not 2:

PGF90-S-0446-Argument number 1 to cudamemsetasync: rank mismatch (test.f90: 261)
PGF90-S-0450-Argument number 3 to cudamemsetasync: kind mismatch (test.f90: 261)
PGF90-S-0450-Argument number 4 to cudamemsetasync: kind mismatch (test.f90: 261)

Is this because my variables are declared differently in the Fortran code than the C function expects? Here is how I declare the related variables:

   integer(kind=cuda_stream_kind)::stream4
   double precision,allocatable,device::fxpep_d(:)
   integer natom

What did I do wrong here?

Thanks,

Lam

Hi Lam,

As I mentioned before, we wrote our own cudaMemset because the CUDA C version only supports 32-bit data types. Hence, you can’t pass a double as the value here.

Memset isn’t an expensive operation. Do you really need it to be async?

To answer the specific compilation errors, though: for argument #1, you need to pass a pointer to the device array (via the “c_devloc(fxpep_d)” intrinsic). For argument #3, the interface declares “bytes” as “integer(c_size_t)”, so “natom” needs to be passed as that kind (for example, “int(natom, c_size_t)”), and keep in mind that it is a byte count rather than an element count. Finally, for argument #4, I’d update the interface so that “stream” is declared as “integer(kind=cuda_stream_kind)”. I should have caught that the interface is wrong.
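
To make the fixes above concrete, here is a minimal sketch of a corrected interface and call, not a definitive implementation. It assumes the CUDA runtime's documented behavior that cudaMemsetAsync fills memory byte by byte and takes a byte count, so it can zero a double precision array (all-zero bytes are 0.0d0) but cannot set arbitrary double values. The module name memset_async_sketch, the Fortran-side name memsetAsyncBytes, and the array size are made up for illustration; the different Fortran name is only there to avoid a clash in case the cudafor module you have already exports cudaMemsetAsync.

```fortran
module memset_async_sketch
  implicit none
  interface
    ! Hypothetical Fortran-side name bound to the CUDA runtime symbol cudaMemsetAsync.
    function memsetAsyncBytes(arr, val, bytes, stream) &
        bind(c, name='cudaMemsetAsync') result(istat)
      use iso_c_binding
      use cudafor
      type(c_devptr), value                 :: arr     ! device pointer, from c_devloc
      integer(c_int), value                 :: val     ! byte value written to every byte
      integer(c_size_t), value              :: bytes   ! size in bytes, not elements
      integer(kind=cuda_stream_kind), value :: stream
      integer(c_int)                        :: istat   ! CUDA error code (0 = success)
    end function memsetAsyncBytes
  end interface
end module memset_async_sketch

program zero_device_array
  use iso_c_binding
  use cudafor
  use memset_async_sketch
  implicit none
  integer(kind=cuda_stream_kind) :: stream4
  double precision, allocatable, device :: fxpep_d(:)
  double precision, allocatable :: fxpep(:)
  integer :: natom, istat

  natom = 1024                      ! illustrative size only
  allocate(fxpep_d(natom), fxpep(natom))
  istat = cudaStreamCreate(stream4)

  fxpep_d = 1.0d0                   ! fill with non-zero data first

  ! Zero the array asynchronously on stream4: 8 bytes per double precision element.
  istat = memsetAsyncBytes(c_devloc(fxpep_d), 0_c_int, &
                           int(natom, c_size_t) * 8_c_size_t, stream4)
  if (istat /= 0) print *, 'cudaMemsetAsync returned error code ', istat

  ! Wait for the stream before reading the data back on the host.
  istat = cudaStreamSynchronize(stream4)
  fxpep = fxpep_d
  print *, 'max |fxpep| after memset: ', maxval(abs(fxpep))

  istat = cudaStreamDestroy(stream4)
  deallocate(fxpep_d, fxpep)
end program zero_device_array
```

Putting the interface in a module, as here, means every routine that needs it can pick it up with a single use statement instead of repeating the block.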

  • Mat

Thanks Mat,

Lam