CUBLAS async memcpy


I’m working with the FORTRAN wrapper interface to CUBLAS and noticed there is no support for streams in CUBLAS.

Does this mean that I cannot do asynchronous memory copies from within my Fortran code?
If it is possible, how could I go about using this feature? I am running the code on a Tesla C2070.

Any input would be greatly appreciated.

CUBLAS does have stream support. Since about CUBLAS 3.1 there has been an API call, cublasSetKernelStream(), which lets you set the stream for subsequent CUBLAS function calls. The asynchronous copy APIs have been supported for a very long time; they only wrap the underlying runtime API host-device copy routines anyway, so you can use the driver or runtime routines directly if you prefer.