Async Cublas

Is it possible to use the cublasSgemm API in asynchronous mode?


All CUBLAS routines execute asynchronously. That means CUDA returns control to the host as soon as execution begins. You have to use


for active waiting.
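
To make the point above concrete, here is a minimal sketch of "launch, then wait". It assumes the handle-based `cublas_v2.h` interface and uses the device-wide `cudaDeviceSynchronize()` as the wait, which may not be the call the poster had in mind. The helper name `sgemm_then_wait` is made up for illustration, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: multiplies two n x n matrices of ones on the GPU
// and returns the first element of the product.
float sgemm_then_wait(int n) {
    std::vector<float> ones(n * n, 1.0f);
    std::vector<float> c(n * n, 0.0f);
    size_t bytes = ones.size() * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, ones.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, ones.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    // This call only queues the work; the GPU may still be computing
    // when it returns to the host.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    // Block the host until all queued device work has finished.
    // This matters, for example, before stopping a host-side timer.
    cudaDeviceSynchronize();

    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c[0];
}
```

Note that a blocking `cudaMemcpy` from the device also waits for earlier work, so the explicit synchronize is mostly needed when you time the call or otherwise need to know the GEMM is done before doing anything else on the host.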

But the CUBLAS APIs do not take a stream as an input parameter.

Maybe there is no data streaming, but instruction streaming? I don’t understand this feature of CUBLAS… It may be useful in some cases, but in my case it would be better if CUBLAS had blocking execution…

Maybe someone from NVIDIA could explain it? Or we wait until CUBLAS becomes open source. :thumbup:

Right now, CUBLAS cannot be called with the asynchronous API. We are considering adding it.

The source code for CUBLAS is already available:

Is there any progress on async CUBLAS? Is it at least scheduled for inclusion in some upcoming version?

Ditto. Any news on CUBLAS streaming?

But the source code for CUBLAS 2.0 will be released (this time you will be able to build it), and you could add streams to the functions you need.

Sorry to hijack the post, but since there were some good replies from NVIDIA: I have the same question about CUFFT. I don’t believe the stream can be specified, and it seems the source that is available isn’t sufficient to build a version that I could easily change. Any plans to release this when the full CUBLAS source is released too?

Any idea as to when this will be made available?

It should be soon, the package is ready to go, just the usual “green” tape…

Well… any news about the release date? :unsure:

Bumping this thread.

Any progress here? I want to run cudaMemcpyAsync and cublasDgemm concurrently. I’ve added a stream as an input parameter to cudaMemcpyAsync but don’t know how to do so in the cublasDgemm call. Are there any alternatives besides using streams?

Streams still haven’t been exposed in CUBLAS, and neither has a driver API interface been made available (although I understood both were on the “to-do” list). If you want to use streams with GEMM, your best bet is to grab Volkov’s kernels and use those instead. If you want any other functions, you are out of luck at the moment.
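
The reason a standalone kernel helps is that you control the launch yourself, so you can pass a stream in the launch configuration. Here is a rough sketch of that idea. The kernel below is a naive stand-in written for illustration, not Volkov’s code, and the names `naive_sgemm` and `gemm_overlapping_copy` are made up. It assumes pinned host memory so the async copies can actually overlap with the kernel, and it skips error checking.

```cuda
#include <cuda_runtime.h>

// Naive stand-in GEMM kernel (NOT Volkov's): C = A * B for row-major n x n.
__global__ void naive_sgemm(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k) {
            acc += A[row * n + k] * B[k * n + col];
        }
        C[row * n + col] = acc;
    }
}

// Hypothetical helper: runs the GEMM in one stream while an unrelated
// host-to-device copy is queued in a second stream. Returns C[0].
float gemm_overlapping_copy(int n) {
    size_t bytes = (size_t)n * n * sizeof(float);

    // Pinned host buffers; pageable memory would make the copies synchronous.
    float *hA = nullptr, *hC = nullptr, *hNext = nullptr;
    cudaMallocHost((void**)&hA, bytes);
    cudaMallocHost((void**)&hC, bytes);
    cudaMallocHost((void**)&hNext, bytes);
    for (int i = 0; i < n * n; ++i) { hA[i] = 1.0f; hNext[i] = 2.0f; }

    float *dA = nullptr, *dC = nullptr, *dNext = nullptr;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMalloc((void**)&dNext, bytes);

    cudaStream_t compute, copy;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);

    // Input upload, kernel, and result download are ordered within "compute".
    cudaMemcpyAsync(dA, hA, bytes, cudaMemcpyHostToDevice, compute);
    dim3 block(16, 16);
    dim3 grid((n + 15) / 16, (n + 15) / 16);
    naive_sgemm<<<grid, block, 0, compute>>>(dA, dA, dC, n);

    // Independent transfer in "copy"; the hardware may overlap it with the kernel.
    cudaMemcpyAsync(dNext, hNext, bytes, cudaMemcpyHostToDevice, copy);

    cudaMemcpyAsync(hC, dC, bytes, cudaMemcpyDeviceToHost, compute);

    // Wait for both streams before touching the results on the host.
    cudaStreamSynchronize(compute);
    cudaStreamSynchronize(copy);
    float result = hC[0];

    cudaStreamDestroy(compute);
    cudaStreamDestroy(copy);
    cudaFree(dA); cudaFree(dC); cudaFree(dNext);
    cudaFreeHost(hA); cudaFreeHost(hC); cudaFreeHost(hNext);
    return result;
}
```

Whether the copy and the kernel really run at the same time depends on the device. The code only makes the overlap possible by putting the two pieces of work in different streams.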