cuBLAS: using memory filled by cublasSetX in another kernel

I was wondering whether there are any problems to look out for when using, for example, cublasSetVector and then passing that memory to a CUDA kernel launch afterwards. Are there cases where the layout of the memory is not the same as memory obtained with cudaMalloc for a given kernel?

cublasSetVector is roughly equivalent to a cudaMemcpy operation, not a cudaMalloc operation. Because it takes stride arguments (incx for the host source, incy for the device destination), cublasSetVector can effectively map to a cudaMemcpy2D operation, so in that respect the resulting device layout may or may not be what you are expecting. But for any particular cublasSetVector operation there should be an essentially equivalent cudaMemcpy2D operation.

The cuBLAS equivalent of cudaMalloc would be cublasAlloc, but that function has been deprecated. You should use cudaMalloc instead.
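To make the cudaMemcpy2D equivalence concrete, here is a minimal sketch (not from the thread) that allocates two device buffers with plain cudaMalloc, fills one with cublasSetVector using non-unit strides and the other with a hand-written cudaMemcpy2D, and then checks that the device contents match. The function name compare_setvector_with_memcpy2d and the stride values are illustrative assumptions; it also assumes a CUDA-capable device and linking against cuBLAS (for example, nvcc demo.cu -lcublas).

```cuda
// Sketch: cublasSetVector vs. an equivalent cudaMemcpy2D (illustrative only).
// Assumed build: nvcc demo.cu -lcublas
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <string.h>

// Copies n floats from a host array with stride incx into device arrays with
// stride incy, once via cublasSetVector and once via cudaMemcpy2D, then
// compares the two device buffers. Returns 0 if they match.
int compare_setvector_with_memcpy2d(void)
{
    const int n = 4, incx = 2, incy = 3;      // hypothetical sizes/strides
    const size_t bytes = (size_t)n * incy * sizeof(float);
    float h_x[n * incx];                      // host source; every incx-th element is copied
    float h_a[n * incy], h_b[n * incy];       // host copies of the two device buffers
    float *d_a = NULL, *d_b = NULL;           // plain cudaMalloc buffers
    int err = 0;

    for (int i = 0; i < n * incx; ++i) h_x[i] = (float)i;

    if (cudaMalloc((void **)&d_a, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_b, bytes) != cudaSuccess) err = 1;
    // Zero both so the unused gaps between strided elements compare equal.
    if (!err && (cudaMemset(d_a, 0, bytes) != cudaSuccess ||
                 cudaMemset(d_b, 0, bytes) != cudaSuccess)) err = 2;

    // cuBLAS helper: host stride incx, device stride incy (both in elements).
    if (!err && cublasSetVector(n, (int)sizeof(float), h_x, incx, d_a, incy)
                    != CUBLAS_STATUS_SUCCESS) err = 3;

    // Same transfer as a 2D copy: n "rows", each one element wide,
    // with the strides expressed as pitches in bytes.
    if (!err && cudaMemcpy2D(d_b, incy * sizeof(float), h_x, incx * sizeof(float),
                             sizeof(float), n, cudaMemcpyHostToDevice) != cudaSuccess) err = 4;

    if (!err && (cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost) != cudaSuccess ||
                 cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)) err = 5;
    if (!err && memcmp(h_a, h_b, bytes) != 0) err = 6;
    // Element i of the host source (at i*incx) should land at i*incy on the device.
    for (int i = 0; i < n && !err; ++i)
        if (h_a[i * incy] != h_x[i * incx]) err = 7;

    cudaFree(d_a);                            // cudaFree(NULL) is a no-op
    cudaFree(d_b);
    return err;
}
```

With incx = incy = 1 the pitch equals the element width, so the 2D copy collapses to an ordinary contiguous copy of n * elemSize bytes; the destination is then just a normal device array that any kernel can use.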


So is it typical to allocate an array, say *A, run cublasSetVector, and then call kernel1(A), cublasCommand(A…), kernel2(A)? Or is there a better/faster way to interleave cuBLAS operations with kernels?

I don’t know of a better way. Your description is quite limited, so I’m guessing here. For example, I am assuming that your cuBLAS command depends on the results from kernel1 being complete, and likewise that kernel2 depends on the results from cuBLAS being complete. That seems likely to me, since they are all working on A.
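A minimal sketch of the kernel1(A), cublasCommand(A…), kernel2(A) pattern, assuming cublasSscal as the cuBLAS call and two hypothetical kernels (add_one and square) standing in for kernel1 and kernel2. What it illustrates: if the cuBLAS handle is bound with cublasSetStream to the same stream the kernels are launched on, the operations execute on the device in issue order, so each step sees the previous step's writes to A and nothing has to be copied back to the host until the end.

```cuda
// Sketch: kernel1(A) -> cuBLAS call on A -> kernel2(A), no host round trips.
// add_one and square are hypothetical stand-ins for kernel1/kernel2.
// Assumed build: nvcc interleave.cu -lcublas
#include <cuda_runtime.h>
#include <cublas_v2.h>

__global__ void add_one(float *a, int n)            // stand-in for kernel1
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

__global__ void square(float *a, int n)             // stand-in for kernel2
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= a[i];
}

// Computes h_out[i] = (2 * (h_in[i] + 1))^2 entirely on the device.
// Returns 0 on success, nonzero if any CUDA/cuBLAS call failed.
int interleave(const float *h_in, float *h_out, int n)
{
    if (n <= 0) return 1;                            // a 0-block launch would be invalid
    cudaStream_t stream = NULL;
    cublasHandle_t handle = NULL;
    float *d_a = NULL;
    const float alpha = 2.0f;                        // host pointer mode (the default)
    const int threads = 256, blocks = (n + threads - 1) / threads;
    int err = 0;

    if (cudaStreamCreate(&stream) != cudaSuccess) return 2;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) err = 3;
    // Put cuBLAS work on the same stream as the kernels so everything is ordered.
    if (!err && cublasSetStream(handle, stream) != CUBLAS_STATUS_SUCCESS) err = 4;
    if (!err && cudaMalloc((void **)&d_a, n * sizeof(float)) != cudaSuccess) err = 5;

    // Host -> device, queued on the stream.
    if (!err && cublasSetVectorAsync(n, (int)sizeof(float), h_in, 1, d_a, 1, stream)
                    != CUBLAS_STATUS_SUCCESS) err = 6;
    if (!err) {
        add_one<<<blocks, threads, 0, stream>>>(d_a, n);   // kernel1(A)
        if (cudaGetLastError() != cudaSuccess) err = 7;
    }
    // A = alpha * A; runs only after add_one has finished (same stream).
    if (!err && cublasSscal(handle, n, &alpha, d_a, 1) != CUBLAS_STATUS_SUCCESS) err = 8;
    if (!err) {
        square<<<blocks, threads, 0, stream>>>(d_a, n);    // kernel2(A)
        if (cudaGetLastError() != cudaSuccess) err = 9;
    }
    // Device -> host only at the very end.
    if (!err && cublasGetVectorAsync(n, (int)sizeof(float), d_a, 1, h_out, 1, stream)
                    != CUBLAS_STATUS_SUCCESS) err = 10;
    if (!err && cudaStreamSynchronize(stream) != cudaSuccess) err = 11;

    cudaFree(d_a);                                   // no-op if d_a is NULL
    if (handle) cublasDestroy(handle);
    cudaStreamDestroy(stream);
    return err;
}
```

For h_in = {0, 1, 2, 3} this should give h_out = {4, 16, 36, 64}. Issuing everything to the default stream (no cublasSetStream call, kernels launched without a stream argument) gives the same ordering; the explicit stream just makes the dependency visible and keeps the work off the default stream.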

What other way would there be to do this?
I guess I really don’t understand the question.

Sorry, yeah, I wasn’t really clear about what my concerns were.

I guess I’m worried about lurking issues with modifying memory used by cuBLAS from a device kernel, both after cublasSetX and before a cuBLAS call, and after a cuBLAS call, all without copying any memory back to the host. I can’t find any example code that does something like this.

I haven’t written much yet, but this stems from a simple test with simpleDevLibCUBLAS/, where I removed the const qualifier from d_A (which is filled with cublasSetMatrix) and got an error when I did “d_A[2] = 0.0f;” (it doesn’t say what the error is).

Thanks for your help!

The cublasSetMatrix prototype does expect a const-qualified pointer, but that is the host (source) pointer; the device (destination) pointer is not const-qualified.
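As a sketch of the const point: only the host source argument of cublasSetMatrix is const, while the device destination is an ordinary cudaMalloc pointer, so a kernel may write to it afterwards (the equivalent of the “d_A[2] = 0.0f;” experiment). The kernel zero_element and the function upload_modify_download are hypothetical names written for illustration, and the matrix is assumed to be column-major with leading dimension equal to the row count.

```cuda
// Sketch: upload with cublasSetMatrix, modify the device copy from a kernel,
// download with cublasGetMatrix. Names are hypothetical.
// Assumed build: nvcc setmatrix_demo.cu -lcublas
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Overwrites a single element, like "d_A[2] = 0.0f;" done on the device.
__global__ void zero_element(float *d_A, int idx)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) d_A[idx] = 0.0f;
}

// Uploads a rows x cols column-major matrix, zeroes element idx on the device,
// and reads the matrix back into h_out. Returns 0 on success.
int upload_modify_download(const float *h_A, float *h_out, int rows, int cols, int idx)
{
    if (rows <= 0 || cols <= 0 || idx < 0 || idx >= rows * cols) return 1;
    const int lda = rows;            // tightly packed host matrix
    const int ldb = rows;            // tightly packed device matrix
    float *d_A = NULL;               // ordinary, non-const device pointer
    int err = 0;

    if (cudaMalloc((void **)&d_A, (size_t)ldb * cols * sizeof(float)) != cudaSuccess) return 2;

    // Only the host source h_A is const; d_A is a normal writable destination.
    if (cublasSetMatrix(rows, cols, (int)sizeof(float), h_A, lda, d_A, ldb)
            != CUBLAS_STATUS_SUCCESS) err = 3;
    // Roughly the same transfer as a 2D copy, one column per "row":
    //   cudaMemcpy2D(d_A, ldb * sizeof(float), h_A, lda * sizeof(float),
    //                rows * sizeof(float), cols, cudaMemcpyHostToDevice);

    if (!err) {
        zero_element<<<1, 1>>>(d_A, idx);          // default stream, after the upload
        if (cudaGetLastError() != cudaSuccess) err = 4;
    }
    // Blocking download on the default stream, ordered after the kernel.
    if (!err && cublasGetMatrix(rows, cols, (int)sizeof(float), d_A, ldb, h_out, lda)
                    != CUBLAS_STATUS_SUCCESS) err = 5;

    cudaFree(d_A);
    return err;
}
```

Uploading {1, 2, 3, 4, 5, 6} as a 2 x 3 matrix and zeroing index 2 should read back {1, 2, 0, 4, 5, 6}. Any results check that compares against values computed from the unmodified matrix would then be expected to report a mismatch.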

Not sure what you mean by “it doesn’t say what the error is”?

Are you working in Visual Studio and getting an MSB3721 error? In that case you need to increase the Visual Studio build output verbosity to see the underlying error message. Just google that.

I looked into it, and it turns out the error is just coming from the function that checks that the results from cuBLAS on the device, cuBLAS from the host, and the CPU are all the same…

So I’m all good, thanks for your help! Excited to interleave some cuBLAS calls when I get done with work.