Cublas using cublasSetX memory with another kernel

ratzes · September 19, 2018, 5:27pm

I was wondering if there are any problems to look out for when using cublasSetVector, for example, and then passing that memory to a CUDA kernel launch afterwards. Are there cases where the structure of the memory is not the same as it would be with cudaMalloc for a given kernel?

Robert_Crovella · September 19, 2018, 5:37pm

cublasSetVector is roughly equivalent to a cudaMemcpy operation, not a cudaMalloc operation. cublasSetVector can effectively map to a cudaMemcpy2D operation, so in that respect it may or may not conform to what you are expecting. But there should be an essentially equivalent cudaMemcpy2D operation for a particular cublasSetVector operation.
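To make the correspondence above concrete, here is a minimal sketch that performs the same strided host-to-device copy twice, once with cublasSetVector and once with cudaMemcpy2D, and checks that both device buffers hold the same values. The function name, sizes, and strides are made up for illustration, and the mapping of incx/incy to source and destination pitch is my reading of the two APIs rather than something stated in this thread.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper (not from the thread): returns 0 if cublasSetVector and
// the assumed-equivalent cudaMemcpy2D call leave identical data on the device.
int compare_setvector_with_memcpy2d() {
    const int n = 4;     // number of elements to copy
    const int incx = 2;  // host stride, in elements
    const int incy = 1;  // device stride, in elements
    std::vector<float> h(n * incx);
    for (int i = 0; i < n * incx; ++i) h[i] = static_cast<float>(i);

    // The device buffers come from plain cudaMalloc; cuBLAS only does the copy.
    float *d_a = nullptr, *d_b = nullptr;
    if (cudaMalloc(&d_a, n * incy * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_b, n * incy * sizeof(float)) != cudaSuccess) {
        cudaFree(d_a);
        return 1;
    }

    // Copy n floats, reading every incx-th host element.
    cublasStatus_t st =
        cublasSetVector(n, sizeof(float), h.data(), incx, d_a, incy);

    // Same copy as a 2D memcpy: n "rows", each one element wide, with the
    // strides expressed as pitches in bytes.
    cudaError_t err = cudaMemcpy2D(d_b, incy * sizeof(float),
                                   h.data(), incx * sizeof(float),
                                   sizeof(float), n, cudaMemcpyHostToDevice);

    std::vector<float> ra(n), rb(n);
    bool ok = (st == CUBLAS_STATUS_SUCCESS) && (err == cudaSuccess);
    ok = ok && cudaMemcpy(ra.data(), d_a, n * sizeof(float),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(rb.data(), d_b, n * sizeof(float),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        ok = (ra[i] == rb[i]) && (ra[i] == h[i * incx]);

    cudaFree(d_a);
    cudaFree(d_b);
    return ok ? 0 : 1;
}
```

Call the function from main and build with nvcc, linking against cuBLAS (-lcublas). With unit strides on both sides the copy reduces to an ordinary contiguous cudaMemcpy.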

The equivalent to cudaMalloc would be cublasAlloc, but that function has been deprecated. You should use cudaMalloc instead.

ratzes · September 19, 2018, 6:15pm

Thanks!

So is it typical to allocate an array, say *A, run cublasSetVector, then do kernel1(A), cublasCommand(A…), kernel2(A)? Or is there a better/faster way to interleave cublas operations with kernels?

Robert_Crovella · September 19, 2018, 8:33pm

I don’t know of a better way. Your descriptions are quite limited, so I’m guessing here. For example, I am assuming that your cublas command depends on the results from kernel1 being complete, and likewise that kernel2 depends on the results from cublas being complete. That seems likely to me since they are all working on A.
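As a rough sketch of that pattern, the code below allocates A with cudaMalloc, fills it with cublasSetVector, and then runs a kernel, a cuBLAS call, and a second kernel on the same pointer without copying anything back to the host in between. The kernels (add_one, square), the function name, and the choice of cublasSscal are placeholders picked for illustration; the sketch assumes everything runs in the default stream, so each step is ordered after the previous one.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Placeholder "kernel1": adds 1 to every element.
__global__ void add_one(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

// Placeholder "kernel2": squares every element.
__global__ void square(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= a[i];
}

// Hypothetical driver: returns 0 if the interleaved sequence gives 36 everywhere.
int interleave_demo() {
    const int n = 8;
    std::vector<float> h(n, 1.0f);
    float *d_A = nullptr;
    cublasHandle_t handle;
    if (cudaMalloc(&d_A, n * sizeof(float)) != cudaSuccess) return 1;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        cudaFree(d_A);
        return 1;
    }

    // Host -> device copy through cuBLAS; d_A is ordinary device memory.
    bool ok = cublasSetVector(n, sizeof(float), h.data(), 1, d_A, 1) ==
              CUBLAS_STATUS_SUCCESS;

    add_one<<<1, 32>>>(d_A, n);  // 1 -> 2

    const float alpha = 3.0f;    // host scalar (default pointer mode)
    ok = ok && cublasSscal(handle, n, &alpha, d_A, 1) ==
                   CUBLAS_STATUS_SUCCESS;  // 2 -> 6

    square<<<1, 32>>>(d_A, n);   // 6 -> 36
    ok = ok && cudaGetLastError() == cudaSuccess;

    // Device -> host copy; this waits for the preceding default-stream work.
    ok = ok && cublasGetVector(n, sizeof(float), d_A, 1, h.data(), 1) ==
                   CUBLAS_STATUS_SUCCESS;
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == 36.0f);

    cublasDestroy(handle);
    cudaFree(d_A);
    return ok ? 0 : 1;
}
```

If the handle is moved to another stream with cublasSetStream, the kernels would need to be launched into that same stream (or synchronized explicitly) to keep this ordering.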

What other way would there be to do this?
I guess I really don’t understand the question.

ratzes · September 19, 2018, 9:04pm

Sorry, yeah I wasn’t really clear what my concerns were.

I guess I’m worried that there are lurking issues with mutating memory allocated for cublas from a device kernel both after cublasSetX and before a cublas call, and after a cublas call. All while not copying any memory back to host. I can’t find any example code anywhere of something like this being done.

I haven’t written much yet, but this stems from a simple test with simpleDevLibCUBLAS/kernels.cu where I removed the const qualifier from d_A (which is filled by cublasSetMatrix) and got an error when I did “d_A[2] = 0.0f;” (it doesn’t say what the error is).

Thanks for your help!

Robert_Crovella · September 19, 2018, 10:30pm

The cublasSetMatrix prototype indicates that it expects a const-qualified pointer, although that would be the host pointer.

https://docs.nvidia.com/cuda/cublas/index.html#cublassetmatrix
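For illustration, a minimal sketch of what that prototype allows: the host source is const, while the device destination is a plain writable pointer that a kernel can modify afterwards. The kernel and function names and the 2x2 data are invented for this example and are not taken from the simpleDevLibCUBLAS sample.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical kernel: overwrites a single element of the device matrix.
__global__ void zero_element(float *d_A, int idx) {
    d_A[idx] = 0.0f;
}

// Hypothetical driver: returns 0 if cuBLAS sees the kernel's modification.
int set_matrix_then_modify() {
    // 2x2 matrix in column-major order; only the host copy is const.
    const float h_A[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *d_A = nullptr;
    cublasHandle_t handle;
    if (cudaMalloc(&d_A, sizeof(h_A)) != cudaSuccess) return 1;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        cudaFree(d_A);
        return 1;
    }

    // const host source, non-const device destination.
    bool ok = cublasSetMatrix(2, 2, sizeof(float), h_A, 2, d_A, 2) ==
              CUBLAS_STATUS_SUCCESS;

    zero_element<<<1, 1>>>(d_A, 2);  // d_A[2]: 3 -> 0
    ok = ok && cudaGetLastError() == cudaSuccess;

    // Sum of absolute values over all four elements: 1 + 2 + 0 + 4.
    float sum = -1.0f;
    ok = ok && cublasSasum(handle, 4, d_A, 1, &sum) == CUBLAS_STATUS_SUCCESS;
    ok = ok && (sum == 7.0f);

    cublasDestroy(handle);
    cudaFree(d_A);
    return ok ? 0 : 1;
}
```

The result is returned to a host variable here because the handle is left in its default host pointer mode.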

Not sure what you mean by “it doesn’t say what the error is”?

Are you working in Visual Studio and you got an MSB3721 error? In that case you need to turn up the Visual Studio build output verbosity. Just google that.

ratzes · September 19, 2018, 10:46pm

I looked into it and it turns out the error is just coming from the function that checks to make sure the different results are all the same between cublas on device, cublas from host, and cpu…

So I’m all good, thanks for your help! Excited to interleave some cublas when I get done with work.