I would like to use CUBLAS functions, but also use my own GPU kernels to modify the arrays, without copying them to the host.
To use CUBLAS, you need to call cublasInit, cublasAlloc and cublasSetMatrix, and afterwards cublasGetMatrix, cublasFree and cublasShutdown.
Suppose I randomly initialize a vector allocated with the regular (non-CUBLAS) functions:
cutilSafeCall(cudaMalloc((void**)&d_seeds, seedz));
and
cutilSafeCall(cudaMemcpy(h_seeds, d_seeds, seedz, cudaMemcpyDeviceToHost));
and then, without copying it to the host and back to the device, I want to continue using the same array, but now as a CUBLAS matrix. The problem with cublasSetMatrix is that it always copies from host to device. It seems you can't use an existing memory location on the device as a CUBLAS matrix, or is that possible?
In other words: can you leave out cublasSetMatrix and cublasAlloc if the vector has already been allocated with cudaMalloc and filled with data on the device?
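To make the question concrete, here is a minimal sketch of what I have in mind. It assumes the legacy CUBLAS API (cublas.h), where, as far as I understand the documentation, cublasAlloc is only a wrapper around cudaMalloc, so a plain cudaMalloc pointer should be accepted by CUBLAS routines. The names fillKernel and sumOnDevice are hypothetical, and the constant fill is just a stand-in for the random initialization.

```cuda
#include <cuda_runtime.h>
#include <cublas.h>   // legacy CUBLAS API (cublasInit / cublasShutdown)

// Hypothetical kernel: fills the vector on the device.
// Stand-in for the random initialization described above.
__global__ void fillKernel(float *v, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = value;
}

// Allocates with cudaMalloc, fills with a kernel, then hands the same
// device pointer to CUBLAS. No cublasAlloc, no cublasSetVector/SetMatrix.
// Returns the sum of absolute values, or -1.0f on error.
float sumOnDevice(int n)
{
    if (cublasInit() != CUBLAS_STATUS_SUCCESS) return -1.0f;

    float *d_vec = NULL;
    if (cudaMalloc((void**)&d_vec, n * sizeof(float)) != cudaSuccess) {
        cublasShutdown();
        return -1.0f;
    }

    // Every element becomes 1.0f, written directly on the device.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fillKernel<<<blocks, threads>>>(d_vec, n, 1.0f);

    // CUBLAS operates in place on the cudaMalloc'ed buffer:
    // scale by 2, then sum the absolute values.
    cublasSscal(n, 2.0f, d_vec, 1);
    float result = cublasSasum(n, d_vec, 1);
    if (cublasGetError() != CUBLAS_STATUS_SUCCESS) result = -1.0f;

    cudaFree(d_vec);
    cublasShutdown();
    return result;
}
```

This would be called from main as sumOnDevice(1024) and built with nvcc, linking against cublas. Only the scalar result of cublasSasum comes back to the host; the vector itself never leaves the device.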