Memory transfer from the CPU to device memory is time consuming. We can use either cuBLAS functions or CUDA memcpy functions.

I tried to transfer about 1 million points from CPU to GPU and observed that the CUDA function performed the copy in ~3 milliseconds, whereas the cuBLAS function took ~0.4 milliseconds.

My question is: cuBLAS is also built on the GPU, so what is so special about these functions, and why is this performance variation observed?

Appreciate your reply.

You mean cudaMemcpy?

I am not sure how you implemented it. But in my opinion, cudaMemcpy is not parallelized while cuBLAS is.

cublasSetVector, cublasGetVector, cublasSetMatrix, cublasGetMatrix
are thin wrappers around cudaMemcpy and cudaMemcpy2D. Therefore, no
significant performance differences are expected between the two sets
of copy functions.
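To illustrate the point, here is a minimal sketch showing the two APIs expressing the same host-to-device transfer; the buffer sizes and names are placeholders, and error checking is omitted for brevity:

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Both calls below move n floats from host to device.
void copy_both_ways(const float *h_x, float *d_x, int n)
{
    // Plain CUDA runtime copy.
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    // cuBLAS helper: the same transfer expressed as a vector copy
    // with unit strides (incx = incy = 1); internally it calls
    // cudaMemcpy, so the cost should be essentially identical.
    cublasSetVector(n, sizeof(float), h_x, 1, d_x, 1);
}
```

If you still see a large timing gap, check your measurement: the first CUDA call in a process pays context-creation overhead, so warm up and time several iterations with cudaEvent timers.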

If you want to increase performance, use pinned memory to allocate the array/matrix on the host.
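A minimal sketch of a pinned-memory allocation, assuming a 1-million-element float buffer; page-locked host memory lets the copy engine use DMA, which typically speeds up host-to-device transfers:

```cuda
#include <cuda_runtime.h>

int main(void)
{
    const int n = 1 << 20;   // ~1 million floats (example size)
    float *h_x, *d_x;

    // Pinned (page-locked) host allocation instead of plain malloc.
    cudaMallocHost((void **)&h_x, n * sizeof(float));
    cudaMalloc((void **)&d_x, n * sizeof(float));

    // Host -> device copy; with pinned memory this uses DMA and is
    // usually noticeably faster than from pageable memory.
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaFree(d_x);
    cudaFreeHost(h_x);       // pinned memory must be freed with cudaFreeHost
    return 0;
}
```

Note that pinned memory is a limited resource: over-allocating it can degrade overall system performance, so pin only the buffers you actually transfer.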