I am implementing a simple QR decomposition with the help of the cuBLAS library. The vector size is 500. Copying a vector to the device with cublasSetVector takes about 0.01 ms on average. Copying it back to the host with cublasGetVector takes about 1 ms on average. The runtime of cublasSetVector does not change when I change the vector size, in contrast to cublasGetVector.
The difference is about a factor of 100. Why is that? What am I doing wrong?
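
For reference, here is a minimal sketch of how the two calls could be timed in isolation. It is not my actual QR code: the helper name `time_transfers`, the repetition count, and the use of CUDA events are assumptions made for illustration. The `cudaDeviceSynchronize()` before each timed region is there so that any previously launched asynchronous GPU work is not counted as transfer time.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Hypothetical helper: average time in ms of cublasSetVector and
 * cublasGetVector for n floats over reps repetitions.
 * Returns 0 on success, -1 on failure. */
static int time_transfers(int n, int reps, float *set_ms, float *get_ms)
{
    float *h = (float *)malloc((size_t)n * sizeof(float));
    float *d = NULL;
    if (h == NULL) return -1;
    if (cudaMalloc((void **)&d, (size_t)n * sizeof(float)) != cudaSuccess) {
        free(h);
        return -1;
    }
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    int ok = 1;

    /* Host -> device copies. Wait for pending GPU work first. */
    cudaDeviceSynchronize();
    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r)
        ok &= (cublasSetVector(n, sizeof(float), h, 1, d, 1)
               == CUBLAS_STATUS_SUCCESS);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(set_ms, start, stop);
    *set_ms /= (float)reps;

    /* Device -> host copies, measured the same way. */
    cudaDeviceSynchronize();
    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r)
        ok &= (cublasGetVector(n, sizeof(float), d, 1, h, 1)
               == CUBLAS_STATUS_SUCCESS);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(get_ms, start, stop);
    *get_ms /= (float)reps;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);
    return ok ? 0 : -1;
}

int main(void)
{
    float set_ms = 0.0f, get_ms = 0.0f;
    if (time_transfers(500, 100, &set_ms, &get_ms) != 0) {
        fprintf(stderr, "transfer timing failed\n");
        return 1;
    }
    printf("cublasSetVector: %f ms, cublasGetVector: %f ms\n", set_ms, get_ms);
    return 0;
}
```

Without kernels or cuBLAS routines in between, this only measures the copies themselves, so it could serve as a baseline to compare against the timings inside the QR loop.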