cuBLAS and pinned buffers


I decided to try pinned buffers to see whether they would speed up my cublasDgemm call. Although the results are correct, the routine has become considerably slower (over 20 times slower). I commented out cublasInit, put cudaSetDeviceFlags(cudaDeviceMapHost) in its place, and then used cudaHostAlloc and cudaHostGetDevicePointer to set up the pointers. Any ideas?
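A minimal sketch of the zero-copy setup described above, assuming the cuBLAS v2 API (cublas_v2.h, cublasCreate) rather than the legacy cublasInit interface. cudaHostAlloc with cudaHostAllocMapped returns page-locked host memory the GPU can address directly, cudaHostGetDevicePointer returns the device-side alias, and cublasDgemm is handed those aliases with no explicit copy. The helper name dgemm_mapped and the all-ones/all-twos matrices are hypothetical, chosen so every entry of C should equal 2n.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <math.h>

/* Hypothetical helper: C = A * B for n x n column-major matrices held in
   mapped (zero-copy) host memory. A is all 1s and B all 2s, so every entry
   of C should be 2n. Returns 0 on success, 1 on any error or wrong result. */
int dgemm_mapped(int n)
{
    const size_t count = (size_t)n * n;
    const size_t bytes = count * sizeof(double);
    const double alpha = 1.0, beta = 0.0;
    double *hA = NULL, *hB = NULL, *hC = NULL; /* mapped, page-locked host buffers */
    double *dA = NULL, *dB = NULL, *dC = NULL; /* device-side aliases of hA, hB, hC */
    cublasHandle_t handle = NULL;
    int ok;

    /* Call this before any other CUDA runtime call so the flag is in effect
       when the context is created. */
    ok = cudaSetDeviceFlags(cudaDeviceMapHost) == cudaSuccess;
    ok = ok && cudaHostAlloc((void **)&hA, bytes, cudaHostAllocMapped) == cudaSuccess;
    ok = ok && cudaHostAlloc((void **)&hB, bytes, cudaHostAllocMapped) == cudaSuccess;
    ok = ok && cudaHostAlloc((void **)&hC, bytes, cudaHostAllocMapped) == cudaSuccess;
    ok = ok && cudaHostGetDevicePointer((void **)&dA, hA, 0) == cudaSuccess;
    ok = ok && cudaHostGetDevicePointer((void **)&dB, hB, 0) == cudaSuccess;
    ok = ok && cudaHostGetDevicePointer((void **)&dC, hC, 0) == cudaSuccess;
    ok = ok && cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;

    if (ok) {
        for (size_t i = 0; i < count; ++i) { hA[i] = 1.0; hB[i] = 2.0; }

        /* No cublasSetVector: the GEMM kernel reads A and B straight out of
           host memory across the bus, and writes C back the same way. */
        ok = cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                         &alpha, dA, n, dB, n, &beta, dC, n) == CUBLAS_STATUS_SUCCESS;
        /* Wait for the kernel to finish before reading hC on the host. */
        ok = ok && cudaDeviceSynchronize() == cudaSuccess;

        for (size_t i = 0; ok && i < count; ++i)
            if (fabs(hC[i] - 2.0 * n) > 1e-9) ok = 0;
    }

    if (handle) cublasDestroy(handle);
    if (hA) cudaFreeHost(hA);
    if (hB) cudaFreeHost(hB);
    if (hC) cudaFreeHost(hC);
    return ok ? 0 : 1;
}
```

Because a GEMM reads each input element many times, every one of those reads goes over the bus in this configuration instead of hitting device memory, which is consistent with the large slowdown reported.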


Aren't you using mapped memory rather than plain pinned memory? That would create an excessive number of transfers across the bus…
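A minimal sketch of the plain-pinned alternative, assuming the cuBLAS v2 API (cublas_v2.h, cublasCreate) rather than the legacy interface used in the question. The host buffers come from cudaMallocHost (page-locked but not mapped, so no cudaDeviceMapHost flag is needed), each matrix is copied to ordinary device memory once with cublasSetVector, cublasDgemm works entirely on device memory, and the result comes back once with cublasGetVector. The helper name dgemm_pinned and the all-ones/all-twos matrices are hypothetical, chosen so every entry of C should equal 2n.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <math.h>

/* Hypothetical helper: C = A * B for n x n column-major matrices, staged
   through pinned host buffers. A is all 1s and B all 2s, so every entry of
   C should be 2n. Returns 0 on success, 1 on any error or wrong result. */
int dgemm_pinned(int n)
{
    const size_t count = (size_t)n * n;
    const size_t bytes = count * sizeof(double);
    const double alpha = 1.0, beta = 0.0;
    double *hA = NULL, *hB = NULL, *hC = NULL; /* page-locked host buffers */
    double *dA = NULL, *dB = NULL, *dC = NULL; /* ordinary device buffers */
    cublasHandle_t handle = NULL;
    int ok;

    /* Pinned but not mapped: the GPU never reads these directly. */
    ok = cudaMallocHost((void **)&hA, bytes) == cudaSuccess;
    ok = ok && cudaMallocHost((void **)&hB, bytes) == cudaSuccess;
    ok = ok && cudaMallocHost((void **)&hC, bytes) == cudaSuccess;
    ok = ok && cudaMalloc((void **)&dA, bytes) == cudaSuccess;
    ok = ok && cudaMalloc((void **)&dB, bytes) == cudaSuccess;
    ok = ok && cudaMalloc((void **)&dC, bytes) == cudaSuccess;
    ok = ok && cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;

    if (ok) {
        for (size_t i = 0; i < count; ++i) { hA[i] = 1.0; hB[i] = 2.0; }

        /* One host-to-device copy per matrix; a contiguous n x n matrix can
           be sent as a single vector of n*n elements with stride 1. */
        ok = cublasSetVector((int)count, sizeof(double), hA, 1, dA, 1) == CUBLAS_STATUS_SUCCESS
          && cublasSetVector((int)count, sizeof(double), hB, 1, dB, 1) == CUBLAS_STATUS_SUCCESS;

        /* The GEMM reads and writes device memory only. */
        ok = ok && cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                               &alpha, dA, n, dB, n, &beta, dC, n) == CUBLAS_STATUS_SUCCESS;

        /* One device-to-host copy of the result. */
        ok = ok && cublasGetVector((int)count, sizeof(double), dC, 1, hC, 1) == CUBLAS_STATUS_SUCCESS;

        for (size_t i = 0; ok && i < count; ++i)
            if (fabs(hC[i] - 2.0 * n) > 1e-9) ok = 0;
    }

    if (handle) cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (hA) cudaFreeHost(hA);
    if (hB) cudaFreeHost(hB);
    if (hC) cudaFreeHost(hC);
    return ok ? 0 : 1;
}
```

Pinning only speeds up the host/device copies; the GEMM itself takes the same time either way, so the overall gain depends on how much of the runtime those copies account for.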


Hmm. So what should I use instead of cublasSetVector to get a speedup? Or is cublasSetVector the best I can do here?