Greetings,
I decided to try pinned buffers to see whether they would speed up my cublasDgemm call. The results are still correct, but the routine is now considerably slower (over 20 times slower). I’ve commented out cublasInit and call cudaSetDeviceFlags(cudaDeviceMapHost) in its place, then use cudaHostAlloc and cudaHostGetDevicePointer to set up the pointers. Any ideas?
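For reference, here is a minimal sketch of the kind of setup described above. It is not the actual code. The matrix size N, the fill values, the CHECK_CUDA and CHECK_CUBLAS macros and the variable names are all placeholders, and it assumes the handle-based cuBLAS API (cublas_v2.h with cublasCreate) rather than the legacy cublasInit interface.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper macros: abort on any CUDA or cuBLAS error.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define CHECK_CUBLAS(call)                                            \
    do {                                                              \
        cublasStatus_t st_ = (call);                                  \
        if (st_ != CUBLAS_STATUS_SUCCESS) {                           \
            fprintf(stderr, "cuBLAS error %d (line %d)\n",            \
                    (int)st_, __LINE__);                              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const int N = 512;  // assumed square matrix size, for illustration only
    const size_t bytes = sizeof(double) * N * N;

    // Request mapped pinned allocations before any context-creating call.
    CHECK_CUDA(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Pinned host buffers that are also mapped into the device address space.
    double *hA, *hB, *hC;
    CHECK_CUDA(cudaHostAlloc((void **)&hA, bytes, cudaHostAllocMapped));
    CHECK_CUDA(cudaHostAlloc((void **)&hB, bytes, cudaHostAllocMapped));
    CHECK_CUDA(cudaHostAlloc((void **)&hC, bytes, cudaHostAllocMapped));

    for (int i = 0; i < N * N; ++i) {
        hA[i] = 1.0;
        hB[i] = 2.0;
        hC[i] = 0.0;
    }

    // Device-side aliases of the host buffers; no cudaMemcpy is issued.
    double *dA, *dB, *dC;
    CHECK_CUDA(cudaHostGetDevicePointer((void **)&dA, hA, 0));
    CHECK_CUDA(cudaHostGetDevicePointer((void **)&dB, hB, 0));
    CHECK_CUDA(cudaHostGetDevicePointer((void **)&dC, hC, 0));

    cublasHandle_t handle;
    CHECK_CUBLAS(cublasCreate(&handle));

    // C = alpha * A * B + beta * C, column-major, no transposes.
    const double alpha = 1.0, beta = 0.0;
    CHECK_CUBLAS(cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                             N, N, N, &alpha, dA, N, dB, N, &beta, dC, N));

    // Wait for the GEMM to finish before reading the result on the host.
    CHECK_CUDA(cudaDeviceSynchronize());

    // With A filled with 1.0 and B with 2.0, each entry should equal 2.0 * N.
    printf("C[0] = %f\n", hC[0]);

    CHECK_CUBLAS(cublasDestroy(handle));
    CHECK_CUDA(cudaFreeHost(hA));
    CHECK_CUDA(cudaFreeHost(hB));
    CHECK_CUDA(cudaFreeHost(hC));
    return 0;
}
```

Something like `nvcc sketch.cu -lcublas` should build it (the file name is arbitrary). Timing the cublasDgemm call in this version against one that uses cudaMalloc and explicit copies would be one way to reproduce the slowdown in isolation.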