Can CUBLAS use page-locked memory?

Does anyone know if cublasSetMatrix() and cublasGetMatrix() are clever enough to copy from/into page-locked host memory at high speed, if I have allocated it already with cudaMallocHost()? Or do they just do “normal” memory copies?

The bandwidth test program shows I get a 4x speed-up using page-locked memory, so I’d like the BLAS routines to use this feature.


Yes, if the memory is allocated with cudaMallocHost, you will see a fast transfer from cublasSetMatrix and cublasGetMatrix.