Hi All,
I am implementing a conjugate gradient (CG) solver with the cuSPARSE_v2/cuBLAS_v2 libraries to handle a large sparse matrix in my research. The weird thing I observed is the huge time cost of the cublasCreate() function. I am aware that library initialization is usually expensive, and from searching the forums I found that cublasCreate usually takes on the order of 100 ms, whereas in my code it takes 10 seconds, while the whole CG iteration part takes only 0.6 to 1 second. I also implemented CG solvers using the CUSP library, which performed quite well, with a total run time of about 0.5 second.
So can anybody clarify whether 10 seconds is a normal cost for cublasCreate? If so, why does the CUSP library perform so much better, with a nearly negligible initialization cost?
I am using a GTX 980 Ti with CUDA 7.5. Here is the code snippet I use to time cublasCreate:
// Timing begin
struct timeval begin, end;
gettimeofday(&begin, 0);
cublasStatus = cublasCreate(&cublasHandle);
// Timing end
gettimeofday(&end, 0);
double create_ms = (end.tv_sec - begin.tv_sec) * 1000.0 + (end.tv_usec - begin.tv_usec) / 1000.0;
printf("\nTime elapsed: %f ms.\n", create_ms);
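In case it helps narrow this down, here is a minimal sketch of how the measurement could be split into separately timed steps. The names elapsed_ms and time_call_ms are hypothetical helpers of my own, not part of cuBLAS or the CUDA runtime, and the sketch assumes a POSIX system with gettimeofday. The first CUDA runtime call in a process also pays for creating the CUDA context, so part of the 10 seconds may not belong to cublasCreate itself.

```c
#include <stddef.h>
#include <sys/time.h>

/* Milliseconds between two gettimeofday() samples, kept in double precision. */
double elapsed_ms(const struct timeval *begin, const struct timeval *end)
{
    return (end->tv_sec - begin->tv_sec) * 1000.0
         + (end->tv_usec - begin->tv_usec) / 1000.0;
}

/* Time a single call of fn(). fn is a hypothetical wrapper around whatever
 * step is being measured, e.g. one wrapper per initialization step. */
double time_call_ms(void (*fn)(void))
{
    struct timeval begin, end;
    gettimeofday(&begin, NULL);
    fn();
    gettimeofday(&end, NULL);
    return elapsed_ms(&begin, &end);
}
```

The idea would be to pass one wrapper that only forces context creation (for example a function that calls cudaFree(0)) and then a second wrapper that calls cublasCreate, and to print both times. If most of the delay shows up in the first call, the cost is in CUDA context setup rather than in the cuBLAS handle.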
Thanks a lot!
Ruxi