cuBLAS execution time

Hi Mat,

I am implementing an iterative solver using the cuBLAS and cuSPARSE Fortran libraries. I have noticed that in every iteration, after a few library calls, a significant delay occurs. The delay recurs at least twice per iteration, and it does not depend on the order of the library calls. I measured these delays at 0.06 to 0.08 s, depending on the problem size, while the library calls themselves take only about 0.003 s, so the delays dominate the run time. The machine has two Tesla M2070 GPUs and uses CUDA 5.5 and PGI 14.3. Do you have any idea why this might happen?


Hi Manolis,

I don’t know for sure, but a device synchronization may be occurring, either an explicit one or an implicit one, such as the synchronization that can happen when “pinned” memory is freed.
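One way to test the pinned-memory hypothesis in isolation is a small standalone timing program. The sketch below is written in CUDA C++ rather than Fortran and is only an illustration under stated assumptions: the kernel `scale` is a hypothetical stand-in for a cuBLAS/cuSPARSE call, and `run`, the vector length and the iteration count are made up for the test. It times a loop that allocates and frees a pinned host buffer with `cudaMallocHost`/`cudaFreeHost` on every iteration against a loop that allocates the buffer once, outside the loop.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// On failure, print the CUDA error and make run() return -1.
// For brevity, resources are not released on this error path.
#define CHECK(call)                                                   \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                   cudaGetErrorString(err_));                         \
      return -1.0f;                                                   \
    }                                                                 \
  } while (0)

// Hypothetical stand-in for a library call: scale a device vector in place.
__global__ void scale(double *x, double a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

// Host wall-clock time in ms for `iters` iterations of kernel + copy to host.
// alloc_per_iter == true: pinned buffer is allocated/freed inside the loop.
// alloc_per_iter == false: pinned buffer is allocated once, before the loop.
float run(int n, int iters, bool alloc_per_iter) {
  const size_t bytes = static_cast<size_t>(n) * sizeof(double);
  double *d_x = nullptr, *h_x = nullptr;
  CHECK(cudaMalloc(&d_x, bytes));
  CHECK(cudaMemset(d_x, 0, bytes));
  if (!alloc_per_iter) CHECK(cudaMallocHost(&h_x, bytes));
  CHECK(cudaDeviceSynchronize());

  auto t0 = std::chrono::steady_clock::now();
  for (int it = 0; it < iters; ++it) {
    if (alloc_per_iter) CHECK(cudaMallocHost(&h_x, bytes));
    scale<<<(n + 255) / 256, 256>>>(d_x, 0.5, n);
    CHECK(cudaGetLastError());
    // Synchronous copy, so the pinned buffer is idle before it is freed.
    CHECK(cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost));
    if (alloc_per_iter) {
      CHECK(cudaFreeHost(h_x));
      h_x = nullptr;
    }
  }
  CHECK(cudaDeviceSynchronize());
  auto t1 = std::chrono::steady_clock::now();

  if (h_x) cudaFreeHost(h_x);
  cudaFree(d_x);
  return std::chrono::duration<float, std::milli>(t1 - t0).count();
}

int main() {
  const int n = 1 << 20, iters = 20;
  if (run(n, 1, false) < 0.0f) return 1;  // warm-up: absorbs context creation
  float hoisted = run(n, iters, false);
  float per_iter = run(n, iters, true);
  if (hoisted < 0.0f || per_iter < 0.0f) return 1;
  std::printf("pinned buffer allocated once:     %.3f ms\n", hoisted);
  std::printf("pinned buffer allocated per iter: %.3f ms\n", per_iter);
  return 0;
}
```

If the per-iteration variant is clearly slower, the analogous change in the solver would be to allocate any `pinned` arrays once, before the iteration loop, instead of allocating and deallocating them inside it; whether the solver actually does this is an assumption here, not something the original post states. The program can be built with `nvcc` and, as a further check, run under nvprof to see how long each `cudaFreeHost` call takes.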

I’d suggest running your executable under nvprof or nvvp (both are NVIDIA profilers; nvvp adds a graphical interface). That should show where the extra time is being spent.

- Mat