I am writing a MATLAB mex function that uses sgemm from CUBLAS. My test is simple: C = sgemm_gpu(A, B). However, the first call to this function takes about 0.5 seconds longer than the following calls. For example, with a sample size of 500, the first call takes 0.6 seconds and the second call only 0.005 seconds. My computer is a Core i7 920 with a GTX 470. The code is modified from Volkov (2008). Any ideas? Thanks.
fmilo is correct. The first call will be slow because the CUDA context is set up, your kernel is processed by the driver, and the resulting machine code is uploaded to the GPU. If you want to benchmark just the kernel execution, you need to first “warm up” your kernel by calling it once, then time the execution of subsequent calls (preferably many calls, averaged, if your kernel is very quick).
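To make the warm-up pattern concrete, here is a minimal sketch in Python. It does not touch the GPU: `FakeGpuRoutine` is a hypothetical stand-in for a mex function such as sgemm_gpu, and its 0.05 second sleep only simulates the one-time setup cost. The helper names `time_once` and `time_steady_state` are also made up for illustration.

```python
import time


class FakeGpuRoutine:
    """Hypothetical stand-in for a mex function that calls CUBLAS.

    The first call pays a simulated one-time setup cost (an assumption
    that mimics context creation); later calls only do the work.
    """

    def __init__(self, setup_seconds=0.05):
        self.setup_seconds = setup_seconds
        self.initialized = False
        self.calls = 0

    def __call__(self, n):
        if not self.initialized:
            time.sleep(self.setup_seconds)  # simulated one-time setup
            self.initialized = True
        self.calls += 1
        return sum(i * i for i in range(n))  # placeholder "work"


def time_once(fn, *args):
    """Time a single call, including any one-time setup it triggers."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def time_steady_state(fn, *args, repeats=100):
    """Call fn once untimed (warm-up), then return the mean time per call."""
    fn(*args)  # warm-up call, deliberately excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    first = time_once(FakeGpuRoutine(), 500)
    steady = time_steady_state(FakeGpuRoutine(), 500, repeats=100)
    print(f"first call:   {first:.6f} s")
    print(f"steady state: {steady:.6f} s per call")
```

In a real GPU benchmark the same structure applies, with one extra caveat: if the timed call returns before the GPU has finished, the timer has to be stopped only after synchronizing with the device.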
Thanks very much to fmilo and seibert for the answers.
The problem still seems strange to me. I tested several different mex functions that use CUBLAS, and in every case the first call is slow and the following calls are fast, independent of which function it is. When the first call is to a mex function that does not use CUBLAS, that call is fast, but the first mex function that does use CUBLAS is still slow. Do you mean that the CUDA context is set up only in CUBLAS? Thanks again.