I am writing a MATLAB MEX function that uses sgemm from CUBLAS. My test is simple: C = sgemm_gpu(A, B). However, the first call to this function takes more than 0.5 seconds longer than the following calls. For example, with a sample size of 500, the first call takes 0.6 seconds, while the second call takes only 0.005 seconds. My machine is a Core i7 920 with a GTX 470. The code is modified from Volkov (2008). Any ideas? Thanks.
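
To make the measurement concrete, here is a minimal sketch of the kind of per-call timing I mean. It is plain C, not my actual MEX file: `sgemm_stub` is a hypothetical naive CPU stand-in for the CUBLAS sgemm call, and `time_calls` is an illustrative helper name. The point is only that each call is timed separately, so the first call can be compared against the later ones.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the GPU routine: naive column-major C = A * B
   for n-by-n matrices. The real MEX function would call CUBLAS sgemm here. */
static void sgemm_stub(int n, const float *A, const float *B, float *C)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += A[i + k * n] * B[k + j * n];
            C[i + j * n] = sum;
        }
    }
}

/* Runs `calls` back-to-back multiplications of n-by-n matrices and stores
   the time of each call (seconds of CPU time, from clock()) in
   times[0..calls-1]. Returns 0 on success, -1 if allocation fails. */
static int time_calls(int n, int calls, double *times)
{
    size_t count = (size_t)n * (size_t)n;
    float *A = malloc(count * sizeof *A);
    float *B = malloc(count * sizeof *B);
    float *C = malloc(count * sizeof *C);
    if (!A || !B || !C) {
        free(A);
        free(B);
        free(C);
        return -1;
    }

    /* Fill the inputs with fixed values; the contents do not matter here. */
    for (size_t i = 0; i < count; ++i) {
        A[i] = 1.0f;
        B[i] = 2.0f;
    }

    /* Time every call on its own instead of averaging over all of them. */
    for (int c = 0; c < calls; ++c) {
        clock_t t0 = clock();
        sgemm_stub(n, A, B, C);
        times[c] = (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    free(A);
    free(B);
    free(C);
    return 0;
}
```

Calling `time_calls(500, 3, times)` from a `main` and printing `times[0]` next to `times[1]` and `times[2]` mirrors the comparison above. The CPU stand-in will not reproduce the gap I see on the GPU; it only shows how the numbers are collected.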