cublasDgemm getting slower and slower

I have a problem when using cublasDgemm (this function is in cuBLAS). The result is A*B, where A is 750*600 and B is 600*1000. I call it N times in a loop and time the loop:
for (i=0; i < N; ++i) {
N=10, total time is 0.000473 s, average per call is 0.0000473 s
N=100, total time is 0.00243 s, average per call is 0.0000243 s
N=1000, total time is 0.715072 s, average per call is 0.000715 s
N=10000, total time is 10.4998 s, average per call is 0.00104998 s

Why is the average time increasing so much?

Possibly because, when you load the GPU for only a short burst, it can clock up to its maximum speed before thermal limits are hit.

When you let it run for several seconds, it has to throttle down because of thermal and power limits.
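
One way to check this idea against the numbers in the question is to convert each average into an implied throughput. Assuming A is 750*600 and B is 600*1000, one product costs about 2*750*600*1000 = 9e8 floating-point operations. The sketch below does that arithmetic; the function names are made up for illustration and the timings are copied from the question.

```cpp
#include <cstdio>

// Floating-point operations in one C = A*B with A (m x k) and B (k x n):
// each of the m*n outputs needs k multiplies and k adds.
double dgemm_flops(double m, double n, double k) {
    return 2.0 * m * n * k;
}

// Throughput implied by a measured per-call time, in GFLOP/s.
double implied_gflops(double flops, double seconds_per_call) {
    return flops / seconds_per_call / 1e9;
}

// Prints the implied throughput for the four averages quoted in the question.
void print_implied_throughput() {
    const double flops = dgemm_flops(750, 1000, 600);  // assumed dimensions
    const int    n_calls[]  = {10, 100, 1000, 10000};
    const double avg_secs[] = {0.0000473, 0.0000243, 0.000715, 0.00104998};
    for (int i = 0; i < 4; ++i) {
        std::printf("N=%-6d %10.1f GFLOP/s\n", n_calls[i],
                    implied_gflops(flops, avg_secs[i]));
    }
}
```

Calling print_implied_throughput() from main gives tens of TFLOP/s in double precision for N=10 and N=100, against roughly 1 TFLOP/s for the two long runs. A gap of about 40x is hard to attribute to clock throttling alone, and it suggests the short loops are not measuring complete matrix products.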

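cublasDgemm is a non-blocking call, so you are counting the time in the wrong way. It returns to the host before the GPU has finished the multiplication. Unless you call cudaDeviceSynchronize() before stopping the timer, a short loop mostly measures the cost of queuing the work, not the work itself.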