Hi, I found something strange: the execution time of the same kernel differs between continuous and intermittent execution. For example, each execution takes about 200 ms if I run the kernel 100 times in a row. But if I run the kernel ten times, wait 10 seconds, and then run it another ten times, the first execution after the wait (the 11th) takes about 400 ms.
I also timed the kernel with cudaEvent and got the same result, so the timing itself appears to be correct.
I also looked at it with Nsight, but I didn’t see any difference between the first execution and the others in a group.
Is this some optimization strategy in CUDA? How can I avoid this and keep the kernel execution time steady?
The CUDA version I am using is 10.2.
Here is the test code:
Case 1: wait 10 seconds after every 10 executions
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 10; j++) {
        std::clock_t start, end;
        start = clock();
        std::cout << "number: " << i * 10 + j << std::endl;
        bilateralFilter_kernel<<<>>>;  // launch configuration and arguments omitted
        cudaDeviceSynchronize();       // wait for the kernel to finish before stopping the timer
        end = clock();
        double elapsedTime = (double)(end - start) / CLOCKS_PER_SEC;
        std::cout << elapsedTime << " s cuda calc time" << std::endl;
    }
    // Leave the GPU idle for 10 seconds between groups of 10 executions.
    std::cout << "wait 10 S" << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(10000));
}
Result:
number: 0
0.261 s cuda calc time
number: 1
0.22 s cuda calc time
number: 2
0.219 s cuda calc time
number: 3
0.22 s cuda calc time
number: 4
0.22 s cuda calc time
number: 5
0.22 s cuda calc time
number: 6
0.219 s cuda calc time
number: 7
0.223 s cuda calc time
number: 8
0.218 s cuda calc time
number: 9
0.218 s cuda calc time
wait 10 S
number: 10
0.431 s cuda calc time
number: 11
0.217 s cuda calc time
number: 12
0.219 s cuda calc time
number: 13
0.218 s cuda calc time
number: 14
0.219 s cuda calc time
number: 15
0.22 s cuda calc time
number: 16
0.22 s cuda calc time
number: 17
0.219 s cuda calc time
number: 18
0.218 s cuda calc time
number: 19
0.218 s cuda calc time
.
.
.
wait 10 S
number: 90
0.449 s cuda calc time
number: 91
0.22 s cuda calc time
number: 92
0.22 s cuda calc time
number: 93
0.22 s cuda calc time
number: 94
0.22 s cuda calc time
number: 95
0.219 s cuda calc time
number: 96
0.219 s cuda calc time
number: 97
0.221 s cuda calc time
number: 98
0.22 s cuda calc time
number: 99
0.22 s cuda calc time
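To put a number on the gap in the output above, here is a minimal sketch that compares the first execution after a wait with the mean of the remaining executions in the same group. The helper names (`mean`, `first_launch_ratio`, `kBurstAfterIdle`) are made up for illustration; the data are executions 10 to 19 copied from the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of the samples in the half-open index range [first, last).
double mean(const std::vector<double>& v, std::size_t first, std::size_t last) {
    assert(first < last && last <= v.size());
    return std::accumulate(v.begin() + first, v.begin() + last, 0.0) /
           static_cast<double>(last - first);
}

// Ratio of the first execution in a group to the mean of the remaining ones.
// Hypothetical helper; needs at least two samples in the group.
double first_launch_ratio(const std::vector<double>& burst) {
    assert(burst.size() >= 2);
    return burst.front() / mean(burst, 1, burst.size());
}

// Executions 10..19 from the run with 10-second waits, in seconds.
const std::vector<double> kBurstAfterIdle = {
    0.431, 0.217, 0.219, 0.218, 0.219, 0.22, 0.22, 0.219, 0.218, 0.218};
```

Calling `first_launch_ratio(kBurstAfterIdle)` gives a ratio just under 2, so the first execution after the wait takes roughly twice as long as the others in its group.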
Case 2: 100 executions in a row, no wait
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 10; j++) {
        std::clock_t start, end;
        start = clock();
        std::cout << "number: " << i * 10 + j << std::endl;
        bilateral_kernel_float<<<>>>;  // launch configuration and arguments omitted
        getLastCudaError("Cuda failed after bilateral_kernel_float:");
        cudaDeviceSynchronize();       // wait for the kernel to finish before stopping the timer
        end = clock();
        double elapsedTime = (double)(end - start) / CLOCKS_PER_SEC;
        std::cout << elapsedTime << " s cuda calc time" << std::endl;
    }
    // The 10-second wait is disabled for this run.
    // std::cout << "wait 10 S" << std::endl;
    // std::this_thread::sleep_for(std::chrono::milliseconds(10000));
}
Result:
number: 0
0.259 s cuda calc time
number: 1
0.219 s cuda calc time
number: 2
0.223 s cuda calc time
number: 3
0.222 s cuda calc time
number: 4
0.221 s cuda calc time
number: 5
0.22 s cuda calc time
number: 6
0.22 s cuda calc time
number: 7
0.22 s cuda calc time
number: 8
0.218 s cuda calc time
number: 9
0.22 s cuda calc time
number: 10
0.221 s cuda calc time
number: 11
0.22 s cuda calc time
number: 12
0.22 s cuda calc time
number: 13
0.219 s cuda calc time
number: 14
0.22 s cuda calc time
number: 15
0.22 s cuda calc time
number: 16
0.219 s cuda calc time
number: 17
0.22 s cuda calc time
number: 18
0.219 s cuda calc time
number: 19
0.222 s cuda calc time
number: 20
0.223 s cuda calc time
number: 21
0.222 s cuda calc time
number: 22
0.223 s cuda calc time
number: 23
0.223 s cuda calc time
number: 24
0.223 s cuda calc time
number: 25
0.222 s cuda calc time
.
.
.
number: 80
0.224 s cuda calc time
number: 81
0.225 s cuda calc time
number: 82
0.223 s cuda calc time
number: 83
0.224 s cuda calc time
number: 84
0.223 s cuda calc time
number: 85
0.224 s cuda calc time
number: 86
0.223 s cuda calc time
number: 87
0.224 s cuda calc time
number: 88
0.225 s cuda calc time
number: 89
0.225 s cuda calc time
number: 90
0.224 s cuda calc time
number: 91
0.226 s cuda calc time
number: 92
0.223 s cuda calc time
number: 93
0.225 s cuda calc time
number: 94
0.223 s cuda calc time
number: 95
0.225 s cuda calc time
number: 96
0.223 s cuda calc time
number: 97
0.224 s cuda calc time
number: 98
0.224 s cuda calc time
number: 99
0.224 s cuda calc time
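For anyone who wants to reproduce the two runs above with one piece of code, here is a minimal host-side sketch of the timing loop. It is not the code that produced the numbers above. It uses `std::chrono::steady_clock` (wall-clock time) instead of `clock()`, which reports processor time on some platforms. The names `time_burst`, `time_bursts_with_idle`, and `work` are hypothetical; `work` is assumed to launch the kernel and block until it has finished.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs `work` `launches` times back to back and returns the wall-clock
// duration of each call in seconds. `work` is assumed to block until the GPU
// is done, e.g. launch the kernel and then call cudaDeviceSynchronize().
std::vector<double> time_burst(const std::function<void()>& work, int launches) {
    std::vector<double> seconds;
    seconds.reserve(static_cast<std::size_t>(launches > 0 ? launches : 0));
    for (int i = 0; i < launches; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto end = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(end - start).count());
    }
    return seconds;
}

// Runs `bursts` groups of `launches` calls, sleeping for `idle` between
// groups. An idle time of zero corresponds to the continuous run.
std::vector<std::vector<double>> time_bursts_with_idle(
    const std::function<void()>& work, int bursts, int launches,
    std::chrono::milliseconds idle) {
    std::vector<std::vector<double>> all;
    for (int b = 0; b < bursts; ++b) {
        if (b > 0 && idle.count() > 0) {
            std::this_thread::sleep_for(idle);  // leave the device idle
        }
        all.push_back(time_burst(work, launches));
    }
    return all;
}
```

With a lambda that launches the kernel and calls `cudaDeviceSynchronize()` as `work`, `time_bursts_with_idle(work, 10, 10, std::chrono::milliseconds(10000))` would correspond to the run with waits, and an idle time of 0 to the continuous run. The first entry of each inner vector is the execution that follows the wait.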