Hi, I have noticed a strange phenomenon: the same kernel runs at different speeds depending on whether it is executed continuously or intermittently. For example, if I execute the kernel 100 times in a row, each execution takes about 200 ms. But if I execute the kernel ten times, wait 10 seconds, and then execute it another ten times, the first execution after the wait (the 11th) takes about 400 ms.

I also measured the time with cudaEvent and got the same result, so the timing itself is not the problem.

I also looked at the runs in Nsight, but I didn't see any difference between the first execution in a group and the others.

Is this some kind of optimization strategy in CUDA? How can I avoid it and keep the kernel execution time steady?

The CUDA version I am using is 10.2.

Here is the test code:

**Wait 10 seconds**

```
for (int i = 0; i < 10; i++) {            // 10 groups of launches
    for (int j = 0; j < 10; j++) {        // 10 launches per group
        std::clock_t start, end;
        start = std::clock();
        std::cout << "number: " << i * 10 + j << std::endl;
        bilateralFilter_kernel<<<>>>;     // launch configuration and arguments omitted
        cudaDeviceSynchronize();          // block until the kernel has finished
        end = std::clock();
        double elapsedTime = (double)(end - start) / CLOCKS_PER_SEC;
        std::cout << elapsedTime << " s cuda calc time" << std::endl;
    }
    std::cout << "wait 10 S" << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(10000));  // leave the GPU idle for 10 s
}
```
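Note that `std::clock()` returns processor time on POSIX systems (MSVC's implementation returns elapsed wall time instead), so a portable way to time each launch is `std::chrono::steady_clock`. Below is a minimal sketch of the same burst-and-wait experiment written that way. `time_bursts` is a hypothetical helper, and the `work` callable is an assumption standing in for "launch the kernel, then `cudaDeviceSynchronize()`"; it is plain host C++ so it can be checked without a GPU.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs `work` in `bursts` groups of `per_burst` calls,
// sleeping for `pause` between groups, and returns the wall-clock duration
// of every call in milliseconds. `work` stands in for "launch the kernel,
// then cudaDeviceSynchronize()", so it must block until the GPU is done.
std::vector<double> time_bursts(std::size_t bursts, std::size_t per_burst,
                                std::chrono::milliseconds pause,
                                const std::function<void()>& work) {
    assert(work && "work must be callable");
    using Clock = std::chrono::steady_clock;  // monotonic wall clock
    std::vector<double> times_ms;
    times_ms.reserve(bursts * per_burst);
    for (std::size_t i = 0; i < bursts; ++i) {
        for (std::size_t j = 0; j < per_burst; ++j) {
            const auto start = Clock::now();
            work();
            const auto end = Clock::now();
            times_ms.push_back(
                std::chrono::duration<double, std::milli>(end - start).count());
        }
        if (i + 1 < bursts) {
            std::this_thread::sleep_for(pause);  // idle gap between groups
        }
    }
    return times_ms;
}
```

With this helper, `time_bursts(10, 10, std::chrono::seconds(10), work)` mirrors the intermittent loop above, and `time_bursts(1, 100, std::chrono::milliseconds(0), work)` mirrors the continuous one. Keeping the console output outside the timed region also means only the launch and synchronization are measured.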

Result (the first run after each 10-second wait, e.g. runs 10 and 90, takes about twice as long as the others):

```
number: 0
0.261 s cuda calc time
number: 1
0.22 s cuda calc time
number: 2
0.219 s cuda calc time
number: 3
0.22 s cuda calc time
number: 4
0.22 s cuda calc time
number: 5
0.22 s cuda calc time
number: 6
0.219 s cuda calc time
number: 7
0.223 s cuda calc time
number: 8
0.218 s cuda calc time
number: 9
0.218 s cuda calc time
wait 10 S
number: 10
0.431 s cuda calc time
number: 11
0.217 s cuda calc time
number: 12
0.219 s cuda calc time
number: 13
0.218 s cuda calc time
number: 14
0.219 s cuda calc time
number: 15
0.22 s cuda calc time
number: 16
0.22 s cuda calc time
number: 17
0.219 s cuda calc time
number: 18
0.218 s cuda calc time
number: 19
0.218 s cuda calc time
...
wait 10 S
number: 90
0.449 s cuda calc time
number: 91
0.22 s cuda calc time
number: 92
0.22 s cuda calc time
number: 93
0.22 s cuda calc time
number: 94
0.22 s cuda calc time
number: 95
0.219 s cuda calc time
number: 96
0.219 s cuda calc time
number: 97
0.221 s cuda calc time
number: 98
0.22 s cuda calc time
number: 99
0.22 s cuda calc time
```
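To compare the two experiments numerically rather than by eye, one option is to flag every run that is much slower than the median run. This is a minimal sketch; `median`, `slow_runs`, and the 1.5 threshold are hypothetical choices for illustration, not part of the original test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample; takes a copy so the caller's order is kept.
double median(std::vector<double> v) {
    assert(!v.empty() && "median of an empty sample is undefined");
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 == 1 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Indices of runs that took more than `factor` times the median run time.
std::vector<std::size_t> slow_runs(const std::vector<double>& times, double factor) {
    std::vector<std::size_t> slow;
    if (times.empty()) return slow;
    const double m = median(times);
    for (std::size_t i = 0; i < times.size(); ++i) {
        if (times[i] > factor * m) slow.push_back(i);
    }
    return slow;
}
```

Applied to runs 0 to 19 of the intermittent test with a factor of 1.5, the median is 0.219 s and only run 10 (0.431 s) is flagged; run 0 (0.261 s) stays below the 0.3285 s threshold.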

**Continuous 100 times**

```
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 10; j++) {
        std::clock_t start, end;
        start = std::clock();
        std::cout << "number: " << i * 10 + j << std::endl;
        bilateral_kernel_float<<<>>>;     // launch configuration and arguments omitted
        getLastCudaError("Cuda failed after bilateral_kernel_float:");  // helper_cuda.h from the CUDA samples
        cudaDeviceSynchronize();          // block until the kernel has finished
        end = std::clock();
        double elapsedTime = (double)(end - start) / CLOCKS_PER_SEC;
        std::cout << elapsedTime << " s cuda calc time" << std::endl;
    }
    // Wait disabled: all 100 launches run back to back.
    // std::cout << "wait 10 S" << std::endl;
    // std::this_thread::sleep_for(std::chrono::milliseconds(10000));
}
```

Result:

```
number: 0
0.259 s cuda calc time
number: 1
0.219 s cuda calc time
number: 2
0.223 s cuda calc time
number: 3
0.222 s cuda calc time
number: 4
0.221 s cuda calc time
number: 5
0.22 s cuda calc time
number: 6
0.22 s cuda calc time
number: 7
0.22 s cuda calc time
number: 8
0.218 s cuda calc time
number: 9
0.22 s cuda calc time
number: 10
0.221 s cuda calc time
number: 11
0.22 s cuda calc time
number: 12
0.22 s cuda calc time
number: 13
0.219 s cuda calc time
number: 14
0.22 s cuda calc time
number: 15
0.22 s cuda calc time
number: 16
0.219 s cuda calc time
number: 17
0.22 s cuda calc time
number: 18
0.219 s cuda calc time
number: 19
0.222 s cuda calc time
number: 20
0.223 s cuda calc time
number: 21
0.222 s cuda calc time
number: 22
0.223 s cuda calc time
number: 23
0.223 s cuda calc time
number: 24
0.223 s cuda calc time
number: 25
0.222 s cuda calc time
...
number: 80
0.224 s cuda calc time
number: 81
0.225 s cuda calc time
number: 82
0.223 s cuda calc time
number: 83
0.224 s cuda calc time
number: 84
0.223 s cuda calc time
number: 85
0.224 s cuda calc time
number: 86
0.223 s cuda calc time
number: 87
0.224 s cuda calc time
number: 88
0.225 s cuda calc time
number: 89
0.225 s cuda calc time
number: 90
0.224 s cuda calc time
number: 91
0.226 s cuda calc time
number: 92
0.223 s cuda calc time
number: 93
0.225 s cuda calc time
number: 94
0.223 s cuda calc time
number: 95
0.225 s cuda calc time
number: 96
0.223 s cuda calc time
number: 97
0.224 s cuda calc time
number: 98
0.224 s cuda calc time
number: 99
0.224 s cuda calc time
```