How different kernels affect performance

I don't know if anyone else has noticed this, but it seems to me that different kernels, or even independent functions, can have a large effect on each other's performance.
I am developing an image processing library in CUDA. When I test the performance of these functions, I see that some of them run several times slower or faster depending on whether other kernel functions are present, and even the order in which I call the test functions makes a big difference. I sometimes see the same thing on the CPU, but the magnitude of the change is not as large as here (3-5 times faster or slower).

For example, I have two kernel functions that perform the same operation, one using texture memory and one using global memory.

This is the performance when I call the texture version first, then the global memory version:
CPU time = 0.592947
tex time = 0.020082 (error = 0.000000)
GPU time = 0.089476 (error = 0.000000)

Speedup over CPU

GPU: 6.626883x
tex: 29.526292x

This is the performance when I call the global memory version first:

CPU time = 0.579712
GPU time = 0.076125 (error = 0.000000)
tex time = 0.050386 (error = 0.000000)

Speedup over CPU

GPU: 7.615264x
tex: 11.505418x

The texture version is now about 2.5 times slower (0.050386 vs. 0.020082).

Can someone explain what causes such a noticeable performance change? The texture version operates on texture memory and the GPU version operates on global memory only, so they do not share anything and can run separately.

Is there any hint or mechanism for controlling performance in this case? This is only a simple example. When I develop a large program, how can I keep performance under control as I add more kernel functions?

Did you include the first run in your timing?
The first run includes some “optimization time for kernels”. It can be expected to get slower as the number of kernels increases, and it won’t be averaged out over many passes. My program with ~150 passes still shows this behavior in the first run.
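One way to keep the first run out of the numbers is to do an untimed warm-up launch and then average over many timed launches. The following is only a minimal sketch of that idea, not the original poster's code: scaleKernel, timeKernelMs, the buffer size, and the iteration count are all hypothetical stand-ins, and it uses CUDA events rather than a host timer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; not the image-processing code from this thread.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns the average time per launch in milliseconds.
// One warm-up launch is run first and is NOT included in the measurement.
float timeKernelMs(float* d_data, int n, int iterations) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Warm-up: absorbs one-time costs paid on the first launch.
    scaleKernel<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int it = 0; it < iterations; ++it) {
        scaleKernel<<<blocks, threads>>>(d_data, n, 1.0f);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / iterations;
}

int main() {
    const int n = 1 << 20;  // assumed problem size, for illustration only
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    printf("avg kernel time = %f ms\n", timeKernelMs(d_data, n, 100));

    cudaFree(d_data);
    return 0;
}
```

If the ordering effect disappears once each kernel gets its own warm-up launch, the difference was most likely first-launch overhead rather than an interaction between the kernels.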

Strange. Do you call cudaThreadSynchronize() before starting and stopping the timers?

Yes I do.
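For anyone reading along, this is roughly what synchronizing around a host timer looks like. It is a sketch under assumptions, not the code being discussed: addOne and timeWithHostClockSec are hypothetical names, and it calls cudaDeviceSynchronize(), which replaced the now-deprecated cudaThreadSynchronize() in later CUDA releases.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel used only to have something to time.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Times a single launch with a host clock, in seconds.
double timeWithHostClockSec(float* d_data, int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Drain any earlier GPU work so it is not charged to this measurement.
    cudaDeviceSynchronize();
    auto t0 = std::chrono::steady_clock::now();

    addOne<<<blocks, threads>>>(d_data, n);

    // Kernel launches are asynchronous: wait for completion before stopping.
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const int n = 1 << 20;  // assumed problem size, for illustration only
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    printf("GPU time = %f\n", timeWithHostClockSec(d_data, n));

    cudaFree(d_data);
    return 0;
}
```

Without the first synchronize, work left over from a previously launched kernel can be counted against the next one, which would make the results depend on call order.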

I do count the first iteration, which may be causing the problem. I made some other changes to my code and the issue went away, so it does not seem to be reproducible. If it happens again, I will report back with more details about what I did.

Thank you guys.