I don't know if anyone else has noticed this, but it seems to me that different kernels, or even independent functions, can have a large effect on each other's performance.
I am developing an image processing library in CUDA. When I test the performance of its functions, I see that some functions run several times slower or faster depending on whether other kernel functions are present, and even the order in which I call the test functions changes the performance significantly. I sometimes see the same thing on the CPU, but the magnitude of the change is not nearly as large as here (3-5 times faster or slower).
For example, I have two kernel functions that perform the same operation, one using texture memory ("tex" in the output below) and one using global memory ("GPU" in the output below).
This is the performance when I call the texture version first, then the global memory version:
CPU time = 0.592947
tex time = 0.020082 (error = 0.000000)
GPU time = 0.089476 (error = 0.000000)
Speedup over CPU
GPU: 6.626883x
tex: 29.526292x
This is the performance when I call the global memory version first:
CPU time = 0.579712
GPU time = 0.076125 (error = 0.000000)
tex time = 0.050386 (error = 0.000000)
Speedup over CPU
GPU: 7.615264x
tex: 11.505418x
The texture version is now about 2.5 times slower (0.050386 vs. 0.020082).
Can someone explain what causes such a noticeable change in performance? The texture version operates on texture memory and the GPU version operates on global memory only, so they do not share anything and can run separately.
Is there any hint or mechanism for controlling the performance in this case? This is only a simple example; when I develop a larger program, how can I keep performance under control as I add more kernel functions?
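For reference, here is a minimal sketch of how the two call orders could be compared with CUDA event timing, a warm-up launch, and a synchronization before each measurement, so that one-time setup costs and still-running earlier work are not counted against whichever kernel happens to be measured. This is not the library's actual code: `scaleGlobal`, `scaleTexture`, `timeMs`, the problem size, and the launch configuration are all hypothetical stand-ins, and error checking is reduced to a single check at the end. It may or may not explain the numbers above; it only removes two common sources of measurement noise.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the global memory version of the operation.
__global__ void scaleGlobal(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Hypothetical stand-in for the texture version of the same operation.
__global__ void scaleTexture(cudaTextureObject_t tex, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * tex1Dfetch<float>(tex, i);
}

// Time one launch with CUDA events. The device is synchronized first so
// that work queued by earlier launches is not included in this measurement.
template <typename Launch>
float timeMs(Launch launch) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaDeviceSynchronize();
    cudaEventRecord(start);
    launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has really finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 22;  // assumed problem size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    // Expose the same input buffer through a 1D texture object.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = in;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    auto runGlobal = [&] { scaleGlobal<<<blocks, threads>>>(in, out, n); };
    auto runTex = [&] { scaleTexture<<<blocks, threads>>>(tex, out, n); };

    // Warm-up: the first launch of each kernel pays one-time setup costs.
    runGlobal();
    runTex();

    // Order A: texture first, then global memory.
    float texA = timeMs(runTex);
    float gpuA = timeMs(runGlobal);
    // Order B: global memory first, then texture.
    float gpuB = timeMs(runGlobal);
    float texB = timeMs(runTex);

    printf("tex first: tex %.3f ms, GPU %.3f ms\n", texA, gpuA);
    printf("GPU first: GPU %.3f ms, tex %.3f ms\n", gpuB, texB);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaDestroyTextureObject(tex);
    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

If the two orders give similar per-kernel times with this kind of harness, the original difference probably came from how the test was timed. If the gap remains, it is more likely a real interaction between the kernels.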