Hello everybody! I have two kernels that just copy data. The first does not use texture memory; the second does. I assumed that using a texture would improve bandwidth because texture fetches are cached, but I do not see any difference. Here are the kernels:
Thank you in advance!
extern "C" __global__ void kernel1(float *eli1, float *eli2, float *out, int size)
{
    // One thread per element; the guard covers a partially filled last block.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) out[tid] = eli1[tid];
}
Why would a cache help in that situation? Every thread reads a different value, and each value is read only once, so there is nothing to reuse. If anything, I would expect the texture version to be slower, because of the very large number of cache misses that code will probably generate.
“On devices of compute capability 1.x, some kernels can achieve a speedup when using (cached) texture fetches rather than regular global memory loads (e.g., when the regular loads do not coalesce well). Unless texture fetches provide other benefits such as address calculations or texture filtering (Section 220.127.116.11), this optimization can be counter-productive on devices of compute capability 2.0, however, since global memory loads are cached in L1 and the L1 cache has higher bandwidth than the texture cache”
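If you want to measure this yourself, here is a rough sketch of the comparison. The texture kernel is not shown in the post above, so copyTexture is only a guess at what it might look like. It assumes a device of compute capability 3.0 or newer and uses the texture object API, not the older texture references that code of this era would have used. The names copyGlobal, copyTexture and timeMs, and the buffer size, are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Plain copy: consecutive threads read consecutive floats, so loads coalesce.
__global__ void copyGlobal(const float *in, float *out, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) out[tid] = in[tid];
}

// Same copy, but every read goes through the texture path.
__global__ void copyTexture(cudaTextureObject_t tex, float *out, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) out[tid] = tex1Dfetch<float>(tex, tid);
}

// Elapsed milliseconds between two recorded events.
static float timeMs(cudaEvent_t start, cudaEvent_t stop)
{
    float ms = 0.0f;
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    return ms;
}

int main()
{
    const int size = 1 << 20;                 // hypothetical problem size
    const size_t bytes = size * sizeof(float);
    std::vector<float> host(size), back(size);
    for (int i = 0; i < size; ++i) host[i] = static_cast<float>(i);

    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, bytes);
    cudaMalloc(&out, bytes);
    cudaMemcpy(in, host.data(), bytes, cudaMemcpyHostToDevice);

    // Expose the linear input buffer as a 1D texture of floats.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = in;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    const int threads = 256;
    const int blocks = (size + threads - 1) / threads;

    // Warm-up launches so the timed runs exclude one-time startup cost.
    copyGlobal<<<blocks, threads>>>(in, out, size);
    copyTexture<<<blocks, threads>>>(tex, out, size);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    copyGlobal<<<blocks, threads>>>(in, out, size);
    cudaEventRecord(stop);
    printf("global copy:  %.3f ms\n", timeMs(start, stop));

    cudaEventRecord(start);
    copyTexture<<<blocks, threads>>>(tex, out, size);
    cudaEventRecord(stop);
    printf("texture copy: %.3f ms\n", timeMs(start, stop));

    // Check that the texture copy reproduced the input.
    cudaMemcpy(back.data(), out, bytes, cudaMemcpyDeviceToHost);
    int mismatches = 0;
    for (int i = 0; i < size; ++i) mismatches += (back[i] != host[i]);
    printf("mismatches: %d\n", mismatches);

    cudaDestroyTextureObject(tex);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return mismatches != 0;
}
```

Something like nvcc -o copy_bench copy_bench.cu should build it, where copy_bench.cu is just a placeholder file name. For a fully coalesced copy like this one, the two timings will probably be close on most hardware, which matches what the original poster saw. A gap is more likely to show up with scattered or strided reads, where the plain loads no longer coalesce.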