Texture vs Global Memory Bandwidth

You7878 · March 24, 2010, 1:32pm

Hello everybody! I have two kernels that just copy data. The first does not use texture memory; the second does. I supposed that using a texture would improve bandwidth because it is cached, but I do not see any difference. The kernels are below.

Thank you in advance!

[codebox]

// Kernel 1: plain global-memory copy (eli2 is unused here).
extern "C" __global__ void kernel1(float *eli1, float *eli2, float *out, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) out[tid] = eli1[tid];
}

// Kernel 2: same copy, but the input is fetched through a 1D texture reference.
texture<float, 1, cudaReadModeElementType> texref1;

extern "C" __global__ void kernel2(float *out, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) out[tid] = tex1Dfetch(texref1, tid);
}

[/codebox]
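
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the original poster's harness. It assumes a toolkit and GPU that support the texture object API (CUDA 5.0 or later, compute capability 3.0 or later) instead of the `texture<...>` reference shown above, and the names `copy_global`, `copy_texture`, and `measure_copy_gbps` are made up for illustration.

```cuda
#include <cuda_runtime.h>

// Plain global-memory copy, same access pattern as kernel1.
__global__ void copy_global(const float *in, float *out, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) out[tid] = in[tid];
}

// Same copy, but the load goes through the texture path (as in kernel2).
__global__ void copy_texture(cudaTextureObject_t tex, float *out, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) out[tid] = tex1Dfetch<float>(tex, tid);
}

// Hypothetical helper: returns effective bandwidth in GB/s (bytes read plus
// bytes written per second), or a negative value if something failed.
float measure_copy_gbps(int n, bool use_texture)
{
    size_t bytes = size_t(n) * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1.0f; }
    cudaMemset(d_in, 0, bytes);

    // Describe d_in as a 1D linear texture of floats.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const int reps = 20;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // One untimed warm-up launch, then time `reps` launches back to back.
    if (use_texture) copy_texture<<<blocks, threads>>>(tex, d_out, n);
    else             copy_global<<<blocks, threads>>>(d_in, d_out, n);

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) {
        if (use_texture) copy_texture<<<blocks, threads>>>(tex, d_out, n);
        else             copy_global<<<blocks, threads>>>(d_in, d_out, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaError_t err = cudaGetLastError();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);

    if (err != cudaSuccess || ms <= 0.0f) return -1.0f;
    double seconds = ms * 1e-3;
    return float(2.0 * double(bytes) * reps / seconds / 1e9);
}
```

Calling `measure_copy_gbps(1 << 24, false)` and `measure_copy_gbps(1 << 24, true)` and comparing the two numbers gives a rough picture. In this copy every element is read exactly once, so there is no reuse for a cache to exploit.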

avidday · March 24, 2010, 1:43pm

Why would a cache help in that situation? Every thread is reading a different value. If anything, I would expect texture would be slower, because of the very large number of cache misses that code will probably generate.

You7878 · March 24, 2010, 1:56pm

But isn't the cache bigger than one value? Isn't it 8 KB per TPC? So when I read one value, some neighbouring values should be put into the cache. If not, what exactly happens?

eyalhir74 · March 24, 2010, 2:09pm

The texture cache is not like a CPU L1/L2/… cache.

You might see improvement using textures if the access pattern within the block is random or semi random, for example:

float fCurrent = eli[threadIdx.x];

float fNext = eli[threadIdx.x + 1];

I think that, because the coalescing rules have been relaxed in current hardware, even such a pattern might not be faster with textures than with gmem reads.

eyal
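
To make the pattern above concrete, here is a small sketch in which each thread reads its own element and its right-hand neighbour, so every input value is fetched by two threads. Both a global-memory and a texture variant are shown. This assumes the texture object API (CUDA 5.0 or later) and uses a global thread index rather than `threadIdx.x` alone; `diff_global`, `diff_texture`, and `neighbour_diff_matches` are hypothetical names. The helper only checks that the two variants agree. It does not measure which one is faster, and that will depend on the hardware.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Global-memory variant: each thread loads in[tid] and in[tid + 1].
__global__ void diff_global(const float *in, float *out, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid + 1 < size) out[tid] = in[tid + 1] - in[tid];
}

// Texture variant: the same two loads go through the texture path.
__global__ void diff_texture(cudaTextureObject_t tex, float *out, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid + 1 < size)
        out[tid] = tex1Dfetch<float>(tex, tid + 1) - tex1Dfetch<float>(tex, tid);
}

// Hypothetical helper: runs both kernels on n elements and returns true if
// their outputs match each other and a host-side reference.
bool neighbour_diff_matches(int n)
{
    if (n < 2) return false;
    std::vector<float> h_in(n), h_a(n - 1), h_b(n - 1);
    for (int i = 0; i < n; ++i) h_in[i] = float(i % 97);  // small exact values

    size_t in_bytes = size_t(n) * sizeof(float);
    size_t out_bytes = size_t(n - 1) * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, in_bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, out_bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), in_bytes, cudaMemcpyHostToDevice);

    // Describe d_in as a 1D linear texture of floats.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = in_bytes;
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    diff_global<<<blocks, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_a.data(), d_out, out_bytes, cudaMemcpyDeviceToHost);

    diff_texture<<<blocks, threads>>>(tex, d_out, n);
    cudaMemcpy(h_b.data(), d_out, out_bytes, cudaMemcpyDeviceToHost);

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; ok && i < n - 1; ++i)
        ok = (h_a[i] == h_b[i]) && (h_a[i] == h_in[i + 1] - h_in[i]);

    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```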

You7878 · March 25, 2010, 1:09pm

So on Fermi there would never be an advantage in using textures just for reading elements (I mean no filtering, no wrap modes, and so on)?

eyalhir74 · March 25, 2010, 2:20pm

This is kinda suggested in Fermi’s Tuning Guide:

“On devices of compute capability 1.x, some kernels can achieve a speedup when using (cached) texture fetches rather than regular global memory loads (e.g., when the regular loads do not coalesce well). Unless texture fetches provide other benefits such as address calculations or texture filtering (Section 5.3.2.5), this optimization can be counter-productive on devices of compute capability 2.0, however, since global memory loads are cached in L1 and the L1 cache has higher bandwidth than the texture cache”

http://developer.download.nvidia.com/compu…TuningGuide.pdf

But we’ll have to wait and see :)

eyal
