Copy from texture memory to shared memory - confused about the best transfer strategy

Hello Forum,

does it make sense to use texture memory to copy a whole block into shared memory? Using the CUDA profiler I discovered that the transfer from the host to device global memory is faster than the upload from the host to device texture memory. On the other hand, fetching from texture memory is faster than fetching from global memory. Now I'm a little confused about which strategy makes more sense:

  1. upload from host to texture memory and then to shared memory

  2. upload from host to global memory and then to shared memory

Thanks in advance for any helpful answers!

Best regards,

It mostly depends on your memory access pattern. If you can manage coalescing, use global memory; if your accesses are scattered and "random", use textures.

In any case, if you re-use the data over and over, or need all threads in a thread block to iterate over all the data you've loaded, then use shared memory as well.


I would use either textures or shared memory, but I don’t think I would use both together.

The big advantage with textures is that you get a read cache, so for data with relatively tight spatial locality you can get a useful speed-up over global memory alone, without needing read patterns that coalesce. But there can also be cache misses, which add an additional penalty and can make textures slower than global memory. On average, textures are usually faster than "naked" global memory loads. The fact that you also get filtering/interpolation for free at the same time can yield big performance wins, if you need it.
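As a rough sketch of the texture path (using the legacy texture reference API; the kernel, its name, and the filter it applies are purely illustrative, not anyone's actual code):

```
// Bind a linear byte buffer to a 1D texture reference so that scattered
// reads go through the texture cache instead of raw global memory.
texture<unsigned char, 1, cudaReadModeElementType> texGray;

__global__ void smoothTex(unsigned char *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Neighbouring fetches hit the texture cache even though they are
    // not coalesced in the global-memory sense.
    int c = tex1Dfetch(texGray, y * width + x);
    int l = tex1Dfetch(texGray, y * width + max(x - 1, 0));
    int r = tex1Dfetch(texGray, y * width + min(x + 1, width - 1));
    out[y * width + x] = (unsigned char)((l + 2 * c + r) / 4);
}

// Host side, before the launch (deviceGray is a cudaMalloc'd buffer):
// cudaBindTexture(0, texGray, deviceGray, size);
```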

On the other hand, coalesced reads into shared memory are usually worthwhile when you have non-linear global memory reads that can be assembled block-wise into coalesced reads, and you need to re-use the data more than once across several threads within the same block. Fully coalesced global memory loads are basically the fastest off-chip memory access method there is. If you can use them, you probably ought to prefer them over textures (unless you can also exploit filtering).
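The staged-load pattern looks roughly like this (a minimal sketch, assuming a 1D launch with 256 threads per block; all names are illustrative):

```
// Each thread does one coalesced global read into shared memory, then all
// threads in the block re-use the staged data without going off-chip again.
__global__ void rowFilter(const unsigned char *in, unsigned char *out, int n)
{
    __shared__ unsigned char tile[256];   // one byte per thread in the block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Consecutive threads read consecutive addresses -> coalesced.
    tile[threadIdx.x] = in[i];
    __syncthreads();

    // Re-read a neighbour from shared memory instead of global memory.
    unsigned char left = (threadIdx.x > 0) ? tile[threadIdx.x - 1]
                                           : tile[threadIdx.x];
    out[i] = (unsigned char)((left + tile[threadIdx.x]) / 2);
}
```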

Knowing which one is most suitable requires analysis of your global memory access patterns.

Hello Eyal,

thank your very much for your quick reply.

In my case I would like to apply a demosaicing filter to a char array (an IplImage from OpenCV). So I'll iterate over all the data and won't access the texture randomly. Unfortunately I'm a bloody beginner in CUDA and therefore not familiar with coalescing… but I promise to know more about it next time.

Do you think you can tell me (a bloody beginner) in a few words what coalescing is? In my program I do the following steps:


// Size of the single-channel input image in bytes
int size = sizeof(unsigned char) * iplIn->width * iplIn->height;

// Output buffer: three channels per input pixel
cudaMalloc((void**)&DEVres, size * 3);

// Copy the input image to the device
// (DEVgray must have been allocated with cudaMalloc beforehand)
cudaMemcpy(DEVgray, iplIn->imageData, size, cudaMemcpyHostToDevice);


Later on, in the kernel, I use three IF-statements to calculate the colors. That's not very ladylike and results in bad performance… BUT: it works! (for now)
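For illustration, a purely hypothetical sketch of the kind of kernel described - assuming an RGGB Bayer layout and omitting the neighbour interpolation, so it will not match the actual code. Note that the per-pixel IFs make threads within the same warp take different branches, which serializes them and is one likely source of the slowdown:

```
__global__ void demosaic(const unsigned char *gray, unsigned char *rgb,
                         int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    unsigned char r = 0, g = 0, b = 0;

    if ((y & 1) == 0 && (x & 1) == 0) {        // red site
        r = gray[idx];                          // neighbours omitted here
    } else if ((y & 1) == 1 && (x & 1) == 1) { // blue site
        b = gray[idx];
    } else {                                   // green site
        g = gray[idx];
    }

    rgb[3 * idx + 0] = r;
    rgb[3 * idx + 1] = g;
    rgb[3 * idx + 2] = b;
}
```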

I’ll be very happy about any hints.



Search the programming guide for coalescing. In simple words, you need your threads (or a half-warp, which is a group of 16 threads) to access contiguous memory addresses. If your access is not coalesced you'll suffer bad performance, and the hardware will make needless trips to global memory (trips that are not needed when your access pattern is coalesced). Different GPU cards have different rules (the newer ones have more relaxed rules) - the easiest would be to look at the programming guide :)
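A minimal illustration of the rule (kernel names are made up): with the first kernel, the 16 threads of a half-warp touch 16 consecutive addresses and the hardware can serve them in one transaction; with the second, each thread lands in a different memory segment and the accesses get serialized.

```
__global__ void copyCoalesced(const float *in, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i];            // thread k reads base + k: coalesced
}

__global__ void copyStrided(const float *in, float *out, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i * stride];   // thread k reads base + k*stride: not coalesced
}
```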

As for your code - maybe you can post the kernel so we can have a look at it and see how to improve it…