Confused about linear and texture (array) memory

Hi all,
I’m a bit confused about linear, array and texture memory. I’m using a GT 8800 that has 512 MB total RAM.

Does the card statically reserve some of it’s RAM for the different types of memory?

Why is there a need to copy the result of a kenel (linear global memory) to array memory so that the array can then be used to create a texture for texturing.

Does texturing copy the array memory once again to some special place in GPU memory where texturing can be used or does texturing directly use the array memory?

If all of these memory types are on the same GPU memory, why is there a need at all to copy from linear to array memory to use texturing - why can’t we directly use the output of a kernel as input texture for another kernel?

Can someone give me some insight about why things are that way.


The main difference between global memory, texture memory, and constant memory is the path used to access it. All three of these types of memory live in the off-GPU, but on-board, DDR3 chips in your graphics card. There is no physical partitioning of your 512 MB of memory between global, texture and constant memory. (There is a limit of 64 KB of data in constant memory, but I don’t know if this is a physical limit, or some kind of addressing limit in the way constant memory is cached.)

What is different between the three is the hardware used to access it:

  • Global memory reads go direct to the memory controller, with no intervening cache. This is what happens when you do normal C-style reads from arrays allocated with cudaMalloc().

  • Constant memory reads go through the 8 kB per multiprocessor constant cache. If the word is found in the cache, it is immediately returned from there. Otherwise, the constant cache fetches the required data from global memory.

  • Texture memory reads go through the 6-8 kB per multiprocessor (size depends on specific GPU) texture cache. If the word is found in the cache, it is returned immediately, otherwise the texture cache fetches the required words from global memory.

The texture cache also has some other features. It does low-precision interpolation between array elements “for free”, and also understands how to read a specially packed multi-dimensional array format. This why 2D textures must be loaded onto the card using “CUDA Arrays” which are not organized in memory using the normal linear layout. Instead the elements are arranged into something like a Z-curve:

With the exception of 1D textures, the texture cache expects the data layout to be in this special CUDA Array order. The array copy you mention is just rearranging the elements to be in this order. It isn’t actually moving the data to a special segment of “texture memory.”