How to use multi-component textures? Linear or array memory better for textures?


I would like to use textures for big data arrays with caching. But the simpleTexture example does not show how to use textures with multiple float components or with differently sized integer values (if that is possible at all).
I tried to create an RGB (three-component) texture, but texfetch() does not work with float3 as the return type.

BTW: Is there any performance difference between linear memory and array memory when they are used for textures?

Last question (for this one ;-): do the typical 3D-API size limits (4096^2) apply?

The hardware only supports 1-, 2-, and 4-component textures, not 3-component textures.
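A common workaround is to pad three-component data to four components on the host and bind the result as a float4 texture, ignoring the w channel. Below is a minimal host-side sketch of that padding step. It assumes plain structs named Float3 and Float4 as hypothetical stand-ins for CUDA's float3/float4 so it compiles without the CUDA headers; padToFloat4 is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Host-side stand-ins for CUDA's float3/float4 (assumption: defined here so the
// sketch compiles without the CUDA headers).
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Hypothetical helper: copy each 3-component element into a 4-component one so
// the buffer can back a float4 texture. The w channel is filler and is ignored.
std::vector<Float4> padToFloat4(const std::vector<Float3>& in, float fill = 0.0f) {
    std::vector<Float4> out;
    out.reserve(in.size());
    for (const Float3& v : in) {
        out.push_back(Float4{v.x, v.y, v.z, fill});
    }
    // Every input element produced exactly one texel.
    assert(out.size() == in.size());
    return out;
}
```

The padded buffer uses a third more memory than tightly packed float3 data (16 instead of 12 bytes per element), but each element is then a single 4-component fetch.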

I’m not 100% sure what you mean by “differently sized integer values”. Each component has the same type (see section of the programming guide for the list of supported types).

There is no performance difference in terms of latency between linear memory and arrays, but arrays have a memory layout that makes them much more suitable to texture fetching.
The main application of texturing over linear memory is to get around the coalescing constraints on loading from global memory; the texture cache might help save some memory bandwidth in some cases too.

The maximum width for a texture is 64K and maximum height 32K.

Does this mean that if I bind linear memory to a 1D texture, the maximum amount of linear memory I can access through the texture is the first 64K elements? I don’t seem to get any sort of error message if I try to bind something larger than 64K to a texture. And it seems that I can access values beyond the first 64K entries.

Or does this limitation only apply to arrays? Or does CUDA wrap 1D into 2D for you?

Sorry for the ambiguity of my previous post: The limits I gave only apply to texture references bound to arrays.

Texture references bound to linear memory can only be 1D and their maximum width is 2^27.
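To make the two sets of limits concrete, here is a small sketch that encodes them exactly as stated in this thread (assumption: the numbers describe the hardware and CUDA release being discussed here; later releases changed them). The 64K x 32K limit is read as applying to 2D arrays, and both helper names are hypothetical.

```cpp
#include <cstddef>

// Limits as quoted in this thread (assumption: they describe the GPUs and CUDA
// release being discussed; later hardware and releases changed them).
constexpr std::size_t kMaxLinearWidth = std::size_t{1} << 27; // 1D texture bound to linear memory
constexpr std::size_t kMaxArrayWidth  = std::size_t{1} << 16; // 64K, texture bound to a 2D array
constexpr std::size_t kMaxArrayHeight = std::size_t{1} << 15; // 32K, texture bound to a 2D array

// Hypothetical helper: can a 1D texture over numElements of linear memory
// address every element?
constexpr bool fitsLinearTexture(std::size_t numElements) {
    return numElements <= kMaxLinearWidth;
}

// Hypothetical helper: does a width x height array fit the array limits?
constexpr bool fits2DArrayTexture(std::size_t width, std::size_t height) {
    return width <= kMaxArrayWidth && height <= kMaxArrayHeight;
}
```

Under these limits, a 1D texture over 100,000 elements of linear memory is fine, which is consistent with being able to fetch past the first 64K entries.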

I wondered if it might be possible to use texture elements as structs whose four components have different types or resolutions (if normalized to [0,1]). For example, the first component might be an int and the second only a short. But I admit this is a weird idea… :wacko:

Good to hear that CUDA has no 4096(^2) limit like OpenGL. So the limit is 2^27 texture elements (e.g. fp32), not bytes? So is 2^29 bytes (0.5 GB) the maximum size in bytes?

Yes, it’s 2^27 texture elements. The maximum size in bytes is 2^27 times the maximum element size in bytes.
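The byte limit is just the element limit times the element size. The widest texel is four 32-bit components (float4, int4, uint4), i.e. 16 bytes, so 2^29 bytes holds for 4-byte elements such as float, while 16-byte elements reach 2^31 bytes (2 GiB). A compile-time sketch of that arithmetic, with a hypothetical helper name:

```cpp
#include <cstdint>

// Maximum number of elements for a 1D texture bound to linear memory, as quoted
// in this thread.
constexpr std::uint64_t kMaxLinearTexels = std::uint64_t{1} << 27;

// Hypothetical helper: the byte limit is the element limit times the element size.
constexpr std::uint64_t maxTextureBytes(std::uint64_t bytesPerElement) {
    return kMaxLinearTexels * bytesPerElement;
}

// float (4 bytes): 2^29 bytes = 512 MiB.  float4/int4/uint4 (16 bytes): 2^31 bytes = 2 GiB.
static_assert(maxTextureBytes(sizeof(float)) == (std::uint64_t{1} << 29), "float");
static_assert(maxTextureBytes(16) == (std::uint64_t{1} << 31), "float4");
```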

I’m not sure if this answers your question, but you can pack bits any way you want in linear memory and read them through the texture cache. Depending on your layout, this can be a pain in the butt; sometimes, however, it’s very simple.

For example, if you have a type

struct MyType {
  int i;
  short s1;
  short s2;
};

This type is 8 bytes, so you can pack it into a linear texture like so:

texture<uint2> myTypeTexture;

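Then you can read s2 of the element at index 20 like this (with the little-endian layout, s1 sits in the low 16 bits of .y and s2 in the high 16 bits):

short s2 = (short)(texfetch(myTypeTexture, 20).y >> 16);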
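To check which bits hold which field: on a little-endian machine (x86 hosts and NVIDIA GPUs), the two shorts share the .y word, with s1 in the low 16 bits and s2 in the high 16 bits. The host-side sketch below copies a MyType into two 32-bit words, the way a texture<uint2> fetch would see that element, and unpacks the fields. UInt2 is a hypothetical stand-in for CUDA's uint2, and the helper functions are hypothetical too; it assumes 32-bit int and 16-bit short.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct MyType {
    int i;
    short s1;
    short s2;
};

// Host-side stand-in for CUDA's uint2 (assumption: defined here so this compiles
// without the CUDA headers).
struct UInt2 { std::uint32_t x, y; };

static_assert(sizeof(MyType) == sizeof(UInt2), "MyType must fill exactly one uint2 texel");

// Copy the 8 bytes of one MyType into two 32-bit words, i.e. the value a
// texture<uint2> fetch returns for that element.
UInt2 asTexel(const MyType& t) {
    UInt2 w;
    std::memcpy(&w, &t, sizeof w);
    return w;
}

// On a little-endian machine, s1 is the low 16 bits of .y and s2 the high 16 bits.
int   unpackI(UInt2 w)  { return static_cast<int>(w.x); }
short unpackS1(UInt2 w) { return static_cast<short>(w.y & 0xffffu); }
short unpackS2(UInt2 w) { return static_cast<short>(w.y >> 16); }
```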

I use this stuff all over the place. The __float_as_int and __int_as_float functions tend to be really useful in this situation.
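For mixing floats and integers in one texel, those intrinsics reinterpret the 32 bits without converting the value. The sketch below uses std::memcpy as a host-side analogue of __float_as_int / __int_as_float (in device code you would apply the intrinsics to the fetched component instead). The weight/id record, UInt2, and all helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(int) == sizeof(float), "int and float must both be 32 bits");

// Host-side analogues of the device intrinsics __float_as_int / __int_as_float:
// reinterpret the 32 bits without converting the value.
int floatAsInt(float f) {
    int i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

float intAsFloat(int i) {
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}

// Hypothetical record mixing a float and an int, stored in one uint2-style texel.
struct UInt2 { std::uint32_t x, y; };

UInt2 packWeightId(float weight, int id) {
    return UInt2{static_cast<std::uint32_t>(floatAsInt(weight)),
                 static_cast<std::uint32_t>(id)};
}

float unpackWeight(UInt2 t) { return intAsFloat(static_cast<int>(t.x)); }
int   unpackId(UInt2 t)     { return static_cast<int>(t.y); }
```

Because only bits are moved, the float round-trips exactly; a numeric conversion such as (int)weight would lose the fraction.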

Did I miss some documentation?

It should be mentioned somewhere that cudaCreateChannelDesc won’t do what most people might expect.

cudaCreateChannelDesc(8,8,8,0,f) makes subsequent calls that use this channel desc fail, but not with cudaCreateChannelDesc.

It took me the whole day to discover the problem! :geek: