no texture arrays == severely limiting?

Based on several previous posts, it is not possible to have an array of textures that kernel functions can index into. My application needs to convolve an image with thousands of different, small patches. From what I can tell, my options are:

(1) Don’t use textures at all. Undesirable, as textures are very, very useful for this task, and other than this one oversight, very well supported.

(2) Join all the patches into one big texture. Not possible, due to texture size limitations.

(3) Use branching in the kernel code. Not really practical, as this would incur a massive performance hit due to divergent warps.

(4) A variant of #2 that makes multiple, sequential kernel invocations, each time stuffing as many patches into a single texture as the size limitations allow. This is probably what I’ll have to do, barring…

(5) Something fantastically clever that one of you thinks of.

(6) Wait for NVIDIA to remove this limitation.

I’d be thrilled if anybody could fill in #5! Otherwise, would NVIDIA care to comment on a timeframe for #6?
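
For what it's worth, option (4) mostly comes down to host-side bookkeeping. The sketch below is only an illustration: `AtlasPlan` and `planAtlases` are hypothetical names, and it assumes square patches tiled edge to edge with no padding.

```cuda
// Host-side planning for option (4); plain C++ that nvcc also accepts.
// All names here are hypothetical and only illustrate the bookkeeping.
struct AtlasPlan {
    int patchesPerRow;    // patches side by side in one atlas row
    int patchesPerAtlas;  // patches that fit in one atlas texture
    int numBatches;       // sequential kernel invocations needed
};

// Assumes square patches of side `patch` (must be positive), tiled edge to
// edge with no padding, in an atlas of atlasW x atlasH texels.
inline AtlasPlan planAtlases(int numPatches, int patch, int atlasW, int atlasH)
{
    AtlasPlan plan;
    plan.patchesPerRow   = atlasW / patch;
    plan.patchesPerAtlas = plan.patchesPerRow * (atlasH / patch);
    // If not even one patch fits, report zero batches instead of dividing by zero.
    if (plan.patchesPerAtlas == 0) {
        plan.numBatches = 0;
        return plan;
    }
    // Round up so a final, partially filled atlas still gets its own launch.
    plan.numBatches =
        (numPatches + plan.patchesPerAtlas - 1) / plan.patchesPerAtlas;
    return plan;
}
```

With 5000 patches of 16 x 16 texels and a (made-up) 1024 x 1024 atlas, this gives 4096 patches per atlas and 2 kernel invocations.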


We are aware of this limitation.

The problem here is that the hardware can’t dynamically index into an array of regular texture references (samplers). I believe this is why the CUDA compiler doesn’t currently support arrays of texture references.

If the texture sampler can be determined at compile time (by unrolling loops, for example), selecting among several texture references should be possible.

DirectX 10-class hardware does support a new type of texture known as texture arrays, which are essentially like 3D textures but with no filtering between layers:…xture_array.txt

It’s possible these could be incorporated into a future version of CUDA but I don’t think we have any immediate plans for this.

If you don’t need mipmaps, could you use a 3D texture, with each image in a different slice?
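
A rough sketch of the slice-per-image idea, assuming a toolkit that exposes 3D textures through the texture-object API (`cudaTextureObject_t`), which is not the release discussed in this thread. `makePatchStack` and `readPatchTexel` are made-up names, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Reads texel (x, y) of image `slice` from a 3D texture. With point
// filtering and unnormalized coordinates, adding 0.5 addresses the texel
// centre, so a neighbouring slice is never blended in.
__global__ void readPatchTexel(cudaTextureObject_t tex, int x, int y,
                               int slice, float *out)
{
    *out = tex3D<float>(tex, x + 0.5f, y + 0.5f, slice + 0.5f);
}

// Packs `depth` images of w x h floats (stored back to back in `host`)
// into one 3D CUDA array and returns a texture object over it.
// Hypothetical helper; the caller owns *arrayOut and the texture object.
cudaTextureObject_t makePatchStack(const float *host, int w, int h, int depth,
                                   cudaArray_t *arrayOut)
{
    cudaChannelFormatDesc fmt = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(w, h, depth);  // in elements
    cudaMalloc3DArray(arrayOut, &fmt, extent);

    cudaMemcpy3DParms copy = {};
    copy.srcPtr   = make_cudaPitchedPtr(const_cast<float *>(host),
                                        w * sizeof(float), w, h);
    copy.dstArray = *arrayOut;
    copy.extent   = extent;
    copy.kind     = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&copy);

    cudaResourceDesc res = {};
    res.resType         = cudaResourceTypeArray;
    res.res.array.array = *arrayOut;

    cudaTextureDesc td = {};
    td.addressMode[0]   = cudaAddressModeClamp;
    td.addressMode[1]   = cudaAddressModeClamp;
    td.addressMode[2]   = cudaAddressModeClamp;
    td.filterMode       = cudaFilterModePoint;  // no filtering, also not across slices
    td.readMode         = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}
```

Point filtering is chosen here because linear filtering on a 3D texture would also interpolate between slices, which are unrelated images in this use.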


Thanks for replying so quickly. I’m not a graphics guy – hence the appeal of CUDA – so let me see if I understand what you’re saying…

  • An array of references to different texture objects isn’t possible due to hardware limitations, but a single reference (to a 3D texture) would be doable on the current hardware.

  • The GeForce 8800 GTX is a DirectX 10 card, so already has 3D textures, but they’re just not accessible from CUDA right now.

Being able to access a 3D texture like this would completely solve my problem. I don’t need the individual 2D textures to have different sizes, properties, etc. (All I need is the standard texture caching, optimized for 2D access.) Not sure if that’s the case for those who have previously posted on this topic.


Are you sure? From the CUDA FAQ

That is 32 GByte for a float4 texture. (The beta has a limit that will go away soon.) So before you reach this limit, you will be out of GPU memory anyway, and you will need a paging strategy no matter which texturing approach you use.
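
The 32 GByte figure can be checked with a few lines of arithmetic. Note the assumption: the FAQ passage isn't reproduced in this thread, so the 65536 x 32768 texel limit used below is an assumed value, not one taken from the posts here. `maxTextureBytes` is a made-up helper.

```cuda
#include <cuda_runtime.h>  // for float4

// ASSUMPTION: a 2D texture limit of 65536 x 32768 texels. The thread does
// not quote the FAQ figure, so treat these two constants as placeholders.
constexpr unsigned long long kAssumedMaxWidth  = 1ULL << 16;
constexpr unsigned long long kAssumedMaxHeight = 1ULL << 15;

// Bytes needed by a texture of the assumed maximum size (hypothetical helper).
constexpr unsigned long long maxTextureBytes(unsigned long long bytesPerTexel)
{
    return kAssumedMaxWidth * kAssumedMaxHeight * bytesPerTexel;
}

// A float4 texel is 16 bytes, so the total is 2^31 * 16 = 2^35 bytes = 32 GiB.
static_assert(sizeof(float4) == 16, "float4 is four 32-bit floats");
static_assert(maxTextureBytes(sizeof(float4)) == (32ULL << 30), "32 GiB");
```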


Unfiltered 3-D texture support (or, as stated, filtered only in 2-D) would be great for this kind of case. I hope CUDA will one day support all the texture types that OpenGL and DX do.

I must have misread the FAQ. If so, sorry everybody. A really big 2D texture would do just fine. (Although a 3D one would still be better. If I’m reading from the right edge of one of my tiled 2D textures, I don’t want it to be buffering the left edge of the next tile over, which is actually an unrelated texture.)
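
To make the tiled layout concrete, here is a minimal sketch of the addressing, assuming square patches laid out row-major in one big 2D texture. `AtlasCoord`, `clampToPatch`, and `atlasTexel` are hypothetical names. Clamping keeps a read inside its own tile, but it only fixes addressing; it does nothing about what the texture cache happens to pull in from the neighbouring tile.

```cuda
#include <cuda_runtime.h>

struct AtlasCoord { int x; int y; };

// Clamps a patch-local coordinate to [0, patch - 1].
__host__ __device__ inline int clampToPatch(int t, int patch)
{
    return t < 0 ? 0 : (t >= patch ? patch - 1 : t);
}

// Maps texel (u, v) of patch `index` to coordinates in the big texture.
// Assumes square patches of side `patch`, stored row-major with `perRow`
// patches per row. Out-of-range u or v is clamped to the patch, so a read
// past the right or bottom edge never lands in the next, unrelated patch.
__host__ __device__ inline AtlasCoord atlasTexel(int index, int u, int v,
                                                 int patch, int perRow)
{
    AtlasCoord c;
    c.x = (index % perRow) * patch + clampToPatch(u, patch);
    c.y = (index / perRow) * patch + clampToPatch(v, patch);
    return c;
}
```

For example, with 16 x 16 patches and 64 patches per row, texel (3, 20) of patch 65 maps to (19, 31): the v coordinate is clamped to 15 before the tile offset is added.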

Here’s what the FAQ says. Actually it still looks confusing to me.

What is the current 2D limit? When will it go away?

What is the difference between a “1D texture” (8,192 elements max) and a “1D buffer texture” (134,217,728 elements max)?


As the FAQ says, the memory layout. Linear memory is allocated with cudaMalloc; tiled memory is allocated with cudaMalloc2D.


In other words, 1D textures support texture filtering and normalized texture coordinates (with the various addressing modes), but “1D buffer textures” don’t.
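
The difference can be sketched side by side. This assumes the texture-object API from later CUDA releases rather than the texture references of the toolkit discussed here; `makeLinearTexture`, `makeArrayTexture`, and the two kernels are made-up names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// "Buffer" texture: integer-indexed fetch from linear memory (cudaMalloc).
// No filtering, no normalized coordinates, no addressing modes.
__global__ void fetchLinear(cudaTextureObject_t tex, int i, float *out)
{
    *out = tex1Dfetch<float>(tex, i);
}

// Regular 1D texture: float coordinate into a CUDA array, here with linear
// filtering between neighbouring texels.
__global__ void fetchArray(cudaTextureObject_t tex, float x, float *out)
{
    *out = tex1D<float>(tex, x);
}

// Wraps `count` floats of linear device memory in a texture object.
cudaTextureObject_t makeLinearTexture(float *devPtr, size_t count)
{
    cudaResourceDesc res = {};
    res.resType                = cudaResourceTypeLinear;
    res.res.linear.devPtr      = devPtr;
    res.res.linear.desc        = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = count * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Wraps a 1D CUDA array (cudaMallocArray) in a filtered texture object.
cudaTextureObject_t makeArrayTexture(cudaArray_t array)
{
    cudaResourceDesc res = {};
    res.resType         = cudaResourceTypeArray;
    res.res.array.array = array;

    cudaTextureDesc td = {};
    td.addressMode[0]   = cudaAddressModeClamp;
    td.filterMode       = cudaFilterModeLinear;
    td.readMode         = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}
```

The linear-memory version is addressed by element index only, while the array version takes a floating-point coordinate and can interpolate between texels.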


In case it’s helpful, my workaround for the array problem is to use a switch statement, sort of like this:

case 8:
    do something with tex8
case 7:
    do something with tex7
...
case 1:
    do something with tex1

That works fine for iterating through all the textures, if there aren’t a huge number of them. I don’t think it’s the way to go if you have thousands.
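
Spelled out as a kernel, the idea might look like the sketch below. It is written against texture objects (`cudaTextureObject_t`) from later CUDA releases rather than the texture references discussed in this thread, uses four textures instead of eight, and the name `sumTextures` is made up. It also assumes the cases are meant to fall through.

```cuda
#include <cuda_runtime.h>

// Sums texel (x, y) over the first `count` of four textures and writes the
// result to out[y * width + x].
// ASSUMPTION: the cases fall through on purpose, so entering at `count`
// visits tex3 (or lower) down to tex0 in turn; the post does not say so.
__global__ void sumTextures(cudaTextureObject_t tex0, cudaTextureObject_t tex1,
                            cudaTextureObject_t tex2, cudaTextureObject_t tex3,
                            int count, int width, int height, float *out)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Texel centres sit at +0.5 with unnormalized coordinates.
    float fx = x + 0.5f;
    float fy = y + 0.5f;

    float sum = 0.0f;
    switch (count) {
    case 4: sum += tex2D<float>(tex3, fx, fy);  // fall through
    case 3: sum += tex2D<float>(tex2, fx, fy);  // fall through
    case 2: sum += tex2D<float>(tex1, fx, fy);  // fall through
    case 1: sum += tex2D<float>(tex0, fx, fy);
    default: break;
    }
    out[y * width + x] = sum;
}
```

Because `count` is the same for every thread, all threads in a warp take the same path through the switch, so this particular use does not cause divergent warps.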


That wouldn’t be a problem anyway because on a D3D10 class device like G80 there usually are at most only 16 active textures (i.e. samplers).


The FAQ doesn’t explain what a “buffer” texture is. I don’t think the programming guide does either. I’m just confused about the terminology.

So, “buffer” = linear memory, otherwise = an opaque array object, I presume.

The manual explains in great depth the difference between a linear buffer and array memory, including striding, padding, etc.


Please – I appreciate the help (in the 1st reply), but enough with the “RTFM” type comments. I get it now. I was only having a problem with the terminology. I don’t know how to make that more clear.

If you really care, please show me where in this manual the word “buffer” appears, outside of the OpenGL / DirectX interoperability sections.…g_Guide_0.8.pdf



I agree the documentation isn’t very clear, we’ll try and clean this up.

A lot of us here are graphics programmers, and use terminology from the graphics APIs without thinking. Texture fetches from linear memory in CUDA basically correspond to this functionality in OpenGL:…ffer_object.txt


I think the key here is not the “buffer” but the “linear memory”:

But why don’t you use the newer CUDA Programming Guide 0.8.2?



If you have done any graphics stuff before, you should be aware that “buffer” is commonly used for a variety of storage. If you haven’t done graphics programming before, I can only point you to any DirectX/OpenGL introductory book. Every buffer has a specific memory layout that is most suitable for its application. This is usually the reason why you cannot simply use a framebuffer as a texture or vertex array. The graphics API commands take care of converting the memory layout and thus usually incur a copy operation when reassigning a buffer for another usage.

The same applies to CUDA buffers. As CUDA is tuned to computing applications, the standard memory layout is linear, as in PC main memory. For the special access methods that texturing offers (cached 2D tiles), memory is organized differently. This leads to varying maximum limits.

I will not write a complete memory organization book here, so please do RTFM, Google for texture storage concepts or take a look at Cirrus’ texengine patent. I know that it is tempting to simply post every small problem that shows up with a new technology to the forum, but please forgive me and the many other people here who voluntarily answer questions if they don’t have time to write complete tutorials, especially for questions in the “advanced development” forum.