Based on several previous posts, it is not possible to have an array of textures that kernel functions can index into. My application needs to convolve an image with thousands of different, small patches. From what I can tell, my options are:
(1) Don’t use textures at all. Undesirable, as textures are very useful for this task and, this one oversight aside, very well supported.
(2) Join all the patches into one big texture. Not possible, due to texture size limitations.
(3) Use branching in the kernel code. Not really practical, as this would incur a massive performance hit due to divergent warps.
(4) A variant of #2 that makes multiple, sequential kernel invocations, each time stuffing as many patches into a single texture as the size limitations allow. This is probably what I’ll have to do, barring…
(5) Something fantastically clever that one of you thinks of.
(6) Wait for NVIDIA to remove this limitation.
I’d be thrilled if anybody could fill in #5! Otherwise, would NVIDIA care to comment on a timeframe for #6?
The problem here is that the hardware can’t dynamically index into an array of regular texture references (samplers). I believe this is why the CUDA compiler doesn’t currently support arrays of texture references.
If the texture sampler can be determined at compile time (by unrolling loops, for example), it should be possible.
Thanks for replying so quickly. I’m not a graphics guy – hence the appeal of CUDA – so let me see if I understand what you’re saying…
An array of references to different texture objects isn’t possible due to hardware limitations, but a single reference (to a 3D texture) would be doable on current hardware.
The GeForce 8800 GTX is a DirectX 10 card, so it already supports 3D textures in hardware; they’re just not accessible from CUDA right now.
Being able to access a 3D texture like this would completely solve my problem. I don’t need the individual 2D textures to have different sizes, properties, etc. (All I need is the standard texture caching, optimized for 2D access.) Not sure if that’s the case for those who have previously posted on this topic.
That works out to 32 GB for a float4 texture. (The beta has a lower limit that will go away soon.) So before you reach this limit you will be out of GPU memory anyway, and you will need a paging strategy no matter which texturing approach you use.
Unfiltered 3D texture support (or, as stated, filtered only in 2D) would be great for these kinds of cases; I hope CUDA will one day support all the texture types that OpenGL and DirectX do.
I must have misread the FAQ. If so, sorry everybody. A really big 2D texture would do just fine. (Although a 3D one would still be better. If I’m reading from the right edge of one of my tiled 2D textures, I don’t want it to be buffering the left edge of the next tile over, which is actually an unrelated texture.)
Here’s what the FAQ says. Actually it still looks confusing to me.
What is the current 2D limit? When will it go away?
What is the difference between a “1D texture” (8,192 elements max) and a “1D buffer texture” (134,217,728 elements max) ?
In other words, 1D textures support texture filtering and normalized texture coordinates (with the various addressing modes), but 1D buffer textures don’t.
Please – I appreciate the help (in the 1st reply), but enough with the “RTFM” type comments. I get it now. I was only having a problem with the terminology. I don’t know how to make that more clear.
If you really care, please show me where in this manual the word “buffer” appears, outside of the OpenGL / DirectX interoperability sections.
If you have done any graphics stuff before, you should be aware that “buffer” is commonly used for a variety of storage. If you haven’t done graphics programming before, I can only point you to any DirectX/OpenGL introductory book. Every buffer has a specific memory layout that is most suitable for its application. This is usually the reason why you cannot simply use a framebuffer as a texture or vertex array. The graphics API commands take care of converting the memory layout and thus usually incur a copy operation when reassigning a buffer for another usage.
The same applies to CUDA buffers. As CUDA is tuned for computing applications, the standard memory layout is linear, just as with PC main memory. For special access methods such as texturing (cached 2D tiles), memory is organized differently. This is why the maximum size limits differ.
I will not write a complete memory organization book here, so please do RTFM, Google for texture storage concepts or take a look at Cirrus’ texengine patent. I know that it is tempting to simply put any small problem showing up with a new technology to the forum, but please forgive me and the many other people here that voluntarily answer questions if they don’t have time to write complete tutorials, especially for questions in the “advanced development” forum.