no texture arrays == severely limiting?

jmutch · May 28, 2007, 6:47pm

Based on several previous posts, it is not possible to have an array of textures that kernel functions can index into. My application needs to convolve an image with thousands of different, small patches. From what I can tell, my options are:

(1) Don’t use textures at all. Undesirable, as textures are very, very useful for this task, and other than this one oversight, very well supported.

(2) Join all the patches into one big texture. Not possible, due to texture size limitations.

(3) Use branching in the kernel code. Not really practical, as this would incur a massive performance hit due to divergent warps.

(4) A variant of #2 that makes multiple, sequential kernel invocations, each time stuffing as many patches into a single texture as the size limitations allow. This is probably what I’ll have to do, barring…

(5) Something fantastically clever that one of you thinks of.

(6) Wait for NVIDIA to remove this limitation.
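Options (2) and (4) both come down to atlas bookkeeping. Below is a minimal host-side C++ sketch. It assumes every patch has the same size and that maxW x maxH is the largest 2D texture the device allows (current toolkits report this in the maxTexture2D field of cudaDeviceProp). PatchSlot, patchesPerAtlas, packPatches and numBatches are hypothetical names. Each batch would be copied into one texture and processed by one kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where one patch lives: which atlas (i.e. which kernel launch) and where inside it.
struct PatchSlot {
    std::size_t batch;   // index of the atlas texture / kernel launch
    std::size_t x0, y0;  // top-left texel of the patch inside that atlas
};

// Patches that fit in one maxW x maxH atlas when laid out on a regular grid.
std::size_t patchesPerAtlas(std::size_t patchW, std::size_t patchH,
                            std::size_t maxW, std::size_t maxH) {
    assert(patchW > 0 && patchH > 0 && patchW <= maxW && patchH <= maxH);
    return (maxW / patchW) * (maxH / patchH);
}

// Assign every patch a slot, filling each atlas row by row before starting the next.
std::vector<PatchSlot> packPatches(std::size_t numPatches,
                                   std::size_t patchW, std::size_t patchH,
                                   std::size_t maxW, std::size_t maxH) {
    const std::size_t perAtlas = patchesPerAtlas(patchW, patchH, maxW, maxH);
    const std::size_t perRow = maxW / patchW;
    std::vector<PatchSlot> slots;
    slots.reserve(numPatches);
    for (std::size_t i = 0; i < numPatches; ++i) {
        const std::size_t local = i % perAtlas;  // position within its atlas
        slots.push_back({i / perAtlas, (local % perRow) * patchW,
                         (local / perRow) * patchH});
    }
    return slots;
}

// Number of sequential kernel launches needed (ceiling division).
std::size_t numBatches(std::size_t numPatches, std::size_t patchW, std::size_t patchH,
                       std::size_t maxW, std::size_t maxH) {
    const std::size_t perAtlas = patchesPerAtlas(patchW, patchH, maxW, maxH);
    return (numPatches + perAtlas - 1) / perAtlas;
}
```

For example, 16 x 16 patches in a 32 x 32 atlas fit four per atlas, so the fifth patch starts a second batch at (0, 0).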

I’d be thrilled if anybody could fill in #5! Otherwise, would NVIDIA care to comment on a timeframe for #6?

Jim

Simon_Green · May 28, 2007, 8:21pm

We are aware of this limitation.

The problem here is that the hardware can’t dynamically index into an array of regular texture references (samplers). I believe this is why the CUDA compiler doesn’t currently support arrays of texture references.

If the texture sampler can be determined at compile time (by unrolling loops, for example), it should be possible.
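The compile-time rule has a close analogue in standard C++. std::get<I> on a std::tuple only accepts a constant I, much as a kernel can only use a texture reference it can name at compile time. Here is a hedged host-side sketch of the "unroll the loop" workaround. FakeTexture is a hypothetical stand-in for a texture reference, and the recursion expands into one fetch per constant index.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for one texture reference: each is a separate object
// that code can name, but not select with a run-time index.
struct FakeTexture {
    float value;
    float fetch(int, int) const { return value; }
};

// Recursion ends once the constant index I has passed the last texture.
template <std::size_t I, typename... Ts>
typename std::enable_if<(I == sizeof...(Ts)), float>::type
sumFrom(const std::tuple<Ts...>&, int, int) { return 0.0f; }

// std::get<I> needs a compile-time I, so each step of the "unrolled loop"
// fetches from a texture that is fixed when the code is compiled.
template <std::size_t I, typename... Ts>
typename std::enable_if<(I < sizeof...(Ts)), float>::type
sumFrom(const std::tuple<Ts...>& texs, int x, int y) {
    return std::get<I>(texs).fetch(x, y) + sumFrom<I + 1>(texs, x, y);
}

// Sum one sample from every texture. Writing std::get<i>(texs) with a
// run-time int i would not compile: the same restriction as texture references.
template <typename... Ts>
float sumAllTextures(const std::tuple<Ts...>& texs, int x, int y) {
    return sumFrom<0>(texs, x, y);
}
```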

DirectX 10-class hardware does support a new type of texture known as texture arrays, which are essentially like 3D textures but with no filtering between layers:
http://developer.download.nvidia.com/opengl/specs/GL_EXT_texture_array.txt

It’s possible these could be incorporated into a future version of CUDA but I don’t think we have any immediate plans for this.

If you don’t need mipmaps, could you use a 3D texture, with each image in a different slice?
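The following small host-side C++ model makes the "no filtering between layers" behaviour concrete. LayeredImage is a hypothetical name, not a CUDA type. It filters bilinearly in x and y, using CUDA's convention that texel centers sit at i + 0.5, but it only ever reads from one layer. On the device, the fetch itself would be tex3D, or tex2DLayered in later toolkits. With a true 3D texture and linear filtering, passing z = layer + 0.5 should land exactly on a slice center, so neighbouring slices get zero weight. That follows from the programming guide's filtering formula, but it is worth checking on real hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side model of a layered (array) texture: `layers` images of w x h floats
// stored one layer after another, row-major within each layer.
struct LayeredImage {
    int w, h, layers;
    std::vector<float> texels;  // size w * h * layers

    // Clamp-to-edge addressing, applied within the layer only.
    float at(int x, int y, int layer) const {
        x = std::min(std::max(x, 0), w - 1);
        y = std::min(std::max(y, 0), h - 1);
        return texels[(static_cast<std::size_t>(layer) * h + y) * w + x];
    }

    // Bilinear filtering in x and y; the layer is an integer, so values from
    // neighbouring images are never blended in.
    float sample(float x, float y, int layer) const {
        assert(layer >= 0 && layer < layers);
        const float fx = x - 0.5f, fy = y - 0.5f;  // texel centers at i + 0.5
        const int x0 = static_cast<int>(std::floor(fx));
        const int y0 = static_cast<int>(std::floor(fy));
        const float a = fx - x0, b = fy - y0;      // fractional weights
        return (1 - a) * (1 - b) * at(x0, y0, layer) + a * (1 - b) * at(x0 + 1, y0, layer)
             + (1 - a) * b * at(x0, y0 + 1, layer) + a * b * at(x0 + 1, y0 + 1, layer);
    }
};
```

In a 2 x 1 image with layer 0 = {0, 10} and layer 1 = {100, 200}, sampling layer 0 at its right edge returns 10, not a blend with layer 1.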

jmutch · May 28, 2007, 8:52pm

Simon,

Thanks for replying so quickly. I’m not a graphics guy – hence the appeal of CUDA – so let me see if I understand what you’re saying…

An array of references to different texture objects isn’t possible due to hardware limitations, but a single reference (to a 3D texture) would be doable on current hardware.
The GeForce 8800 GTX is a DirectX 10 card, so it already has 3D textures; they’re just not accessible from CUDA right now.

Being able to access a 3D texture like this would completely solve my problem. I don’t need the individual 2D textures to have different sizes, properties, etc. (All I need is the standard texture caching, optimized for 2D access.) Not sure if that’s the case for those who have previously posted on this topic.

Jim

prkipfer · May 29, 2007, 1:04pm

Are you sure? From the CUDA FAQ

That is 32 GByte for a float4 texture (the beta has a limit that will go away soon). So before you reach this limit, you will be out of GPU memory anyway and will need a paging strategy no matter which texturing approach you use.

Peter

wumpus · May 29, 2007, 1:14pm

Unfiltered 3D texture support (or, as stated, textures filtered only in 2D) would be great for these kinds of cases. I hope CUDA will one day support all the texture types OpenGL and DX do.

jmutch · May 29, 2007, 9:17pm

I must have misread the FAQ. If so, sorry everybody. A really big 2D texture would do just fine. (Although a 3D one would still be better. If I’m reading from the right edge of one of my tiled 2D textures, I don’t want it to be buffering the left edge of the next tile over, which is actually an unrelated texture.)
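One common way to stop a neighbouring tile from bleeding into the result is to clamp coordinates to the tile's own edge texel centers before fetching. Here is a minimal sketch, assuming unnormalized atlas coordinates and bilinear filtering. Tile and clampToTile are hypothetical names. Clamping keeps the neighbour's texels out of the filtered value. It does not stop the texture cache from loading them, so it addresses correctness rather than cache efficiency.

```cpp
#include <algorithm>
#include <cassert>

// A tile (patch) inside an atlas, in texels.
struct Tile { float x0, y0, w, h; };

// Clamp an unnormalized atlas coordinate so bilinear filtering reads only texels
// of this tile: the outermost legal coordinates are the edge texel centers.
inline void clampToTile(const Tile& t, float& x, float& y) {
    assert(t.w >= 1.0f && t.h >= 1.0f);
    x = std::min(std::max(x, t.x0 + 0.5f), t.x0 + t.w - 0.5f);
    y = std::min(std::max(y, t.y0 + 0.5f), t.y0 + t.h - 0.5f);
}
```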

Here’s what the FAQ says. Actually it still looks confusing to me.

What is the current 2D limit? When will it go away?

What is the difference between a “1D texture” (8,192 elements max) and a “1D buffer texture” (134,217,728 elements max) ?

Jim

prkipfer · May 30, 2007, 9:16am

As the FAQ says, the memory layout. Linear mem is allocated with cudaMalloc, tiled mem is allocated with cudaMalloc2D.

Peter

Cyril_Zeller · May 30, 2007, 12:26pm

In other words, 1D textures support texture filtering and normalized texture coordinates (with the various addressing modes), but “1D buffer textures” don’t.
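The difference Cyril describes can be sketched as two host-side C++ models. The first is a filtered fetch with normalized coordinates and a choice of clamp or wrap addressing. The second is a plain integer-indexed fetch like tex1Dfetch from linear memory. The filtering follows CUDA's texel-center convention (xB = u * N - 0.5). The hardware computes the weight in reduced-precision fixed point, which this float model ignores. AddressMode, fetch1DFiltered and fetchLinear are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

enum class AddressMode { Clamp, Wrap };

// Filtered 1D fetch with a normalized coordinate u (u = 1 spans the whole texture).
float fetch1DFiltered(const std::vector<float>& t, float u, AddressMode mode) {
    assert(!t.empty());
    const int n = static_cast<int>(t.size());
    // The addressing mode decides which texel an out-of-range index refers to.
    auto texel = [&](int k) -> float {
        if (mode == AddressMode::Wrap) k = ((k % n) + n) % n;
        else k = std::min(std::max(k, 0), n - 1);
        return t[k];
    };
    const float xb = u * n - 0.5f;  // texel space; texel centers sit at k + 0.5
    const int i = static_cast<int>(std::floor(xb));
    const float a = xb - i;         // weight of the right-hand neighbour
    return (1 - a) * texel(i) + a * texel(i + 1);
}

// Fetch from linear memory, in the spirit of tex1Dfetch: integer index only,
// no filtering, no normalized coordinates, no addressing modes.
float fetchLinear(const std::vector<float>& t, int i) {
    assert(i >= 0 && i < static_cast<int>(t.size()));
    return t[i];
}
```

With texels {0, 10, 20, 30}, u = 0 falls halfway between the last and the first texel, so wrap mode returns 15 while clamp mode returns 0.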

Cyril

bshucker · May 30, 2007, 12:37pm

In case it’s helpful, my workaround for the array problem is to use a switch statement, sort of like this:

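// tex1 through tex8 are file-scope texture references (legacy API), e.g.
//   texture<float, 2, cudaReadModeElementType> tex1;
// x and y are the thread's texture coordinates. The missing breaks are
// deliberate: num_textures == N falls through and samples texN down to tex1.
float sum = 0.0f;
switch (num_textures)
{
case 8:
    sum += tex2D(tex8, x, y);  // "do something with tex8": accumulate a sample
case 7:
    sum += tex2D(tex7, x, y);
…
case 1:
    sum += tex2D(tex1, x, y);
}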

That works fine for iterating through all the textures, if there aren’t a huge number of them. I don’t think it’s the way to go if you have thousands.

Brian

pyrtsa · May 30, 2007, 1:07pm

That wouldn’t be a problem anyway, because on a D3D10-class device like G80 there are usually at most 16 active textures (i.e. samplers).

/Pyry

jmutch · May 30, 2007, 7:45pm

The FAQ doesn’t explain what a “buffer” texture is. I don’t think the programming guide does either. I’m just confused about the terminology.

So, “buffer” = linear memory, otherwise = an opaque array object, I presume.

prkipfer · May 31, 2007, 9:57am

The manual explains in great depth the difference between a linear buffer and array memory, including striding, padding, etc.
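For readers who mostly need the arithmetic: a pitched ("strided, padded") 2D allocation stores each row at a fixed byte pitch that is at least the row width. Here is a hedged C++ sketch. The alignment parameter is an assumption, since in current CUDA releases cudaMallocPitch chooses the pitch and returns it to you. paddedPitch and pitchedOffset are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Round a row's width in bytes up to a multiple of `alignment`, as a pitched
// allocation does when it pads each row. The alignment is an assumed input here;
// a real allocator picks it for the device.
std::size_t paddedPitch(std::size_t widthBytes, std::size_t alignment) {
    assert(alignment > 0);
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Byte offset of element (x, y): rows are `pitch` bytes apart, and elements
// within a row are `elemSize` bytes apart. On the device the same idea is
// usually written as (float*)((char*)base + y * pitch) + x.
std::size_t pitchedOffset(std::size_t x, std::size_t y,
                          std::size_t pitch, std::size_t elemSize) {
    assert(x * elemSize < pitch);  // x must lie inside the row
    return y * pitch + x * elemSize;
}
```

For a 100-float row (400 bytes) and an assumed 256-byte alignment, the pitch is 512 bytes, so element (3, 2) sits at byte 2 * 512 + 3 * 4 = 1036.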

Peter

jmutch · May 31, 2007, 11:29am

Please – I appreciate the help (in the 1st reply), but enough with the “RTFM” type comments. I get it now. I was only having a problem with the terminology. I don’t know how to make that more clear.

If you really care, please show me where in this manual the word “buffer” appears, outside of the OpenGL / DirectX interoperability sections.

http://developer.download.nvidia.com/compu…g_Guide_0.8.pdf


Jim

Simon_Green · May 31, 2007, 11:39am

I agree the documentation isn’t very clear; we’ll try to clean this up.

A lot of us here are graphics programmers, and use terminology from the graphics APIs without thinking. Texture fetches from linear memory in CUDA basically correspond to this functionality in OpenGL:
http://developer.download.nvidia.com/opengl/specs/GL_EXT_texture_buffer_object.txt

pyrtsa · May 31, 2007, 11:43am

Hi,

I think the key here is not the “buffer” but the “linear memory”:

But why don’t you use the newer CUDA Programming Guide 0.8.2?

Sincerely,

/Pyry

prkipfer · May 31, 2007, 12:42pm

If you have done any graphics stuff before, you should be aware that “buffer” is commonly used for a variety of storage. If you haven’t done graphics programming before, I can only point you to any DirectX/OpenGL introductory book. Every buffer has a specific memory layout that is most suitable for its application. This is usually the reason why you cannot simply use a framebuffer as a texture or vertex array. The graphics API commands take care of converting the memory layout and thus usually incur a copy operation when reassigning a buffer for another usage.

The same applies to CUDA buffers. As CUDA is tuned for computing applications, the standard memory layout is linear, as in PC main memory. For special access patterns such as the cached 2D tiles that texturing offers, memory is organized differently. This leads to different maximum sizes.
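The following sketch illustrates why a layout tuned for 2D access differs from linear memory. It compares row-major indexing with a Z-order (Morton) index. Morton order is a textbook example of a tiled layout and is used here only as a stand-in; the G80's actual texture memory layout is not described in this thread. In row-major order, the texel below (x, y) is a full row away. Morton order keeps each 2 x 2 block of texels in four consecutive slots. linearIndex, spreadBits and mortonIndex are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Row-major ("linear") element index, as with memory from cudaMalloc.
std::uint32_t linearIndex(std::uint32_t x, std::uint32_t y, std::uint32_t width) {
    return y * width + x;
}

// Spread the low 16 bits of v so that bit k moves to bit 2k.
std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Z-order (Morton) index: interleaves the bits of x and y so that texels that
// are close in 2D also tend to be close in memory.
std::uint32_t mortonIndex(std::uint32_t x, std::uint32_t y) {
    return spreadBits(x) | (spreadBits(y) << 1);
}
```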

I will not write a complete memory organization book here, so please do RTFM, Google for texture storage concepts or take a look at Cirrus’ texengine patent. I know that it is tempting to post every small problem that comes up with a new technology to the forum, but please forgive me and the many other people here who voluntarily answer questions if they don’t have time to write complete tutorials, especially for questions in the “advanced development” forum.

Peter
