Texture Memory? How do you use it?

I’m fairly new to OpenCL so please bear with me.

In the first iteration of my code, I used basic memory buffers for large datasets and declared them global. Now that I’m looking to improve the timing, I wanted to use texture memory for this. In the CUDA version, we use cudaBindTexture and tex1Dfetch to fetch data from a large 1D float array. From my reading of the specification, texture memory is the same thing as image memory. However, since there are only 2D and 3D image objects, each with a maximum height and width, I run into some issues: my array is larger than the max height/width, but smaller than max height × max width. Must I convert my 1D array into 2D? Or is there a better way to do it?

Or am I completely off?

I did read http://forums.nvidia.com/index.php?showtopic=151743 and http://forums.nvidia.com/index.php?showtopic=150454, but they weren’t exactly conclusive about whether the texture memory referred to in the Best Practices and Programming Guides is in fact image objects.

Thanks and any help/suggestions are greatly welcome!

Yes, conversion to 2D or 3D is required. You can just wrap your 1D addresses into 2D. You may wish to write a routine which takes your prior 1D address and converts it to either an int2 / int4 or a float2 / float4, to keep your main logic readable.

The terms texture and image are basically interchangeable. Texture implies a small swatch used to put a skin over a fragment, in OpenGL terminology. Neither needs to actually contain image data.

In performance terms, you really need to be reading more than one value, in multiples of 4, in each work-item to feel the biggest advantage. Of course, you need to organize your texels such that data that belongs together sits in the same texel to get that 4x throughput. I am not sure whether textures also provide more throughput when a work-item writes more than one value, but I would not be surprised.

Reading the Best Practices Guide, they talk about getting up to 16x with global memory, but your data access has to be pretty precise for that. Not every problem fits that pattern so neatly, and any resulting kernel is likely to be heavily NVidia-optimized. I am not even sure it would do anything on the same NVidia hardware on OSX, as opposed to a more random access pattern. Everybody has textures, though, and they are likely to work the same way everywhere, so it may be better to get 4x on every platform, unless you do not care about other implementations.

If you are only reading/writing 1 value per work-item, then you are pinning most of your performance hopes on caching. You do escape the sequential/aligned access restrictions, but this is not the texture sweet spot.