What's the instruction throughput for texture fetches? How to refresh texture object without recopying?

I have some scattered memory accesses (actually most can be coalesced), and sometimes linear interpolation is needed, so texture memory may be a good and easy choice. But some parts of my kernel are not really bound by memory throughput, and I'm not sure whether texture instructions have significantly lower instruction throughput than other high-throughput instructions (such as FP32 ADD), so I don't know how to strike a good balance between the memory-bound parts and the instruction-bound parts. Could anyone be so kind as to provide a reference for the instruction throughput of texture fetch instructions?

The other problem is how to refresh the contents of a texture object without recopying. As far as I know, once a texture reference is bound to some memory, that memory should not be changed during kernel execution. But across different kernel calls, the texture is updated automatically once the bound memory contents have changed. For example, with array A bound to TexA and array B bound to TexB, the first kernel call can use TexA to update B, and the second kernel call can use TexB to update A. But for texture objects, the contents are copied into a cudaArray, which does not seem to be updated when the original memory contents change. And a cudaArray does not even seem to be writable from kernels, so I would have to re-memcpy the modified contents to update it. Is it possible to do this without recopying?
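
For clarity, the ping-pong I have in mind looks roughly like this (just a sketch with float data and made-up names TexA/TexB, using the old texture reference API):

#include <cuda_runtime.h>

// Legacy texture references, declared at file scope
texture<float, 1, cudaReadModeElementType> TexA;
texture<float, 1, cudaReadModeElementType> TexB;

__global__ void updateB(float *B, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) B[i] = 2.0f * tex1Dfetch(TexA, i);   // read A through TexA, write B
}

__global__ void updateA(float *A, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) A[i] = 2.0f * tex1Dfetch(TexB, i);   // read B through TexB, write A
}

void pingPong(float *d_A, float *d_B, int n)
{
    int block = 256, grid = (n + block - 1) / block;
    // bind once; the references see the updated linear memory on the next launch
    cudaBindTexture(0, TexA, d_A, n * sizeof(float));
    cudaBindTexture(0, TexB, d_B, n * sizeof(float));
    updateB<<<grid, block>>>(d_B, n);
    updateA<<<grid, block>>>(d_A, n);
}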

Thanks very much~

Texture objects support linear memory (1D), pitch linear memory (2D) and cudaArrays (1D, 2D, 3D) as data sources.

Only the latter (cudaArray) may require another explicit data copy, as the data needs to be reordered into a space-filling-curve layout for optimized access.
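
For example, a texture object over plain linear memory just wraps an existing device pointer, so no extra copy is involved. A minimal sketch (assuming a float buffer d_buf of n elements that was allocated with cudaMalloc; the names are mine):

#include <cuda_runtime.h>
#include <cstring>

__global__ void copyThroughTex(cudaTextureObject_t tex, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, i);   // cached read via the texture path
}

cudaTextureObject_t makeLinearTex(float *d_buf, size_t n)
{
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType                = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr      = d_buf;                       // existing allocation, no copy
    resDesc.res.linear.desc        = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
    return tex;
}

Note that linear-memory textures only support tex1Dfetch() with integer indices (no filtering); if you need hardware interpolation, use pitch linear memory (2D) or a cudaArray instead.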

Maybe look into CUDA surfaces if you need both write access and cached read access.
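
Roughly, that route looks like this (my own sketch, assuming a float 2D cudaArray created with the surface load/store flag; a texture object could be created from the same resource descriptor for cached/filtered reads):

#include <cuda_runtime.h>
#include <cstring>

__global__ void scaleViaSurface(cudaSurfaceObject_t surf, int w, int h, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h) {
        float v;
        surf2Dread(&v, surf, x * sizeof(float), y);        // x coordinate is in bytes for surfaces
        surf2Dwrite(v * factor, surf, x * sizeof(float), y);
    }
}

cudaSurfaceObject_t makeSurface(int w, int h, cudaArray_t *outArr)
{
    cudaChannelFormatDesc ch = cudaCreateChannelDesc<float>();
    cudaMallocArray(outArr, &ch, w, h, cudaArraySurfaceLoadStore);   // writable cudaArray

    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = *outArr;

    cudaSurfaceObject_t surf = 0;
    cudaCreateSurfaceObject(&surf, &resDesc);
    return surf;
}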

There is no need for recopying: just use linear memory for 1D vectors and pitch linear memory for matrices (or images) and bind the texture object to them. I am not sure how useful cudaArrays really are for the latest GPU architectures. I have never used them so far due to the complications (additional copying) you described.
It is even possible to implement certain 'in-place' (read/write) functions with texture objects if one takes some care, e.g. multiplying all values of an array by a constant factor.
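
To illustrate that last point, here is a rough sketch (my example, assuming pitch linear memory from cudaMallocPitch) of an in-place scale: each thread reads its element through the texture object and writes the result straight back to the underlying pointer, which is safe because every element is touched by exactly one thread.

#include <cuda_runtime.h>
#include <cstring>

__global__ void scaleInPlace(cudaTextureObject_t tex, float *d_img, size_t pitch,
                             int w, int h, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h) {
        float v = tex2D<float>(tex, x + 0.5f, y + 0.5f);           // cached read at the texel center
        float *row = (float *)((char *)d_img + y * pitch);
        row[x] = v * factor;                                        // write back to the same memory
    }
}

cudaTextureObject_t makePitch2DTex(float *d_img, size_t pitch, int w, int h)
{
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType                  = cudaResourceTypePitch2D;
    resDesc.res.pitch2D.devPtr       = d_img;                       // from cudaMallocPitch
    resDesc.res.pitch2D.desc         = cudaCreateChannelDesc<float>();
    resDesc.res.pitch2D.width        = w;
    resDesc.res.pitch2D.height       = h;
    resDesc.res.pitch2D.pitchInBytes = pitch;

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode         = cudaReadModeElementType;
    texDesc.filterMode       = cudaFilterModePoint;
    texDesc.addressMode[0]   = cudaAddressModeClamp;
    texDesc.addressMode[1]   = cudaAddressModeClamp;
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
    return tex;
}

With unnormalized coordinates and point filtering, the +0.5f offsets simply hit the texel centers, so each thread reads exactly its own element before overwriting it.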

For more information regarding the usage of texture objects, search for 'texture' on the GTC on-demand website (Search | NVIDIA On-Demand).
My GTC 2018 presentation with ID "S8111" will also be available there in the next few weeks.

That would be my recommendation as well. I have never encountered a use case where cudaArrays provided a noticeable benefit (which doesn’t mean there couldn’t be cases where they provide a benefit).

If you use one of the newer architectures (Maxwell, Pascal) it is worth exploring whether using textures provides any performance benefit at all. If you want “free” fp16 → fp32 conversion or can make do with low-precision linear interpolation, the use of textures may still provide benefits.
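
For the fp16 → fp32 case, a minimal sketch (my example; assuming the data is stored as __half in linear memory and described with a 16-bit float channel format, so the texture unit promotes the values to fp32 before they reach the kernel):

#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cstring>

__global__ void halfToFloat(cudaTextureObject_t tex, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, i);   // fp16 -> fp32 conversion done by the texture unit
}

cudaTextureObject_t makeHalfTex(__half *d_half, size_t n)
{
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType                = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr      = d_half;
    // 16-bit float channel; fetches return promoted fp32 values
    resDesc.res.linear.desc        = cudaCreateChannelDesc(16, 0, 0, 0, cudaChannelFormatKindFloat);
    resDesc.res.linear.sizeInBytes = n * sizeof(__half);

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
    return tex;
}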

Thanks to cbuchner1, HannesF99 and njuffa ~ I'll try pitch linear memory~

After checking the API, it seems only four types are available as texture object data sources. Does cudaResourceTypePitch2D also work for 3D cases?

enum cudaResourceType {
    cudaResourceTypeArray          = 0x00,
    cudaResourceTypeMipmappedArray = 0x01,
    cudaResourceTypeLinear         = 0x02,
    cudaResourceTypePitch2D        = 0x03
};

I think 3D access requires CUDA arrays (cudaMalloc3DArray(), cudaMemcpy3DParms, cudaMemcpy3D() … ). Would you need trilinear interpolation in the 3D texture? Otherwise you could use a 2D texture, but place the individual layers side by side…
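
If trilinear interpolation is needed, the cudaArray route would look roughly like this (sketch with made-up names: a float volume of size nx × ny × nz copied from the host array h_vol):

#include <cuda_runtime.h>
#include <cstring>

cudaTextureObject_t makeVolumeTex(const float *h_vol, int nx, int ny, int nz)
{
    cudaExtent extent = make_cudaExtent(nx, ny, nz);      // in elements for cudaArrays
    cudaChannelFormatDesc ch = cudaCreateChannelDesc<float>();

    cudaArray_t volArray;
    cudaMalloc3DArray(&volArray, &ch, extent);

    cudaMemcpy3DParms p;                                   // the explicit copy is unavoidable here
    memset(&p, 0, sizeof(p));
    p.srcPtr   = make_cudaPitchedPtr((void *)h_vol, nx * sizeof(float), nx, ny);
    p.dstArray = volArray;
    p.extent   = extent;
    p.kind     = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&p);

    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = volArray;

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode       = cudaReadModeElementType;
    texDesc.filterMode     = cudaFilterModeLinear;         // trilinear interpolation
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.addressMode[2] = cudaAddressModeClamp;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
    return tex;
    // in a kernel: float v = tex3D<float>(tex, x + 0.5f, y + 0.5f, z + 0.5f);
}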