Passing texture<...> to a kernel

Hi all,

Is there any way to pass a texture type, or textureReference into a kernel function? I have one kernel function that takes two buffers (same size) as input. Each time, it updates the first buffer in-place, and every second time I call the function it swaps the buffer pointers.

__global__ MyKernel(float* bufA, float* bufB)


  get linear index -> idx

  bufA[idx] = somefunctionof(bufA, bufB);


If I was just using global memory, this code on the host would work:

float *bufA, *bufB;

for( int iter=0; iter<10; iter++ )


   MyKernel<<<grid,threads>>>(bufA, bufB);

   swap(bufA, bufB);


Instead, I’m using texture memory, and therefore have declared two global texture<> variables:

texture<float, 1, cudaReadModeElementType> aTex;

texture<float, 1, cudaReadModeElementType> bTex;

and bound them to the respective buffers in the host code above, before the for loop.

For reads from bufA or bufB, MyKernel now uses tex1Dfetch on aTex and bTex (though for writes, it still does bufA[ind] = blah)… therefore swapping the pointers in the host code has no effect (since aTex and bTex always refer to bufA and bufB).

Solution 1.

Code two MyKernels, one that uses texA/texB, the other identical, but swaps texA for texB. <- ugly from a code maintenance standpoint

Solution 2.

One MyKernel, with an if statement <- no thanks

Solution 3.

Re-bind the textures during every iteration of the for loop <- performance impact unknown

Solution 4.

(Hopeful) Pass the texture<…> variables in as parameters to MyKernel (doesn’t appear to work) … or, obtain a runtime textureReference* to the textures, and pass those in.

It doesn’t appear that tex1Dfetch supports Solution 4 though…

Has anyone dealt with this before? Any solutions?

Thanks :)


I allocate and populate a number of different arrays on the device at the start of my application. Then I re-bind a global texture reference to whichever array is need for the kernel call I am about to make. I have not observed any significant slow-down due to this, although I haven’t really gone looking very closely. Easiest thing is probably to benchmark the cudaBindTexture() function and see. As far as I know, that only is the standard way to switch texture references.

Another option is to pack everything into one large texture, and then pass into the kernel an integer offset to work from. bufA would be offset 0 and bufB would be offset len(bufA).

As a final, semi-ugly option, you could effectively combine solutions 1 and 3. Write one kernel using an if statement, but make the if statement depend on a template parameter. Then you can instantiate your kernel call twice with two different template arguments to get both versions of the kernel. nvcc will optimize away the unused branch as dead code. This will at least let you share more code between the two functions.

template<int aflag>

__global__ void MyKernel()


   if (aflag) {

      // use bufA

   } else {

      // use bufB



But definitely try timing cudaBindTexture compared to your entire kernel before trying a more difficult solution.

In my testing, the cost of rebinding a texture costs a constant additional 40 microseconds.