Doubts about texture fetches: how to update textures in-place?


I am experimenting with texture fetches and trying to write a program that uses texture reads. I have a few small doubts:

1- With reference to the following kernel, why is a temporary variable x used here? Can't we just do d_out[idx] = tex1D(texRef, float(idx)); ?

__global__ void readTexels(int n, float *d_out)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;

    if (idx < n) {
        // Note: Appendix D.2 gives the formula for interpolation
        float x = tex1D(texRef, float(idx));
        d_out[idx] = x;
    }
}
2- With reference to the program shown below, the author says that it binds a 1D texture to linear memory. The texture is used to fetch floating-point values from the linear memory, and the texture is then updated in-place. The results are then brought back to the host and checked for correctness.

After going through this code I am unable to find how the texture is updated in-place. If we look at the launch

kernel<<<nBlocks, NUM_THREADS>>>(N, d_a);

it is clear that it passes d_a, which already holds some data, as the following line shows:

cudaMemcpy( d_a, data, memSize, cudaMemcpyHostToDevice );

So when the kernel is called, the location d_a, which already contains data, is passed in. Inside the kernel the statement

d_out[idx] = -tex1Dfetch(texRef, idx);

will overwrite d_out and NOT the texture!

Would anybody please help me understand this point?

#include <stdio.h>
#include <assert.h>

void checkCUDAError(const char *msg)
{
    cudaError_t err = cudaGetLastError();
    if (cudaSuccess != err)
        fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString(err));
}

texture<float, 1, cudaReadModeElementType> texRef;

__global__ void kernel(int n, float *d_out)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    if (idx < n)
        d_out[idx] = -tex1Dfetch(texRef, idx);
}

#define NUM_THREADS 256

int main()
{
    int N = 2560;
    int nBlocks = N/NUM_THREADS + ((N % NUM_THREADS) ? 1 : 0);
    int memSize = N*sizeof(float);

    // fill the host array with increasing values
    float *data = (float*) malloc(memSize);
    for (int i = 0; i < N; i++) data[i] = float(i);

    float *d_a;
    cudaMalloc((void **) &d_a, memSize);
    cudaMemcpy(d_a, data, memSize, cudaMemcpyHostToDevice);

    // bind the texture to the linear memory in d_a
    cudaBindTexture(0, texRef, d_a, memSize);

    kernel<<<nBlocks, NUM_THREADS>>>(N, d_a);
    checkCUDAError("kernel launch");

    float *h_out = (float*) malloc(memSize);
    cudaMemcpy(h_out, d_a, memSize, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++)
        assert(data[i] == -h_out[i]);

    cudaUnbindTexture(texRef);
    cudaFree(d_a);
    free(h_out);
    free(data);
    return 0;
}


Textures are read-only within the scope of a single kernel launch, but d_out is the same linear memory that is bound to the texture, so writing to it does update the texture: any kernel launched afterwards that reads through the same texture will fetch the updated values.
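A minimal sketch of that point (names are my own, but the pattern mirrors the listing above): launching the same negating kernel twice flips the sign twice, which only works because the second launch's texture fetches see the values the first launch wrote into the bound linear memory.

```cuda
#include <cstdio>

texture<float, 1, cudaReadModeElementType> tex;  // illustrative name

__global__ void negate(int n, float *out)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = -tex1Dfetch(tex, i);  // read via texture, write to same memory
}

int main()
{
    const int N = 8;
    float h[N], *d;
    for (int i = 0; i < N; i++) h[i] = float(i);
    cudaMalloc(&d, N*sizeof(float));
    cudaMemcpy(d, h, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaBindTexture(0, tex, d, N*sizeof(float));  // texture aliases d

    // Launches on the same stream are ordered, so the second launch's
    // texture fetches observe the first launch's writes to d.
    negate<<<1, N>>>(N, d);  // afterwards d[i] == -i
    negate<<<1, N>>>(N, d);  // fetches the UPDATED values; d[i] == i again

    cudaMemcpy(h, d, N*sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) printf("%g ", h[i]);
    cudaUnbindTexture(tex);
    cudaFree(d);
    return 0;
}
```

Within a single launch the texture cache is not kept coherent with global-memory writes, which is why the "update" is only visible to a subsequent launch.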

Now this makes sense to me… :-).

Thanks !

But I am still not able to understand why that temp variable x is used (my first question)…

In such a simple example, the use or non-use of x makes no difference. The compiler will generate an unnamed temporary taking the role of x if you only write d_out[idx] = tex1Dfetch(…).

I myself generally write things like "float x = tex1Dfetch(…)", then work on x and write the result out. I like how it makes explicit which lines of code result in global memory accesses. In more complicated code, x may be used more than once, and then you definitely want to reuse x rather than place a tex1Dfetch at each location.
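A sketch of that style (the kernel name and the computation are made up for illustration): x is fetched once and then used twice, whereas writing tex1Dfetch at both use sites would issue two texture reads for the same element.

```cuda
texture<float, 1, cudaReadModeElementType> tex;  // illustrative name

// Hypothetical example: compute x*x - x per element. Fetch once into a
// named temporary and reuse it, rather than calling tex1Dfetch twice.
__global__ void squareMinus(int n, float *out)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) {
        float x = tex1Dfetch(tex, i);  // the only texture read in this thread
        out[i] = x*x - x;              // reuse x, not a second fetch
    }
}
```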