Unbind and rebind texture

If I need to change the contents of texture memory, should I unbind the texture, copy the new contents to the texture, and then rebind the texture?

Or can I just copy the new contents to the texture?

Thank you,

I am sorry this post didn’t get any answer; I hope I will have better luck. I have browsed the forum on the texture topic but couldn’t get a clear picture:

Texture memory is read only. This means it cannot be modified by a kernel. Now, can we use several textures in the same application, or is the cache “unique”?

My concern is that my application calls several kernels in succession, and I try to use a different texture for each kernel, bound to a different, newly allocated array each time. It doesn’t seem to work.

Any ideas?

I would recommend shying away from textures unless your kernel is unable to read memory in 64B or 128B blocks. I’ve seen someone try to do benchmarks where they’re doing many reads from a small table using textures (so everything should be cached in L1) but their total throughput was still less than the theoretical max global mem bandwidth. You would think total throughput would be higher in this case since…

For an 8600 GT, 8.64 GT/s = 138.24 GB/s (float4) or 34.56 GB/s (float); global mem = 22.4 GB/s
For an 8800 GTX, 36.80 GT/s = 588.8 GB/s (float4) or 147.20 GB/s (float); global mem = 86.4 GB/s
For a GTX 280, 48.16 GT/s = 770.56 GB/s (float4) or 192.64 GB/s (float); global mem = 141.7 GB/s

but my experience has been pretty much the same. Textures are slower than global mem. Maybe others have had better luck? I would be curious to know what the max theoretical throughput of the L1 or L2 caches is…
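The texture figures in the list above appear to be the texel rate multiplied by the bytes fetched per texel (16 for a float4, 4 for a float). A minimal sketch of that arithmetic follows; the helper name `texelRateToGBps` is made up for illustration, and the rates are simply the ones quoted above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>  // only needed here for the float4 type

// Hypothetical helper: gigatexels/s times bytes per texel gives GB/s.
double texelRateToGBps(double gigatexelsPerSec, size_t bytesPerTexel)
{
    return gigatexelsPerSec * static_cast<double>(bytesPerTexel);
}

int main()
{
    const char  *names[] = {"8600 GT", "8800 GTX", "GTX 280"};
    const double rates[] = {8.64, 36.80, 48.16};  // GT/s, as quoted above

    for (int i = 0; i < 3; ++i)
        printf("%-8s float4: %.2f GB/s, float: %.2f GB/s\n", names[i],
               texelRateToGBps(rates[i], sizeof(float4)),
               texelRateToGBps(rates[i], sizeof(float)));
    return 0;
}
```

These are peak sampler rates, so they say nothing about what a given kernel will actually reach.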

To actually try to answer your question ;-), I like to think of a texture as consisting of three parts: 1) a chunk of memory that contains the texture data, 2) a descriptor that tells the sampler the base address, element size, how to interpret each component (float, packed bytes), etc., and 3) the sampler itself. You can change the memory (#1) that the texture is bound to either from the host side (using memcpy) or from the device side (by modifying it inside a kernel). You can also change anything about #2 (base address, component format, etc.) between kernel calls. Texture caches are flushed before each kernel call, and there’s no way to flush them within a kernel call, so a texture is basically either read only or write only for a given call. Synchronization is usually automatic, but if you’re using multiple streams you’ll probably have to watch out for that.
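A rough sketch of changing the memory (#1) between kernel calls without touching the binding might look like the following. It is only an illustration under some assumptions: it uses the texture object API (`cudaCreateTextureObject`, available from CUDA 5.0) rather than the texture reference API (`cudaBindTextureToArray`), which was removed in CUDA 12, and the names `readTex` and `freshPasses` are hypothetical.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Bail out of the enclosing function with -1 on any CUDA error
// (resources are not released on failure; this is only a sketch).
#define CHECK(call) do { if ((call) != cudaSuccess) return -1; } while (0)

// Hypothetical kernel: copies every texel to linear memory through the texture path.
__global__ void readTex(cudaTextureObject_t tex, unsigned char *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = tex2D<unsigned char>(tex, x + 0.5f, y + 0.5f);
}

// Returns how many passes read back the freshly copied contents, or -1 on error.
int freshPasses(int passes)
{
    const int w = 64, h = 32;
    static unsigned char host[w * h], result[w * h];

    // Part 1: the memory holding the texture data.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t arr;
    CHECK(cudaMallocArray(&arr, &desc, w, h));

    // Part 2: the descriptor, created once and never changed below.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    unsigned char *dOut = nullptr;
    CHECK(cudaMalloc(&dOut, w * h));

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    int fresh = 0;

    for (int pass = 0; pass < passes; ++pass) {
        // Overwrite the array contents between kernel calls; no rebinding.
        memset(host, pass + 1, sizeof(host));
        CHECK(cudaMemcpy2DToArray(arr, 0, 0, host, w, w, h, cudaMemcpyHostToDevice));

        readTex<<<grid, block>>>(tex, dOut, w, h);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpy(result, dOut, w * h, cudaMemcpyDeviceToHost));

        if (memcmp(host, result, sizeof(host)) == 0)
            ++fresh;
    }

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(dOut));
    return fresh;
}

int main()
{
    printf("passes that saw the updated contents: %d of 3\n", freshPasses(3));
    return 0;
}
```

Everything here runs on the default stream, so the copy, the kernel, and the read-back are ordered automatically; with multiple streams that ordering would have to be enforced explicitly.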

Thanks for the answer!

The reason I am using textures is that pixel access is awfully random and 2D. Coalesced loads are thus impossible, and local memory is too small to store all the needed pixels.

According to what you are saying, I should be able to update the texture between the different kernel calls. The way I am doing this at the moment is with CUDA arrays:

    // Allocate a CUDA array, copy the current input into it, and bind the texture to it
    cudaChannelFormatDesc descSobel = cudaCreateChannelDesc<unsigned char>();
    CUDA_SAFE_CALL(cudaMallocArray(&myarray, &descSobel, iw, ih));
    CUDA_SAFE_CALL(cudaMemcpyToArray(myarray, 0, 0, data, sizeof(unsigned char)*iw*ih, cudaMemcpyDeviceToDevice));
    CUDA_SAFE_CALL(cudaBindTextureToArray(tex, myarray));

    // Sobel edge detection
    SobelTex<<<ih, 384>>>(sobelData, iw, iw, ih, fScale, BINDISPLAY_BIN, seuil);

    // Copy edges image to host memory
    CUDA_SAFE_CALL(cudaMemcpy(SobelImage, sobelData, size*sizeof(unsigned char), cudaMemcpyDeviceToHost));



for instance. The input data (the “data” variable) changes between kernel calls. So far, the texture takes the correct values the first time it is defined in the application and then never changes (even though the corresponding global memory is supposed to have been freed…)