RT video processing: use texture fetches or not? A question about using the texture cache


We are planning to use CUDA to speed up real-time video processing. The video is HD, so each frame is about 1.5 MB.

If we make each frame's data a texture and use texture fetches instead of global memory reads, what is the setup time for the texture operations?

If the texture setup time is large, then it might be better to do plain global memory reads instead. Does anyone have experience with setup times for large textures?



Binding a texture takes only about ~4-10 microseconds extra (it’s been a while since I did that benchmark, I don’t remember the exact number), so it really isn’t much of an overhead.
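For reference, here is roughly what that per-frame binding looks like, as a minimal sketch using the texture reference API (the names `frameTex` and `processFrame`, and the `uchar4` pixel format, are my assumptions, not from the thread):

```cuda
// Illustrative sketch: bind one HD frame to a 2D texture before the kernel
// launch, then unbind afterwards. The bind itself costs only microseconds,
// so repeating it every frame is not a problem.
texture<uchar4, 2, cudaReadModeElementType> frameTex;

void processFrame(const uchar4 *d_frame, size_t pitch, int width, int height)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar4>();
    cudaBindTexture2D(NULL, frameTex, d_frame, desc, width, height, pitch);

    // ... launch a kernel that reads pixels via tex2D(frameTex, x, y) ...

    cudaUnbindTexture(frameTex);
}
```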

Using a texture read in the kernel can increase register usage by 2 or 3 which may affect the performance of your code. So, if you can use fully coalesced global memory reads, do so. But if you access memory in a slightly random pattern with 1D or 2D locality, the benefits of the textures will more than pay for the costs.


thanks for the reply.

I am planning to do a 3x3 median filtering for a 4 channel bitmap for now.

I think I am going to need to read nine 32-bit integers from memory for each target pixel. Since there is so much locality in this access pattern, I think binding to a texture is worth it.
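The filter described above could be sketched along these lines, assuming a `uchar4` pixel layout and a texture reference named `frameTex` bound with clamped addressing (all of these names and the launch configuration are assumptions for illustration):

```cuda
// Sketch of a 3x3 median filter on a 4-channel image, reading the nine
// neighbors through the texture cache. One thread per output pixel.
texture<uchar4, 2, cudaReadModeElementType> frameTex;

__device__ unsigned char median9(unsigned char v[9])
{
    // Insertion sort of 9 values; the median is the middle element.
    for (int i = 1; i < 9; ++i) {
        unsigned char key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) { v[j + 1] = v[j]; --j; }
        v[j + 1] = key;
    }
    return v[4];
}

__global__ void median3x3(uchar4 *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned char r[9], g[9], b[9], a[9];
    int k = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx, ++k) {
            // With cudaAddressModeClamp, out-of-range coordinates are
            // clamped by the hardware, so no explicit border handling.
            uchar4 p = tex2D(frameTex, x + dx, y + dy);
            r[k] = p.x; g[k] = p.y; b[k] = p.z; a[k] = p.w;
        }
    out[y * width + x] = make_uchar4(median9(r), median9(g),
                                     median9(b), median9(a));
}
```

The sort could be replaced by a branch-free median-of-9 exchange network, but the structure of the texture reads would be the same.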

Best Regards,


It would be better to use shared memory to reduce the number of reads even further. One of the SDK examples shows how to write a filter using shared memory, I believe.

But the images are really big, about 1.5 MB each, and there is not that much shared memory available.

Or do you suggest reading a chunk (e.g. a 32x32 tile) into shared memory first and accessing the individual pixels from there?
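That tiled approach could look something like the following sketch, where each 16x16 block cooperatively loads an 18x18 tile (the block plus a one-pixel apron) into shared memory; the tile size and all names here are illustrative assumptions:

```cuda
// Sketch of shared-memory tiling for a 3x3 filter: load tile + apron once,
// then every thread reads its 3x3 neighborhood from shared memory, so each
// input pixel is fetched from global memory only once per block.
#define TILE 16

__global__ void median3x3_shared(const uchar4 *in, uchar4 *out,
                                 int width, int height)
{
    __shared__ uchar4 tile[TILE + 2][TILE + 2];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;

    // Cooperative load of the (TILE+2)x(TILE+2) tile, clamped at borders.
    for (int ty = threadIdx.y; ty < TILE + 2; ty += TILE)
        for (int tx = threadIdx.x; tx < TILE + 2; tx += TILE) {
            int gx = min(max((int)(blockIdx.x * TILE) + tx - 1, 0), width - 1);
            int gy = min(max((int)(blockIdx.y * TILE) + ty - 1, 0), height - 1);
            tile[ty][tx] = in[gy * width + gx];
        }
    __syncthreads();

    if (x >= width || y >= height) return;

    // The 3x3 neighborhood of pixel (x, y) is now
    // tile[threadIdx.y + dy][threadIdx.x + dx] for dy, dx in 0..2.
    // ... compute the per-channel median from those nine values ...
}
```

Note that the whole 1.5 MB frame never needs to fit in shared memory; only one small tile per thread block lives there at a time.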

I will look into the example.