How can we make better use of texture memory to accelerate CUDA programs? Which aspects of texture memory are worth studying further? How can the texture cache be used more effectively?
I wrote a simple image-blending program, implemented one version with texture memory and another with shared memory, and compared their execution times. The texture version ran 3 times faster than the shared-memory version. This surprised me, because I had always assumed that shared memory is faster.
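For anyone who wants a concrete starting point for the discussion, here is a minimal sketch of what the texture-memory side of such a blend might look like. It is not the program described above. It assumes single-channel `float` images, a blend of the form `out = alpha * A + (1 - alpha) * B`, and the CUDA texture-object API (`cudaCreateTextureObject`, `tex2D<float>`). The names `blendTex`, `makeTexture`, and `blendTexture` are hypothetical. One thing worth keeping in mind when comparing it with a shared-memory version: in a per-pixel blend each input pixel is read exactly once. Shared memory therefore has no data reuse to exploit, and staging through it only adds a copy and a `__syncthreads()` barrier.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s failed: %s\n", #call,                \
                         cudaGetErrorString(err_));                       \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical kernel: out = alpha * A + (1 - alpha) * B for each pixel.
// Both inputs are fetched through texture objects (read-only texture cache).
__global__ void blendTex(cudaTextureObject_t texA, cudaTextureObject_t texB,
                         float* out, int width, int height, float alpha) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    // Unnormalized coordinates + point filtering: sample at the texel centre.
    float a = tex2D<float>(texA, x + 0.5f, y + 0.5f);
    float b = tex2D<float>(texB, x + 0.5f, y + 0.5f);
    out[y * width + x] = alpha * a + (1.0f - alpha) * b;
}

// Hypothetical helper: copy a host image into a cudaArray and wrap it
// in a texture object with clamp addressing and point filtering.
static cudaTextureObject_t makeTexture(const float* host, int width, int height,
                                       cudaArray_t* arrayOut) {
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    CUDA_CHECK(cudaMallocArray(arrayOut, &desc, width, height));
    CUDA_CHECK(cudaMemcpy2DToArray(*arrayOut, 0, 0, host,
                                   width * sizeof(float),  // source pitch
                                   width * sizeof(float),  // row width in bytes
                                   height, cudaMemcpyHostToDevice));
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = *arrayOut;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    CUDA_CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));
    return tex;
}

// Hypothetical host wrapper: blend two width x height images on the GPU.
void blendTexture(const float* hostA, const float* hostB, float* hostOut,
                  int width, int height, float alpha) {
    cudaArray_t arrA = nullptr, arrB = nullptr;
    cudaTextureObject_t texA = makeTexture(hostA, width, height, &arrA);
    cudaTextureObject_t texB = makeTexture(hostB, width, height, &arrB);

    float* dOut = nullptr;
    size_t bytes = size_t(width) * height * sizeof(float);
    CUDA_CHECK(cudaMalloc(&dOut, bytes));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    blendTex<<<grid, block>>>(texA, texB, dOut, width, height, alpha);
    CUDA_CHECK(cudaGetLastError());
    // cudaMemcpy on the default stream waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaDestroyTextureObject(texA));
    CUDA_CHECK(cudaDestroyTextureObject(texB));
    CUDA_CHECK(cudaFreeArray(arrA));
    CUDA_CHECK(cudaFreeArray(arrB));
    CUDA_CHECK(cudaFree(dOut));
}
```

Calling `blendTexture(a, b, out, width, height, 0.25f)` fills `out` with `0.25 * a + 0.75 * b` for each pixel. For a fair comparison, a shared-memory kernel would load each 16x16 tile into `__shared__` arrays, call `__syncthreads()`, and then blend. Timing both kernels alone with `cudaEvent_t` events keeps the host-to-device copies out of the measurement.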
Let’s start a discussion!
Thanks for participating!
:smile: