I am implementing an image processing project that needs to run in real time at about 30 frames per second. Each frame is about 2007004 bytes.
I bind one frame, stored in global memory, to texture memory, and the bind takes about 7 ms. That is a long time for real-time processing. Is something wrong, or does binding an image to texture memory really take that long?
Due to the non-coherent nature of the texture cache, changes to underlying storage in a kernel may or may not be visible when accessing through the texture path in the same kernel. In essence, the behavior is undefined.
Before every kernel launch, the texture cache is flushed, so reading through a texture in a kernel will correctly reflect changes to the underlying storage made by a previous kernel, or by a CUDA API call preceding the kernel. This is the scenario tera was showing in his pseudo code.
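To make the two-kernel case concrete, here is a minimal sketch (not tera's code). It assumes the texture object API (cudaCreateTextureObject, tex1Dfetch) rather than the texture reference binding discussed above, and the names fill, read_through_texture and count_mismatches_after_two_kernels are made up for illustration. The first kernel writes the linear buffer directly, the second kernel reads the same buffer through the texture path, and the host counts how many values differ from what the first kernel wrote.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Kernel 1: writes directly to the linear buffer in global memory.
__global__ void fill(float *buf, int n, float base) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = base + i;
}

// Kernel 2: reads the same buffer through the texture path.
__global__ void read_through_texture(cudaTextureObject_t tex, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, i);
}

// Hypothetical helper: returns the number of elements where the texture read
// in kernel 2 differs from what kernel 1 wrote, or -1 on a CUDA error.
int count_mismatches_after_two_kernels(int n) {
    assert(n > 0);
    float *buf = nullptr, *out = nullptr;
    if (cudaMalloc(&buf, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) { cudaFree(buf); return -1; }

    // Describe the linear buffer as the storage behind the texture.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = buf;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    // The texture is created before kernel 1 modifies the buffer.
    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) {
        cudaFree(buf); cudaFree(out); return -1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(buf, n, 1.0f);                 // separate launch: write
    read_through_texture<<<blocks, threads>>>(tex, out, n);  // separate launch: read

    // cudaMemcpy on the default stream waits for both kernels to finish.
    std::vector<float> host(n);
    cudaError_t err = cudaMemcpy(host.data(), out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    int mismatches = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i)
            if (host[i] != 1.0f + i) ++mismatches;
    } else {
        mismatches = -1;
    }

    cudaDestroyTextureObject(tex);
    cudaFree(buf);
    cudaFree(out);
    return mismatches;
}
```

Because the write and the texture read happen in separate launches, the helper should report zero mismatches. Doing the write and the tex1Dfetch inside a single kernel is the undefined case described above.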
[Later:] See section 3.2.10.4 of the CUDA C Programming Guide