texture memory binding performance

I am implementing an image processing project that has to run in real time at about 30 frames per second. Each frame is about 2007004 bytes.

I bind each frame, which sits in global memory, to texture memory, and the bind takes about 7 ms. That is a long time for real-time processing. Is something wrong, or does binding an image to texture memory really take this long?

Any help or suggestions would be appreciated, thanks.

You don’t need to bind and unbind the texture for each frame. You can just copy new data to it and invoke a new kernel.

Thanks for the reply. Can you show me an example?

Sorry, I don’t have an example handy. What I meant was that

    copy data to GPU array

    bind texture to GPU array

    call kernel

    unbind texture

    copy data to GPU array

    bind texture to GPU array

    call kernel

    unbind texture

    potentially more iterations…

can safely be replaced by

    copy data to GPU array

    bind texture to GPU array

    call kernel

    copy data to GPU array

    call kernel

    potentially more iterations…

    unbind texture
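A rough sketch of the second pattern, for illustration only. It assumes the newer texture object API (`cudaCreateTextureObject`) rather than the texture reference calls (`cudaBindTexture` / `cudaUnbindTexture`) this thread is about, since texture references have been deprecated and dropped from recent CUDA toolkits. The kernel `invert`, the helper `processFrames`, and the frame size are made-up placeholders, not anything from the original project.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s\n", cudaGetErrorString(err_));    \
            exit(1);                                              \
        }                                                         \
    } while (0)

// Hypothetical per-frame kernel: reads each pixel through the texture
// path and writes the inverted value to `out`.
__global__ void invert(cudaTextureObject_t tex, unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 255 - tex1Dfetch<unsigned char>(tex, i);
}

// Processes `numFrames` synthetic frames and returns how many of them
// came back with the expected (inverted) pixel values.
int processFrames(int numFrames, int frameBytes)
{
    unsigned char *dFrame = nullptr, *dOut = nullptr;
    CHECK(cudaMalloc(&dFrame, frameBytes));
    CHECK(cudaMalloc(&dOut, frameBytes));

    // "Bind" once: the texture object refers to dFrame for its whole lifetime.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dFrame;
    res.res.linear.desc = cudaCreateChannelDesc<unsigned char>();
    res.res.linear.sizeInBytes = frameBytes;
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    std::vector<unsigned char> hFrame, hOut(frameBytes);
    int good = 0;
    for (int f = 0; f < numFrames; ++f) {
        // Stand-in for grabbing a new camera frame.
        hFrame.assign(frameBytes, static_cast<unsigned char>(f + 1));

        // Per frame: only copy new data and launch the kernel.
        CHECK(cudaMemcpy(dFrame, hFrame.data(), frameBytes, cudaMemcpyHostToDevice));
        invert<<<(frameBytes + 255) / 256, 256>>>(tex, dOut, frameBytes);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpy(hOut.data(), dOut, frameBytes, cudaMemcpyDeviceToHost));

        bool ok = true;
        for (int i = 0; i < frameBytes; ++i)
            if (hOut[i] != 255 - hFrame[i]) { ok = false; break; }
        if (ok) ++good;
    }

    // "Unbind" once, after the last frame.
    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFree(dFrame));
    CHECK(cudaFree(dOut));
    return good;
}

int main()
{
    const int numFrames = 3;
    const int frameBytes = 1 << 20;  // placeholder frame size
    int good = processFrames(numFrames, frameBytes);
    printf("%d of %d frames processed correctly\n", good, numFrames);
    return good == numFrames ? 0 : 1;
}
```

The point is only that texture setup and teardown sit outside the frame loop; with the older texture reference API the same structure would apply, with the bind before the loop and the unbind after it.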

Do any changes to the GPU array appear immediately in the bound texture?

For example:

    copy data to GPU array

    bind texture to GPU array

    call a kernel that takes a pointer to the GPU array as an argument and modifies the array

Does the bound texture immediately reflect the modifications within the kernel?

Due to the non-coherent nature of the texture cache, changes to underlying storage in a kernel may or may not be visible when accessing through the texture path in the same kernel. In essence, the behavior is undefined.

Before every kernel launch, the texture cache is flushed, so reading through a texture in a kernel will correctly reflect changes to the underlying storage made by a previous kernel, or by a CUDA API call preceding the kernel. This is the scenario tera was showing in his pseudocode.
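To make the safe case concrete, here is a minimal sketch in which one kernel modifies the storage through a plain pointer and a second, separately launched kernel reads it back through a texture. It assumes the texture object API and a linear `float` buffer; the names `fill`, `readViaTexture` and `countVisibleWrites` are invented for the illustration. It does not demonstrate the undefined same-kernel case.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel 1: writes to the storage through an ordinary global-memory pointer.
__global__ void fill(float* buf, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

// Kernel 2: reads the same storage through the texture path.
__global__ void readViaTexture(cudaTextureObject_t tex, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, i);
}

// Returns how many elements read through the texture in kernel 2
// equal the value written by kernel 1.
int countVisibleWrites(int n, float value)
{
    float *dBuf = nullptr, *dOut = nullptr;
    cudaMalloc(&dBuf, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemset(dBuf, 0, n * sizeof(float));

    // Create the texture over dBuf before anything is written to it.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dBuf;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    int blocks = (n + 255) / 256;
    fill<<<blocks, 256>>>(dBuf, n, value);          // modifies the storage
    readViaTexture<<<blocks, 256>>>(tex, dOut, n);  // later launch, same stream

    std::vector<float> hOut(n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    int matches = 0;
    for (int i = 0; i < n; ++i)
        if (hOut[i] == value) ++matches;

    cudaDestroyTextureObject(tex);
    cudaFree(dBuf);
    cudaFree(dOut);
    return matches;
}

int main()
{
    const int n = 1 << 16;
    int matches = countVisibleWrites(n, 42.0f);
    printf("%d of %d elements matched\n", matches, n);
    return matches == n ? 0 : 1;
}
```

Both kernels go to the default stream, so the second launch starts only after the first has finished. Merging the write and the texture read into a single kernel would fall back into the undefined case described above.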

[Later:] See section 3.2.10.4 of the CUDA C Programming Guide