I was looking at the convolutionTexture example and noticed that a device-to-device memory copy is done after the convolutionRowGPU call. It copies the result of convolutionRowGPU back into the input array that the input texture was originally bound to.
Is there a reason why you wouldn’t just re-bind the texture to the temporary array instead? Do you take a large performance hit when you re-bind a texture to a different region of memory, or was this done for the sake of simplicity? At the very least, I would think you could use a separate texture object for the temporary array and have convolutionColGPU read from that texture instead. Does the number of texture objects in use affect performance as well?
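To make the second idea concrete, here is a rough sketch of what I have in mind. It is not the sample's code. It assumes the newer texture object API (cudaCreateTextureObject, compute capability 3.0 or later) rather than the texture references the sample uses, and it assumes the temporary buffer is pitched linear memory from cudaMallocPitch. The kernel `pass` and the helper `makePitch2DTexture` are hypothetical stand-ins for the real row and column kernels. It says nothing about which approach is faster.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for a convolution pass: reads through a texture and
// writes to pitched linear memory (it only doubles each texel).
__global__ void pass(cudaTextureObject_t tex, float *dst, size_t dstPitch, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h) {
        float *row = (float *)((char *)dst + y * dstPitch);
        // Unnormalized coordinates with point filtering: +0.5f addresses texel (x, y).
        row[x] = 2.0f * tex2D<float>(tex, x + 0.5f, y + 0.5f);
    }
}

// Hypothetical helper: wraps a pitched 2D device buffer in its own texture object.
static cudaTextureObject_t makePitch2DTexture(float *devPtr, size_t pitch, int w, int h)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypePitch2D;
    res.res.pitch2D.devPtr = devPtr;
    res.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    res.res.pitch2D.width = w;
    res.res.pitch2D.height = h;
    res.res.pitch2D.pitchInBytes = pitch;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

int main()
{
    const int w = 64, h = 32;
    const size_t rowBytes = w * sizeof(float);
    std::vector<float> hostIn(w * h, 1.0f), hostOut(w * h, 0.0f);

    float *dSrc = nullptr, *dTmp = nullptr, *dOut = nullptr;
    size_t pSrc = 0, pTmp = 0, pOut = 0;
    cudaMallocPitch((void **)&dSrc, &pSrc, rowBytes, h);
    cudaMallocPitch((void **)&dTmp, &pTmp, rowBytes, h);
    cudaMallocPitch((void **)&dOut, &pOut, rowBytes, h);
    cudaMemcpy2D(dSrc, pSrc, hostIn.data(), rowBytes, rowBytes, h, cudaMemcpyHostToDevice);

    // One texture object per buffer, so nothing is re-bound between passes.
    cudaTextureObject_t texSrc = makePitch2DTexture(dSrc, pSrc, w, h);
    cudaTextureObject_t texTmp = makePitch2DTexture(dTmp, pTmp, w, h);

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    pass<<<grid, block>>>(texSrc, dTmp, pTmp, w, h);  // "row" pass: src -> tmp
    pass<<<grid, block>>>(texTmp, dOut, pOut, w, h);  // "column" pass: tmp -> out, no copy in between

    cudaMemcpy2D(hostOut.data(), rowBytes, dOut, pOut, rowBytes, h, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s, out[0] = %f\n", cudaGetErrorString(err), hostOut[0]);

    cudaDestroyTextureObject(texSrc);
    cudaDestroyTextureObject(texTmp);
    cudaFree(dSrc);
    cudaFree(dTmp);
    cudaFree(dOut);
    return err == cudaSuccess ? 0 : 1;
}
```

The second launch reads the first launch's output through texTmp directly, which is the step where the sample instead copies the intermediate result back into the original array.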