I am trying to write an image processing library with CUDA. When is it best to use texture memory as opposed to global memory when writing image filters (Gaussian, Sobel, etc.)? I know that texture memory is cached, so should I always be using texture memory? Why doesn't the convolutionSeparable sample use texture memory?
If I were to use texture memory for all my images (8-bit int mono, 16-bit int mono, 32-bit float mono), how would I chain filters without having to recopy the data? For example, I want to compute a gradient image, keep it resident on the device, and then use it as an input to another filter (there's no point in copying it back to system memory only to copy it back to the device again). The gradient image would have to be generated into global memory, so how would I treat that same image memory as texture memory in the next pass? Would I have to do a device-to-device transfer to accomplish this? Is that slow? I used to do this all the time in Direct3D without having to do the copy (at least not explicitly), so would this device-to-device transfer in CUDA cause my chaining to run slower than Direct3D?
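For reference, here is a minimal sketch of the chaining pattern I have in mind, assuming the texture-reference API (`cudaBindTexture2D` on pitch-linear memory) and two hypothetical kernels, `gradKernel` and `nextFilterKernel`. If I understand the docs correctly, binding the same allocation to a texture between launches involves no copy:

```
#include <cuda_runtime.h>

// Texture reference bound to the intermediate image between passes.
texture<float, 2, cudaReadModeElementType> texGrad;

// Hypothetical kernels: the first writes a gradient image to global
// memory; the second reads it back through tex2D(texGrad, x, y).
__global__ void gradKernel(float* out, int w, int h, size_t pitch);
__global__ void nextFilterKernel(float* out, int w, int h, size_t pitch);

void chainFilters(int w, int h)
{
    float *dGrad, *dOut;
    size_t pitch;
    cudaMallocPitch((void**)&dGrad, &pitch, w * sizeof(float), h);
    cudaMallocPitch((void**)&dOut,  &pitch, w * sizeof(float), h);

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);

    // Pass 1: generate the gradient image in ordinary global memory.
    gradKernel<<<grid, block>>>(dGrad, w, h, pitch);

    // Bind that same pitch-linear allocation to the texture reference --
    // no device-to-device copy, the texture just aliases the memory.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaBindTexture2D(NULL, texGrad, dGrad, desc, w, h, pitch);

    // Pass 2: read the gradient through the texture cache.
    nextFilterKernel<<<grid, block>>>(dOut, w, h, pitch);

    cudaUnbindTexture(texGrad);
}
```

My understanding is that a copy (e.g. `cudaMemcpy2DToArray`) is only needed if the destination must be a `cudaArray` (for filtering/addressing modes not supported on pitch-linear textures), but I'd like confirmation that the pitch-linear binding above is the right way to avoid it.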