My kernel has some scattered memory accesses (though most can be coalesced) and sometimes needs linear interpolation, so texture memory looks like a good and easy choice. However, some parts of the kernel are not bound by memory throughput, and I'm not sure whether texture fetch instructions have significantly lower throughput than high-throughput arithmetic instructions such as FP32 ADD. Without knowing that, I can't tell how to balance the memory-bound parts against the instruction-bound parts. Could anyone point me to a reference for the instruction throughput of texture fetch instructions?
My other question is how to refresh the contents of a texture object without recopying. As far as I know, once a texture reference is bound to some memory, that memory should not be modified while a kernel is executing. Across separate kernel launches, however, the texture automatically reflects any changes made to the bound memory. For example, with array A bound to TexA and array B bound to TexB, the first kernel call can read TexA to update B, and the second kernel call can read TexB to update A. With texture objects, though, the contents are copied into a cudaArray, which does not seem to be updated when the original memory changes. A cudaArray does not even seem to be writable from a kernel, so I have to memcpy the modified contents again to update it. Is it possible to do this without recopying?
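To make the ping-pong pattern concrete, here is a minimal sketch of what I have in mind, assuming a texture object can be created directly over linear device memory with cudaResourceTypeLinear instead of a cudaArray. The names step, makeLinearTex and runPingPong are hypothetical, error checking is omitted, and as far as I understand this path only gives unfiltered tex1Dfetch reads, so it would not cover the linear-interpolation case.

```cuda
#include <cuda_runtime.h>

// Reads src through a texture object and writes src[i] + 1 into dst.
__global__ void step(cudaTextureObject_t src, float* dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = tex1Dfetch<float>(src, i) + 1.0f;
}

// Hypothetical helper: wraps an existing linear device buffer in a
// texture object. No data is copied; the texture aliases devPtr.
static cudaTextureObject_t makeLinearTex(float* devPtr, int n) {
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = devPtr;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;  // return raw floats

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Hypothetical driver: alternates A -> B and B -> A for `iters` launches,
// starting from A = 0, and returns element 0 of the last buffer written.
float runPingPong(int n, int iters) {
    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    cudaTextureObject_t texA = makeLinearTex(a, n);
    cudaTextureObject_t texB = makeLinearTex(b, n);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iters; ++it) {
        // Each launch only reads the buffer it does not write, so the
        // texture is never modified during the kernel that samples it.
        if (it % 2 == 0) step<<<blocks, threads>>>(texA, b, n);
        else             step<<<blocks, threads>>>(texB, a, n);
    }

    // After an even number of launches the newest data is in a, else in b.
    float result = 0.0f;
    cudaMemcpy(&result, (iters % 2 == 0) ? a : b, sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(texA);
    cudaDestroyTextureObject(texB);
    cudaFree(a);
    cudaFree(b);
    return result;
}
```

The two texture objects are created once and reused across launches; only the roles of the buffers swap, so no memcpy is needed between kernel calls in this sketch.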
Thanks very much~