I have a CUDA kernel that performs one global memory load and four tex2D calls, each on a different texture. I have read that the compiler can hide load latency when there are multiple independent global memory loads in series. Is this also the case for multiple independent tex2D calls in series?
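To make the access pattern concrete, here is a minimal sketch of the kind of kernel I mean. It is not my actual code: the kernel name, the parameters, the use of texture objects, and the float texel type are all placeholders I am assuming for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel (names and types are assumptions, not the real code):
// one global memory load followed by four independent texture fetches.
__global__ void sampleKernel(const float* __restrict__ in, float* out,
                             cudaTextureObject_t texA, cudaTextureObject_t texB,
                             cudaTextureObject_t texC, cudaTextureObject_t texD,
                             int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Texel-center coordinates, assuming unnormalized texture coordinates.
    float u = x + 0.5f;
    float v = y + 0.5f;

    // None of these five reads depends on the result of another,
    // so all of them can be issued before any result is consumed.
    float g = in[y * width + x];
    float a = tex2D<float>(texA, u, v);
    float b = tex2D<float>(texB, u, v);
    float c = tex2D<float>(texC, u, v);
    float d = tex2D<float>(texD, u, v);

    // First use of the loaded values.
    out[y * width + x] = g + a + b + c + d;
}
```

The point of the sketch is only that the four fetches are independent of each other and of the global load, and that nothing uses their results until the final sum.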
Thanks,
Aaron