Performance with two textures inside the same kernel


Is there any penalty for using two 2D textures of different types (say uchar1 and uchar4) in the same kernel? Will the texture cache handle both of them? Would a uchar1 element take 1 or 4 bytes in the cache, or does that depend on the order in which I read them?

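// Input data texture references (file scope)
texture<uchar4, 2, cudaReadModeElementType> tex4Data;
texture<uchar1, 2, cudaReadModeElementType> tex1Data;

// Inside the kernel; oidx, ridx and i are computed elsewhere
uchar4 r4uc0, r4uc1;
uchar1 r1uc0, r1uc1, r1uc2, r1uc3;

// Two neighbouring texels from the uchar4 texture
r4uc0 = tex2D(tex4Data, oidx.x, oidx.y+i);
r4uc1 = tex2D(tex4Data, oidx.x+1, oidx.y+i);
// Four neighbouring texels from the uchar1 texture
r1uc0 = tex2D(tex1Data, ridx.x, ridx.y+i);
r1uc1 = tex2D(tex1Data, ridx.x+1, ridx.y+i);
r1uc2 = tex2D(tex1Data, ridx.x+2, ridx.y+i);
r1uc3 = tex2D(tex1Data, ridx.x+3, ridx.y+i);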


As long as your accesses have 2D locality across threads (which matters more than locality within a single thread), you should get the benefit of the cache, even with multiple textures.

Graphics shaders use multiple textures all the time; the hardware is designed for this.
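
For what it's worth, here is a minimal sketch of the access pattern described above: a 16x16 thread block in which each thread fetches the texel at its own (x, y) from both a uchar4 and a uchar1 texture, so that neighbouring threads touch neighbouring texels. It is an illustration under stated assumptions and not the original poster's kernel. It uses texture objects rather than the texture references from the question, because the texture reference API was removed in CUDA 12. The names sumTwoTextures, makeTexture and runTwoTextureDemo are hypothetical, the fill values are arbitrary, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread reads the texel at its own (x, y) from both textures.
// Threads in the same 16x16 block therefore read a compact 2D tile,
// which is the cross-thread locality the texture cache rewards.
__global__ void sumTwoTextures(cudaTextureObject_t tex4, cudaTextureObject_t tex1,
                               unsigned int* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    uchar4 a = tex2D<uchar4>(tex4, x, y);  // 4-byte texel
    uchar1 b = tex2D<uchar1>(tex1, x, y);  // 1-byte texel
    out[y * width + x] = a.x + a.y + a.z + a.w + b.x;
}

// Hypothetical helper: copies host data into a CUDA array and wraps it in a
// texture object with point sampling and unnormalized coordinates.
template <typename T>
static cudaTextureObject_t makeTexture(const std::vector<T>& host, int width,
                                       int height, cudaArray_t* arrayOut)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<T>();
    cudaMallocArray(arrayOut, &desc, width, height);
    cudaMemcpy2DToArray(*arrayOut, 0, 0, host.data(), width * sizeof(T),
                        width * sizeof(T), height, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = *arrayOut;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Hypothetical driver: fills both textures with constants, runs the kernel,
// and returns the sum of all per-pixel results. Error checks omitted.
unsigned long long runTwoTextureDemo(int width, int height)
{
    const size_t n = static_cast<size_t>(width) * height;
    std::vector<uchar4> h4(n, make_uchar4(1, 2, 3, 4));
    std::vector<uchar1> h1(n, make_uchar1(5));

    cudaArray_t arr4 = nullptr, arr1 = nullptr;
    cudaTextureObject_t tex4 = makeTexture(h4, width, height, &arr4);
    cudaTextureObject_t tex1 = makeTexture(h1, width, height, &arr1);

    unsigned int* dOut = nullptr;
    cudaMalloc(&dOut, n * sizeof(unsigned int));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    sumTwoTextures<<<grid, block>>>(tex4, tex1, dOut, width, height);

    std::vector<unsigned int> hOut(n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex4);
    cudaDestroyTextureObject(tex1);
    cudaFreeArray(arr4);
    cudaFreeArray(arr1);
    cudaFree(dOut);

    unsigned long long total = 0;
    for (unsigned int v : hOut) total += v;
    return total;
}
```

With these fill values every pixel contributes 1+2+3+4+5 = 15, so runTwoTextureDemo(64, 48) should return 15 * 64 * 48 = 46080. To see how much the locality matters on your own hardware, you could time this against a variant where each thread walks a long row by itself.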