(Performance) Two float1 or a single float2 texture?


A problem I’m solving requires looking up a value of function F(x,y,z) or G(x,y,z) depending on the value of some variable i (can be 0 or 1). Speaking in pseudo-code, it’d be something like:

v = i == 0 ? F(x,y,z) : G(x,y,z)

The functions F and G will be implemented as interpolated 3D texture lookups, and the textures in question will be ~1 GB in size each. From a performance and space perspective, would it be better to implement the above as two separate textures (one for F and one for G)

v = i == 0 ? tex3D(F, x, y, z) : tex3D(G, x, y, z)

or to pack F and G into a single float2 texture (let’s call it FG) and perform the lookup as:

float2 tmp = tex3D(FG, x, y, z)

v = i == 0 ? tmp.x : tmp.y


Naively I’d think that the first option would be faster, but looking at CUDA’s .h files (e.g., texture_fetch_functions.h), it seems that internally all texture fetches are returned in float4 variables. That got me wondering whether there would be any benefit to the second approach.




Basically, a float2 or float4 texture fetch is better than 2 or 4 single float fetches. Going from two float fetches to one float2 fetch gave me a ~20% performance boost, but I guess that depends on the application as well.
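
To make the two variants concrete, here is a minimal sketch of both lookups side by side. It is not the code I measured: it assumes the texture object API (cudaTextureObject_t) rather than the texture references used in the question, and the helper makeTexture, the kernel names, the 8x8x8 volume, and the constant fill values are all made up for illustration. Error checking and cleanup of the arrays are left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Option 1: two float textures; only the selected one is fetched.
__global__ void lookupSeparate(cudaTextureObject_t texF, cudaTextureObject_t texG,
                               float x, float y, float z, int i, float* out)
{
    *out = (i == 0) ? tex3D<float>(texF, x, y, z) : tex3D<float>(texG, x, y, z);
}

// Option 2: one float2 texture; a single fetch returns both F and G.
__global__ void lookupPacked(cudaTextureObject_t texFG,
                             float x, float y, float z, int i, float* out)
{
    float2 tmp = tex3D<float2>(texFG, x, y, z);
    *out = (i == 0) ? tmp.x : tmp.y;
}

// Hypothetical helper: upload an n*n*n volume into a CUDA array and wrap it
// in a linearly filtered texture object. No error checking (sketch only).
template <typename T>
cudaTextureObject_t makeTexture(const std::vector<T>& host, int n)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<T>();
    cudaExtent extent = make_cudaExtent(n, n, n);
    cudaArray_t arr;
    cudaMalloc3DArray(&arr, &desc, extent);

    cudaMemcpy3DParms copy = {};
    copy.srcPtr = make_cudaPitchedPtr((void*)host.data(), n * sizeof(T), n, n);
    copy.dstArray = arr;
    copy.extent = extent;
    copy.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&copy);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = td.addressMode[1] = td.addressMode[2] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModeLinear;      // interpolated lookups
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

int main()
{
    const int n = 8;  // tiny stand-in for the ~1 GB volumes
    std::vector<float>  f(n * n * n, 1.0f);                     // F is constant 1
    std::vector<float>  g(n * n * n, 2.0f);                     // G is constant 2
    std::vector<float2> fg(n * n * n, make_float2(1.0f, 2.0f)); // packed (F, G)

    cudaTextureObject_t texF  = makeTexture(f, n);
    cudaTextureObject_t texG  = makeTexture(g, n);
    cudaTextureObject_t texFG = makeTexture(fg, n);

    float* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(float));

    for (int i = 0; i < 2; ++i) {
        float a = 0.0f, b = 0.0f;
        lookupSeparate<<<1, 1>>>(texF, texG, 3.5f, 3.5f, 3.5f, i, d_out);
        cudaMemcpy(&a, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        lookupPacked<<<1, 1>>>(texFG, 3.5f, 3.5f, 3.5f, i, d_out);
        cudaMemcpy(&b, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("i=%d separate=%f packed=%f\n", i, a, b);
    }

    cudaDestroyTextureObject(texF);
    cudaDestroyTextureObject(texG);
    cudaDestroyTextureObject(texFG);
    cudaFree(d_out);
    return 0;
}
```

Both kernels should return the same value for a given i. Note that in option 1 each thread still issues only one fetch, so the gain I saw from packing applies mainly when you need both values at the same point.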

In your case I think it would be best NOT to merge the two arrays: first, because they are not logically related (as I understand it), and second, because of the size of the combined array (~2 GB). You won’t be able to do this on a non-Tesla card. Also see a bug report I’ve opened:

582591 - Problem accessing textures bound to huge arrays (>2GB) (C1060)
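
If you want to check the sizes before committing to a layout, something like the following rough sketch may help. The 640x640x640 dimensions are an assumption on my part (that is roughly 1 GB of floats per function), and it only compares the required bytes against what the runtime reports for device 0. It does not test for the bug mentioned above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Hypothetical volume dimensions: 640^3 floats is roughly 1 GB.
    const size_t nx = 640, ny = 640, nz = 640;
    const size_t bytesSingle = nx * ny * nz * sizeof(float);   // F or G alone
    const size_t bytesPacked = nx * ny * nz * sizeof(float2);  // F and G packed

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("No usable CUDA device found\n");
        return 1;
    }

    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    // Each dimension must also fit within the device's 3D texture limits.
    const bool dimsOk = nx <= (size_t)prop.maxTexture3D[0] &&
                        ny <= (size_t)prop.maxTexture3D[1] &&
                        nz <= (size_t)prop.maxTexture3D[2];

    printf("Device: %s\n", prop.name);
    printf("Free / total memory: %zu / %zu bytes\n", freeBytes, totalBytes);
    printf("One float array:     %zu bytes\n", bytesSingle);
    printf("One float2 array:    %zu bytes\n", bytesPacked);
    printf("3D texture dimensions within limits: %s\n", dimsOk ? "yes" : "no");
    printf("Packed array fits in free memory:    %s\n",
           bytesPacked <= freeBytes ? "yes" : "no");
    return 0;
}
```

Keep in mind that two separate textures need about the same total memory if both are resident at once. The difference is that neither single array then reaches the ~2 GB size.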

hope this helps,