A problem I’m solving requires looking up the value of a function F(x,y,z) or G(x,y,z), depending on the value of some variable i (which can be 0 or 1). In pseudo-code, it would be something like:
v = i == 0 ? F(x,y,z) : G(x,y,z)
The functions F and G will be implemented as interpolated 3D texture lookups, and the textures in question will each be ~1 GB in size. From a performance and space perspective, would it be better to implement the above as two separate textures (one for F, and one for G):
v = i == 0 ? tex3D(F, x, y, z) : tex3D(G, x, y, z)
or to pack F and G into a single float2 texture (let’s call it FG) and perform the lookup as:
float2 tmp = tex3D(FG, x, y, z);
v = i == 0 ? tmp.x : tmp.y;
Naively, I’d expect the first option to be faster, but looking at CUDA’s header files (e.g., texture_fetch_functions.h), it seems that internally all texture fetches return their results in float4 variables, which made me wonder: would there be any benefit to the second approach?
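For concreteness, here is a rough sketch of the two alternatives as CUDA kernels, using the texture-object API. The names (texF, texG, texFG, and the kernels themselves) are illustrative placeholders, not working code from my project; assume the objects are bound to 3D cudaArrays with linear filtering enabled.

```cuda
// Option 1: two separate float textures; each thread fetches only one of them.
__global__ void lookupTwoTextures(cudaTextureObject_t texF,
                                  cudaTextureObject_t texG,
                                  const float3 *coords, const int *sel,
                                  float *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    float3 c = coords[idx];
    // Conditional fetch: only the selected texture is read per thread.
    out[idx] = (sel[idx] == 0)
        ? tex3D<float>(texF, c.x, c.y, c.z)
        : tex3D<float>(texG, c.x, c.y, c.z);
}

// Option 2: F and G packed into one float2 texture; always fetch both,
// then select the wanted channel.
__global__ void lookupPackedTexture(cudaTextureObject_t texFG,
                                    const float3 *coords, const int *sel,
                                    float *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    float3 c = coords[idx];
    // Single fetch reads both channels regardless of sel.
    float2 fg = tex3D<float2>(texFG, c.x, c.y, c.z);
    out[idx] = (sel[idx] == 0) ? fg.x : fg.y;
}
```

In the packed version, a float2 fetch moves twice the data per lookup, but the selection itself is branch-free; in the two-texture version, only the needed value is fetched, at the cost of possible divergence when threads in a warp disagree on i.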