A problem I’m solving requires looking up the value of a function F(x,y,z) or G(x,y,z), depending on the value of a variable i (which can be 0 or 1). In pseudo-code, it’d be something like:
v = i == 0 ? F(x,y,z) : G(x,y,z)
The functions F and G will be implemented as interpolated 3D texture lookups, and each texture will be ~1 GB in size. From a performance and space perspective, would it be better to implement the above as two separate textures (one for F and one for G):
v = i == 0 ? tex3D(F, x, y, z) : tex3D(G, x, y, z)
or to pack F and G into a single float2 texture (let’s call it FG) and perform the lookup as:
float2 tmp = tex3D(FG, x, y, z)
v = i == 0 ? tmp.x : tmp.y
?
Naively I’d think that the first option would be faster, but looking at CUDA’s header files (e.g., texture_fetch_functions.h) it seems that internally all texture fetches return their results in float4 variables, which got me wondering whether there’d be any benefit to the second approach.
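For reference, here is a minimal sketch of the two options as device functions, written against the legacy texture-reference API that the tex3D calls above imply. The texture names F, G, and FG are placeholders, and the binding of the textures to the underlying 3D arrays is assumed to happen elsewhere:

```cuda
// Option 1: two separate float textures.
texture<float,  cudaTextureType3D, cudaReadModeElementType> F;
texture<float,  cudaTextureType3D, cudaReadModeElementType> G;

// Option 2: one packed float2 texture.
texture<float2, cudaTextureType3D, cudaReadModeElementType> FG;

__device__ float lookup_two_textures(int i, float x, float y, float z)
{
    // Each thread issues only one fetch, but threads in a warp that
    // disagree on i will serialize through both branches.
    return (i == 0) ? tex3D(F, x, y, z) : tex3D(G, x, y, z);
}

__device__ float lookup_packed(int i, float x, float y, float z)
{
    // A single fetch always returns both values; the select afterwards
    // is a cheap register operation with no extra memory traffic.
    float2 tmp = tex3D(FG, x, y, z);
    return (i == 0) ? tmp.x : tmp.y;
}
```

Note that the packed version always pays for fetching both channels even when only one is needed, while the two-texture version fetches only the channel it uses but risks warp divergence when i varies within a warp.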
Basically, a single float2 or float4 texture fetch is better than 2 or 4 separate float fetches. Going from two float fetches to one float2 fetch gave me a ~20% performance boost, but I guess it depends on the application as well.
In your case I think it would be best NOT to merge the two arrays: first, because they are not logically related (as I understand it), and second, because of the size of the combined array (~2 GB) - you won’t be able to do this on a non-Tesla card. Also see a bug report I’ve opened: 582591 - Problem accessing textures bound to huge arrays (>2GB) (C1060).