I have a 3D grid of geographic points and I am interpolating between them.

My first attempt used 3D linear filtering, but it turned out that the accuracy suffered too much.

So I have converted my kernel to perform 8 texture lookups (the corners of the cube) using POINT filtering, and then I perform the standard trilinear interpolation in the kernel.
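For reference, the manual interpolation described above amounts to something like the following. This is only a minimal CPU-side sketch in plain C++, not my actual kernel: `Grid3D`, `at`, `lerp1` and `trilinear` are hypothetical names, the grid is assumed to be a dense x-fastest array standing in for the 3D texture, out-of-range indices are assumed to be clamped, and integer coordinates are assumed to land exactly on grid points (no half-texel offset).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense grid, stored x-fastest; it stands in for the 3D texture.
struct Grid3D {
    int nx, ny, nz;
    std::vector<float> data;

    // Single point fetch with index clamping (assumed edge behaviour).
    float at(int x, int y, int z) const {
        x = x < 0 ? 0 : (x >= nx ? nx - 1 : x);
        y = y < 0 ? 0 : (y >= ny ? ny - 1 : y);
        z = z < 0 ? 0 : (z >= nz ? nz - 1 : z);
        return data[(static_cast<std::size_t>(z) * ny + y) * nx + x];
    }
};

// Linear blend between a and b with weight t in [0, 1].
inline float lerp1(float a, float b, float t) { return a + t * (b - a); }

// Trilinear interpolation from the 8 corners of the enclosing cell,
// with the weights kept in full float precision.
float trilinear(const Grid3D& g, float x, float y, float z) {
    const float fx = std::floor(x), fy = std::floor(y), fz = std::floor(z);
    const int x0 = static_cast<int>(fx);
    const int y0 = static_cast<int>(fy);
    const int z0 = static_cast<int>(fz);
    const float tx = x - fx, ty = y - fy, tz = z - fz;

    // The 8 point lookups (corners of the cube).
    const float c000 = g.at(x0, y0, z0),         c100 = g.at(x0 + 1, y0, z0);
    const float c010 = g.at(x0, y0 + 1, z0),     c110 = g.at(x0 + 1, y0 + 1, z0);
    const float c001 = g.at(x0, y0, z0 + 1),     c101 = g.at(x0 + 1, y0, z0 + 1);
    const float c011 = g.at(x0, y0 + 1, z0 + 1), c111 = g.at(x0 + 1, y0 + 1, z0 + 1);

    // Blend along x (4 lerps), then y (2 lerps), then z (1 lerp).
    const float c00 = lerp1(c000, c100, tx), c10 = lerp1(c010, c110, tx);
    const float c01 = lerp1(c001, c101, tx), c11 = lerp1(c011, c111, tx);
    const float c0 = lerp1(c00, c10, ty), c1 = lerp1(c01, c11, ty);
    return lerp1(c0, c1, tz);
}
```

So each output pixel costs 8 fetches plus 7 lerps, compared with a single filtered fetch before.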

Unfortunately, this reduces the throughput of my kernel from about 700M pixels per second to roughly 250M pixels per second.

Any thoughts on how to improve this?

I am unsure whether the computation around the texture lookups is responsible for the slowdown. My guess would be the additional memory accesses.

Thanks in advance.