Bilinear interpolation precision issues and implementing low-precision interpolation in software

I’m trying to implement the 4-tap Catmull-Rom upsampling technique described in this post. However, on close inspection I’m seeing some blockiness in the final output, which the author may not have noticed.
I’ve verified my implementation multiple times, and I’m fairly sure no errors have been introduced on my end. Furthermore, if I replace the hardware bilinear interpolation with a software equivalent, the blockiness goes away. I think what I’m seeing is due to the 9-bit fixed-point representation of the interpolation weights in the texture unit (as stated in section J.2 of the CUDA Developer Guide and in various posts on this forum).
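For anyone unfamiliar with the issue, here is a small CPU-side Python sketch (the function names are mine) of what that 9-bit fixed-point format implies: the fractional weight only has 8 fractional bits, so it snaps to steps of 1/256, and the interpolated value moves in discrete steps instead of varying smoothly:

```python
def quantize_weight(w):
    # Model the texture unit's weight precision: 8 fractional bits,
    # i.e. the weight is rounded to the nearest multiple of 1/256.
    return round(w * 256.0) / 256.0

def lerp(a, b, w):
    # Plain full-precision linear interpolation for comparison.
    return a * (1.0 - w) + b * w

# Full-precision vs. quantized interpolation between two texel values:
a, b = 0.0, 1.0
w = 0.1234
full = lerp(a, b, w)                     # 0.1234
hw   = lerp(a, b, quantize_weight(w))    # 0.125 (weight snapped to 32/256)
```

The per-axis weight error is bounded by 1/512, which is small on its own, but it is plausibly what surfaces as visible banding/blockiness once the Catmull-Rom weight arithmetic is layered on top.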

I would like to get rid of said blockiness and get a clean output.

One obvious solution would be to do the interpolation in the shader, but that would introduce significant overhead on some of the low-spec machines I’m targeting.

Interestingly, I noticed that if I use a texture format with an alpha channel (set to 1.0 and -1.0 in a checkerboard pattern as described in the post above), and then normalize the final color output by the alpha value, the result, while not exactly “accurate”, shows no blockiness and is a definite improvement. However, I’d like to use this technique on R11G11B10 targets, for which that solution is not possible. I could create a small 2x2 single-channel texture and sample it to retrieve the normalization factor, but again, that would carry a higher performance penalty than I’d like.

Is there any way to get full-precision bilinear interpolation from the hardware, instead of the 9-bit version? (I’m guessing there isn’t.)

Alternatively, since I already know what the values in the alpha channel will be (1.0 and -1.0 in a checkerboard pattern), is there a way to emulate the low-precision hardware interpolation in software? That way I could compute the alpha normalization value directly without having to sample it. This is my current attempt, which lessens the blockiness a bit but unfortunately does not get rid of it.

vec2 px_pos = floor( uv * texSize - 0.5 ) + 0.5; // top-left texel center
vec2 w      = fract( uv * texSize - 0.5 );       // bilinear interpolation weights
vec4 tx_smp = texture( tex, uv, 0.0 );
vec2 wr     = round( w * 256.0 );                // quantize to 8 fractional bits
vec2 wrr    = 256.0 - wr;
// Sign flip depending on the position within the checkerboard pattern.
float sfp   = ( int( px_pos.x ) & 1 ) ==
              ( int( px_pos.y ) & 1 ) ? 1.0 : -1.0;
// Emulated low-precision bilinear blend of the +/-1 checkerboard alpha.
tx_smp.a    = ( wrr.x ) * ( wrr.y ) *  1.0 * sfp +
              ( wr.x  ) * ( wrr.y ) * -1.0 * sfp +
              ( wrr.x ) * ( wr.y  ) * -1.0 * sfp +
              ( wr.x  ) * ( wr.y  ) *  1.0 * sfp;
tx_smp.a   /= 256.0 * 256.0;

(sfp is just a sign-flip factor dependent on the position within the checkerboard pattern.)
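To sanity-check the emulation on the CPU, here is a Python sketch (function names are mine) of the same quantized blend. One observation that falls out of the algebra: for a ±1 checkerboard, the four-term sum collapses to (1-x)(1-y) - x(1-y) - (1-x)y + xy = (1-2x)(1-2y), which may be a cheaper way to compute the normalization factor in the shader, assuming the quantization model itself is right:

```python
def hw_bilerp_alpha(wx, wy, sfp):
    # Quantize each weight to 8 fractional bits, then blend the four
    # checkerboard alpha texels (+sfp, -sfp, -sfp, +sfp) term by term,
    # mirroring the shader attempt above.
    qx = round(wx * 256.0)
    qy = round(wy * 256.0)
    return ((256.0 - qx) * (256.0 - qy) *  sfp +
            qx            * (256.0 - qy) * -sfp +
            (256.0 - qx)  * qy           * -sfp +
            qx            * qy           *  sfp) / 65536.0

def closed_form_alpha(wx, wy, sfp):
    # Same result via (1-2x)(1-2y) with the quantized weights.
    qx = round(wx * 256.0) / 256.0
    qy = round(wy * 256.0) / 256.0
    return sfp * (1.0 - 2.0 * qx) * (1.0 - 2.0 * qy)
```

Both forms agree exactly, since the quantized weights are dyadic rationals, so the simplification changes only the cost, not the result.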

Any help would be greatly appreciated. Thanks in advance!

Hello @StrG30 and welcome to the NVIDIA developer forums!

Very interesting ideas! Do you plan to use this as some form of real-time upscaler for a game, or is this simply for academic purposes?

I personally can’t help answer your questions with respect to the HW interpolation alternatives, but I imagine others here will be curious about how to solve this.