About bilinear interpolation

In bilinear interpolation, each output point is calculated from its four neighboring source points.
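For reference, here is the usual bilinear weighting written out. It assumes the sample position (x, y) is given in source-pixel coordinates and that I(i, j) is the source pixel in column i, row j. The symbols x_0, y_0, alpha and beta are introduced here for illustration only.

```latex
x_0 = \lfloor x \rfloor,\quad y_0 = \lfloor y \rfloor,\quad \alpha = x - x_0,\quad \beta = y - y_0

I(x, y) \approx (1-\alpha)(1-\beta)\,I(x_0, y_0) + \alpha(1-\beta)\,I(x_0+1, y_0)
              + (1-\alpha)\beta\,I(x_0, y_0+1) + \alpha\beta\,I(x_0+1, y_0+1)
```

The four weights sum to 1, and for a sample that falls exactly on a source pixel (alpha = beta = 0) the formula returns that pixel unchanged.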

Without shared memory, each output point reads global memory 4 times, even though neighboring output points often need some of the same source points.
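A minimal sketch of that naive version, for comparison. It assumes a single-channel float image stored row-major, coordinates clamped at the image border, and a simple mapping sx = dx * srcW / dstW from destination to source. The names fetchClamped, bilinear and resizeNaive are hypothetical, not taken from the original post. Every thread issues four global loads, and adjacent threads frequently load the same source pixels.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical helper: reads source pixel (x, y), clamping coordinates to the image border.
__host__ __device__ inline float fetchClamped(const float* img, int w, int h, int x, int y)
{
    x = x < 0 ? 0 : (x >= w ? w - 1 : x);
    y = y < 0 ? 0 : (y >= h ? h - 1 : y);
    return img[y * w + x];
}

// Bilinear sample at (x, y), given in source-pixel coordinates.
__host__ __device__ inline float bilinear(const float* img, int w, int h, float x, float y)
{
    int x0 = (int)floorf(x);
    int y0 = (int)floorf(y);
    float a = x - x0;  // horizontal fraction
    float b = y - y0;  // vertical fraction
    // Four separate reads: these are the global loads that shared memory could save.
    float p00 = fetchClamped(img, w, h, x0,     y0);
    float p10 = fetchClamped(img, w, h, x0 + 1, y0);
    float p01 = fetchClamped(img, w, h, x0,     y0 + 1);
    float p11 = fetchClamped(img, w, h, x0 + 1, y0 + 1);
    return (1.0f - a) * (1.0f - b) * p00 + a * (1.0f - b) * p10
         + (1.0f - a) * b * p01 + a * b * p11;
}

// Naive resize: one thread per destination pixel, reading straight from global memory.
__global__ void resizeNaive(const float* src, int srcW, int srcH,
                            float* dst, int dstW, int dstH)
{
    int dx = blockIdx.x * blockDim.x + threadIdx.x;
    int dy = blockIdx.y * blockDim.y + threadIdx.y;
    if (dx >= dstW || dy >= dstH) return;

    float sx = dx * (float)srcW / dstW;  // assumed destination-to-source mapping
    float sy = dy * (float)srcH / dstH;
    dst[dy * dstW + dx] = bilinear(src, srcW, srcH, sx, sy);
}
```

Marking the helpers __host__ __device__ lets the same interpolation code be checked on the CPU before it is used in the kernel.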

Can I tile the destination image, load the source-image points each tile needs from global memory into shared memory, and then compute the results by reading those points from shared memory?

Yes, that’s close to a perfect use of shared memory. The ConvolutionSeparable SDK example and its accompanying PDF have some good general information on tiling and padding, even though convolution is more involved than bilinear interpolation.
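One way the tiling could look, as a sketch only, under extra assumptions not in the thread: upscaling only (dstW >= srcW and dstH >= srcH, so a 16x16 output tile needs at most an 18x18 source patch), 16x16 thread blocks, border clamping, and the mapping sx = dx * srcW / dstW. TILE, HALO, fetchClamped, blend4 and resizeTiled are hypothetical names. Each block loads its source patch into shared memory once, synchronizes, and then every thread reads its four neighbors from shared memory instead of global memory.

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define TILE 16          // output tile is TILE x TILE; launch with TILE x TILE threads
#define HALO (TILE + 2)  // source patch edge; enough only when upscaling

// Hypothetical helper: reads source pixel (x, y), clamping coordinates to the image border.
__host__ __device__ inline float fetchClamped(const float* img, int w, int h, int x, int y)
{
    x = x < 0 ? 0 : (x >= w ? w - 1 : x);
    y = y < 0 ? 0 : (y >= h ? h - 1 : y);
    return img[y * w + x];
}

// Bilinear blend of four neighbors with fractional offsets a (x) and b (y).
__host__ __device__ inline float blend4(float p00, float p10, float p01, float p11,
                                        float a, float b)
{
    return (1.0f - a) * (1.0f - b) * p00 + a * (1.0f - b) * p10
         + (1.0f - a) * b * p01 + a * b * p11;
}

// Assumes dstW >= srcW and dstH >= srcH (upscaling).
__global__ void resizeTiled(const float* src, int srcW, int srcH,
                            float* dst, int dstW, int dstH)
{
    __shared__ float tile[HALO][HALO];

    float scaleX = (float)srcW / dstW;  // <= 1 when upscaling
    float scaleY = (float)srcH / dstH;

    // First output pixel of this block and the source pixel it maps onto.
    int dx0 = blockIdx.x * TILE;
    int dy0 = blockIdx.y * TILE;
    int sx0 = (int)floorf(dx0 * scaleX);
    int sy0 = (int)floorf(dy0 * scaleY);

    // The whole block loads the HALO x HALO patch once; the TILE stride covers the 2 extra rows/cols.
    for (int ty = threadIdx.y; ty < HALO; ty += TILE)
        for (int tx = threadIdx.x; tx < HALO; tx += TILE)
            tile[ty][tx] = fetchClamped(src, srcW, srcH, sx0 + tx, sy0 + ty);
    __syncthreads();  // the patch must be complete before any thread reads it

    int dx = dx0 + threadIdx.x;
    int dy = dy0 + threadIdx.y;
    if (dx >= dstW || dy >= dstH) return;  // after the barrier, so every thread helped load

    float sx = dx * scaleX;
    float sy = dy * scaleY;
    int x0 = (int)floorf(sx);
    int y0 = (int)floorf(sy);
    int lx = x0 - sx0;  // top-left neighbor's position inside the shared patch
    int ly = y0 - sy0;

    dst[dy * dstW + dx] = blend4(tile[ly][lx],     tile[ly][lx + 1],
                                 tile[ly + 1][lx], tile[ly + 1][lx + 1],
                                 sx - x0, sy - y0);
}
```

A launch would use dim3 block(TILE, TILE) and dim3 grid((dstW + TILE - 1) / TILE, (dstH + TILE - 1) / TILE). When downscaling, the source footprint of one output tile grows with the scale factor, so the fixed 18x18 patch would no longer be large enough.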

Could you please post your bilinear interpolation code?