I have created some code to perform cubic B-spline interpolation in CUDA. This code allows you to replace linear 2D and 3D texture filtering with cubic interpolation.
I have also included prefiltering to convert data samples into B-spline coefficients and several example programs + code.
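For readers wondering what the prefilter step does, here is a minimal CPU-side sketch in C++. It is not the CUDA code from the package. It assumes a 1D signal, mirror boundary conditions, and the well-known recursive-filter formulation with the pole sqrt(3) - 2. The names `samplesToCoefficients` and `valueAtSample` are made up for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Convert 1D samples into cubic B-spline coefficients, in place.
// Assumption: mirror (whole-sample symmetric) boundaries.
void samplesToCoefficients(std::vector<double>& c) {
    const std::size_t n = c.size();
    if (n < 2) return;  // a single sample is its own coefficient

    const double z = std::sqrt(3.0) - 2.0;            // pole of the cubic B-spline filter
    const double gain = (1.0 - z) * (1.0 - 1.0 / z);  // overall gain, equals 6
    for (double& v : c) v *= gain;

    // Exact initial value of the causal filter for a mirrored signal.
    double zn = z;
    double z2n = std::pow(z, static_cast<double>(n - 1));
    double sum = c[0] + z2n * c[n - 1];
    z2n *= z2n / z;
    for (std::size_t k = 1; k + 1 < n; ++k) {
        sum += (zn + z2n) * c[k];
        zn *= z;
        z2n /= z;
    }
    c[0] = sum / (1.0 - zn * zn);

    // Causal pass (left to right).
    for (std::size_t k = 1; k < n; ++k) c[k] += z * c[k - 1];

    // Initial value of the anticausal filter, then the pass right to left.
    c[n - 1] = (z / (z * z - 1.0)) * (z * c[n - 2] + c[n - 1]);
    for (std::size_t k = n - 1; k-- > 0;) c[k] = z * (c[k + 1] - c[k]);
}

// Evaluate the spline exactly at sample position i (needs at least 2 coefficients).
// At integer positions the cubic B-spline weights are 1/6, 4/6, 1/6.
double valueAtSample(const std::vector<double>& c, std::size_t i) {
    const std::size_t n = c.size();
    const double left = c[i == 0 ? 1 : i - 1];           // mirrored at the left border
    const double right = c[i + 1 == n ? n - 2 : i + 1];  // mirrored at the right border
    return (left + 4.0 * c[i] + right) / 6.0;
}
```

After calling `samplesToCoefficients` on a copy of the data, `valueAtSample` should give back the original samples up to rounding error, which is the whole point of the prefilter. For 2D and 3D data the same 1D filter would be applied along each axis in turn.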
The CUDA version is 327 times faster than a non-optimized CPU implementation on my PC! :)
Interested? Download the code from: CUDA cubic interpolation
Edit: The website address has changed (my previous provider went bankrupt).
I must admit that I completely missed this example as well. When I started on my version I was using the SDK 2 beta, which did not include the example yet, and when I upgraded later I did not check the examples carefully enough.
Since the SDK example also contains a benchmarking mode, I quickly tried both the SDK cubic interpolation and my own code (it was very easy to integrate mine):
regular 2D linear interpolation: 1214 Mpixels/sec
SDK 2D cubic interpolation: 1198 Mpixels/sec
my 2D cubic interpolation: 1205 Mpixels/sec :)
It is striking that cubic interpolation (both versions) is hardly slower than linear interpolation: both are within about 1% of the linear figure. I guess that this is due to caching.
Of course my coding effort was not completely in vain, since I also offer 3D cubic interpolation and the Thévenaz prefilter.
Without the prefilter, cubic B-spline interpolation has a smoothing effect (you can actually see that when you look closely at the SDK example output).
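To make the smoothing effect concrete, here is a small CPU-side sketch in C++ (again not the CUDA code itself). It evaluates a 1D cubic B-spline directly on raw samples, so without any prefiltering. The function names and the clamp-to-edge border handling are assumptions made for this illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Cubic B-spline weights for fractional offset t in [0, 1).
// They apply to the values at indices i-1, i, i+1, i+2.
std::array<double, 4> cubicBsplineWeights(double t) {
    const double omt = 1.0 - t;
    return {
        omt * omt * omt / 6.0,
        (3.0 * t * t * t - 6.0 * t * t + 4.0) / 6.0,
        (-3.0 * t * t * t + 3.0 * t * t + 3.0 * t + 1.0) / 6.0,
        t * t * t / 6.0,
    };
}

// Evaluate the cubic B-spline at continuous position x, treating the input
// values as coefficients. Assumption: indices are clamped at the borders.
double evalCubic(const std::vector<double>& values, double x) {
    const long last = static_cast<long>(values.size()) - 1;
    const long i = static_cast<long>(std::floor(x));
    const std::array<double, 4> w = cubicBsplineWeights(x - static_cast<double>(i));
    double sum = 0.0;
    for (int k = 0; k < 4; ++k) {
        long idx = i - 1 + k;
        if (idx < 0) idx = 0;
        if (idx > last) idx = last;
        sum += w[k] * values[static_cast<std::size_t>(idx)];
    }
    return sum;
}
```

Feeding in a unit impulse `{0, 0, 1, 0, 0}` gives 4/6 at the impulse position and 1/6 at each direct neighbor, instead of 1 and 0. The peak is lowered and spread over its neighbors, which is the blur you see when the samples are not converted to B-spline coefficients first.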
Thinking about it naively, you would need two texture fetch units for every processing unit to explain the missing factor of two (so exactly the opposite); otherwise the processing units would simply be waiting for the texture fetches to finish.
Anyway, the real situation is a bit more complex: on the GeForce 8800 architecture, every texture processor cluster (TPC) contains two streaming multiprocessors (SMs), an L1 texture cache and a texture unit. Every SM has eight streaming processors (SPs). See e.g. this PowerPoint presentation.
Hey guys, it's interesting to go through this thread. I am new to CUDA programming. I have a 1D cubic interpolation code. Can that also be sped up by a factor of 100? I haven't thought about it much as I am still learning CUDA, but I was interested in knowing whether that is actually possible.
I would expect the speedup to be smaller for 1D interpolation, since you would benefit less from the smart data reordering that the GPU performs for 2D and 3D textures. However, I have not tried 1D, so why don't you give it a try and let us know?
I have a 3D texture of water velocity data in which many grid points are land (i.e., null values). I would like to use CUDA and the cubic interpolation package to interpolate the velocity at specific locations while ignoring the land data points.
Is this generally possible with the built-in interpolation hardware of GPU textures, or with the cubic interpolation package? I can set the land values to whatever I prefer, but I am not sure how to exclude them from the calculations.