new: cubic interpolation in CUDA (cubic B-spline interpolation)

dannyruijters · October 8, 2008, 5:39pm

I have created some code to perform cubic B-spline interpolation in CUDA. This code allows you to replace linear 2D and 3D texture filtering with cubic interpolation.

I have also included prefiltering to convert data samples into B-spline coefficients and several example programs + code.
The CUDA version is 327 times faster than a non-optimized CPU implementation on my PC! :)
Interested? Download the code from: CUDA cubic interpolation

regards,
Danny

edit: The website address has changed (my previous provider went bankrupt).

kristleifur · October 8, 2008, 5:58pm

Nice! Is it similar to the process in GPU Gems?

dannyruijters · October 9, 2008, 7:05am

Yes it is, except that I do not create a lookup table, but do all calculations on the fly.

That leads to a higher accuracy, while it hardly costs any extra processing time.
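To make "on the fly" concrete, here is a minimal host-side C++ sketch of evaluating the four uniform cubic B-spline weights directly from the polynomial for a fractional offset t, instead of reading them from a precomputed table. The names `BsplineWeights` and `bspline_weights` are made up for this illustration and are not taken from the downloadable code; the same arithmetic would carry over to a CUDA device function.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: the four weights applied to samples i-1, i, i+1, i+2.
struct BsplineWeights { double w0, w1, w2, w3; };

// Uniform cubic B-spline weights for a fractional offset t in [0, 1],
// evaluated directly (no lookup table).
BsplineWeights bspline_weights(double t) {
    assert(t >= 0.0 && t <= 1.0);
    const double s = 1.0 - t;                      // distance to the next sample
    BsplineWeights w;
    w.w0 = s * s * s / 6.0;                        // (1 - t)^3 / 6
    w.w1 = 2.0 / 3.0 - 0.5 * t * t * (2.0 - t);    // (3t^3 - 6t^2 + 4) / 6
    w.w2 = 2.0 / 3.0 - 0.5 * s * s * (2.0 - s);    // w1 mirrored: same form in (1 - t)
    w.w3 = t * t * t / 6.0;                        // t^3 / 6
    return w;
}
```

The weights always sum to one, and at t = 0 they reduce to 1/6, 2/3, 1/6, 0.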

cheers,

Danny

kristleifur · October 9, 2008, 1:56pm

V. cool!

I briefly attempted to get my head around that algorithm but failed. It’ll be fun to have a look at your code.

Simon_Green · October 14, 2008, 4:52pm

You may not have seen it, but the latest CUDA SDK includes a bicubic filtering sample that implements the method in GPU Gems.

This is usually faster since it only requires 4 bilinear texture lookups for a 2D bicubic filter (instead of the usual 16).
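The reduction in lookups can be illustrated in 1D with a small host-side C++ sketch, where `linear_fetch` is a software stand-in for a hardware linear texture fetch (all function names here are invented for the illustration, and this is not the SDK sample's code). Adjacent weights are merged pairwise, so two linear fetches at shifted positions give the same result as the four-tap sum; applying the same idea along both axes is what takes a 2D filter from 16 lookups to 4.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cubic B-spline weights for a fractional offset t in [0, 1].
void bspline_weights(double t, double w[4]) {
    const double s = 1.0 - t;
    w[0] = s * s * s / 6.0;
    w[1] = 2.0 / 3.0 - 0.5 * t * t * (2.0 - t);
    w[2] = 2.0 / 3.0 - 0.5 * s * s * (2.0 - s);
    w[3] = t * t * t / 6.0;
}

// Software stand-in for a linear texture fetch; sample i sits at coordinate i.
double linear_fetch(const std::vector<double>& c, double x) {
    assert(!c.empty());
    x = std::fmin(std::fmax(x, 0.0), static_cast<double>(c.size() - 1));
    const std::size_t i = static_cast<std::size_t>(x);
    const std::size_t j = (i + 1 < c.size()) ? i + 1 : i;
    const double f = x - static_cast<double>(i);
    return (1.0 - f) * c[i] + f * c[j];
}

// Reference: explicit four-tap weighted sum (interior positions only).
double cubic_4tap(const std::vector<double>& c, double x) {
    assert(c.size() >= 4 && x >= 1.0 && x < static_cast<double>(c.size() - 2));
    const double i = std::floor(x);
    double w[4];
    bspline_weights(x - i, w);
    const std::size_t k = static_cast<std::size_t>(i);
    return w[0] * c[k - 1] + w[1] * c[k] + w[2] * c[k + 1] + w[3] * c[k + 2];
}

// Same result from two linear fetches: merge weights pairwise and shift
// each fetch position so the linear filter reproduces the pair's ratio.
double cubic_2fetch(const std::vector<double>& c, double x) {
    assert(c.size() >= 4 && x >= 1.0 && x < static_cast<double>(c.size() - 2));
    const double i = std::floor(x);
    double w[4];
    bspline_weights(x - i, w);
    const double g0 = w[0] + w[1];            // combined weight of taps i-1 and i
    const double g1 = w[2] + w[3];            // combined weight of taps i+1 and i+2
    const double h0 = i - 1.0 + w[1] / g0;    // fetch position between i-1 and i
    const double h1 = i + 1.0 + w[3] / g1;    // fetch position between i+1 and i+2
    return g0 * linear_fetch(c, h0) + g1 * linear_fetch(c, h1);
}
```

On real hardware the linear fetch uses limited-precision interpolation weights, so the two routes would agree only approximately there; in this double-precision sketch they agree to rounding error.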

kristleifur · October 15, 2008, 3:40pm

Oh! Nice, it pays to look at the new features.

dannyruijters · October 15, 2008, 4:38pm

I must admit that I also completely missed this example. When I started on my version I was using the SDK 2 beta, where the example was not present yet, and when I later upgraded I did not check the examples well enough.

Since the SDK example also contains a benchmarking mode, I quickly benchmarked both the SDK cubic interpolation and my code (it was very easy to integrate the latter):

regular 2D linear interpolation: 1214 Mpixels/sec
SDK 2D cubic interpolation: 1198 Mpixels/sec
my 2D cubic interpolation: 1205 Mpixels/sec :)

It is striking that cubic interpolation (both versions) is hardly slower than linear interpolation. I guess that this is due to caching.

Of course my coding effort was not completely in vain, since I also offer 3D cubic interpolation and the Thévenaz prefilter.

Without the prefilter, cubic B-spline interpolation has a smoothing effect (you can actually see that when you look closely at the SDK example output).
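For anyone wondering what the prefilter does, here is a minimal 1D host-side C++ sketch in the style of the well-known recursive causal/anti-causal filter, assuming mirror boundary conditions. The function names are hypothetical and this is not the CUDA code from the download. It converts samples into B-spline coefficients in place, so that evaluating the spline at a sample position returns the original sample instead of a smoothed value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// In-place conversion of samples to cubic B-spline coefficients,
// assuming the signal is mirrored at both ends.
void prefilter_cubic(std::vector<double>& c) {
    const std::size_t n = c.size();
    assert(n >= 2);
    const double z = std::sqrt(3.0) - 2.0;              // pole of the cubic B-spline filter
    const double lambda = (1.0 - z) * (1.0 - 1.0 / z);  // overall gain (equals 6)
    for (double& v : c) v *= lambda;

    // Exact causal initialisation for the mirrored signal.
    double zn = z;
    double z2n = std::pow(z, static_cast<double>(n - 1));
    double sum = c[0] + z2n * c[n - 1];
    z2n *= z2n / z;
    for (std::size_t k = 1; k + 1 < n; ++k) {
        sum += (zn + z2n) * c[k];
        zn *= z;
        z2n /= z;
    }
    c[0] = sum / (1.0 - zn * zn);

    // Causal pass (left to right).
    for (std::size_t k = 1; k < n; ++k) c[k] += z * c[k - 1];

    // Anti-causal initialisation and pass (right to left).
    c[n - 1] = (z / (z * z - 1.0)) * (z * c[n - 2] + c[n - 1]);
    for (std::size_t k = n - 1; k-- > 0;) c[k] = z * (c[k + 1] - c[k]);
}

// Cubic B-spline evaluated exactly at sample position i: weights 1/6, 4/6, 1/6,
// with mirrored neighbours at the borders.
double bspline_at_sample(const std::vector<double>& c, std::size_t i) {
    const std::size_t n = c.size();
    assert(n >= 2 && i < n);
    const double left = (i == 0) ? c[1] : c[i - 1];
    const double right = (i + 1 == n) ? c[n - 2] : c[i + 1];
    return (left + 4.0 * c[i] + right) / 6.0;
}
```

Calling `bspline_at_sample` on the raw samples shows the smoothing: each value is blended with its neighbours. After `prefilter_cubic`, the same evaluation reproduces the input samples up to rounding error.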

cheers,

Danny

Simon_Green · October 16, 2008, 10:51am

Your numbers prompted me to look at this again, and it turns out there’s a big problem with the sample code - it doesn’t get memory coalescing on the writes (because it writes to a uchar array). Oops.

I fixed this by changing it to a uchar4 array; this will be in the next release. It just goes to show it’s always a good idea to profile your code!

dannyruijters · October 16, 2008, 2:05pm

I did the same thing, and these are the numbers I get on my GeForce 9800 GTX:

regular 2D linear interpolation: 4560 Mpixels/sec
SDK 2D cubic interpolation: 1995 Mpixels/sec
my 2D cubic interpolation: 2057 Mpixels/sec

Now there is a more sensible difference between cubic and linear interpolation. I guess that the fact that the performance gap is less than a factor of four (roughly 2.2 to 2.3 with these numbers) is due to texture caching.

cheers,

Danny

Sarnath · October 17, 2008, 6:18am

Using the -O2 option for NVCC usually triples your CPU performance. Check that out.

It is always good to profile against -O2 optimized CPU code.

Nonetheless, 327X looks rocking good! 327/3 as well… :) good luck!

jma · October 17, 2008, 11:34am

IIRC there is only one bilinear unit for every two “normal” execution units, which would explain the missing 2x.

dannyruijters · October 18, 2008, 9:10am

Thinking about it straightforwardly, you would need two texture fetch units for every processing unit to explain the missing factor of two (so exactly the opposite); otherwise the processing units are just waiting for the texture fetch to finish.

Anyway, the real situation is a bit more complex: on the GeForce 8800 architecture, every texture processor cluster (TPC) contains two streaming multiprocessors (SMs), an L1 texture cache and a texture unit. Every SM has eight streaming processors (SPs). See e.g. this PowerPoint.

cheers, Danny

nitin.life · November 3, 2008, 3:41am

Hey guys… it’s interesting to go through this thread. I am new to CUDA programming. I have a 1D cubic interpolation code. Can that also be sped up by a factor of up to 100? I haven’t thought about it much as I am still learning CUDA, but I was interested in knowing whether that is actually possible.

Thanks all… :)

Nittin

dannyruijters · November 5, 2008, 8:40pm

I would expect that for 1D interpolation the speedup is less, since you would benefit less from smart data rehashing, which is done by the GPU for 2D and 3D textures. However, I have not tried 1D, so why don’t you give it a try and let us know…

kind regards,

Danny

nitin.life · November 5, 2008, 9:29pm

Yes, I certainly will, and I will get back to you guys here as soon as I have something running.

Thanks all,

Nittn

dannyruijters · April 8, 2010, 8:17pm

The cubic interpolation has been extended to efficiently deal with RGBA color data.
An example that performs on-the-fly cubic filtering on AVI playback illustrates this.

Makefiles have also been added for compilation on Mac and Linux.
See here.

cheers!
Danny

coop · April 28, 2010, 3:29pm

I have a 3D texture of water velocity data, where many grid points are land (i.e., null values). I would like to use CUDA and the cubic interpolation package to interpolate velocity at specific locations, but ignore the land data points.

Is this generally possible using the built-in interpolation hardware of GPU textures, or for the cubic interpolation package? I can set the land values to whatever I prefer, but not sure how to ignore them in the calculations.

Any suggestions would be much appreciated.

johnfrcg01 · July 18, 2023, 12:27pm

Hello, I am new to CUDA and I found this thread useful since it covers cubic interpolation. Does anyone know how to test the code? I am not too familiar with textures. What type of arguments does a texture structure take? Any help would be really appreciated. Thank you so much…

Robert_Crovella · July 18, 2023, 1:45pm

There are CUDA sample codes that demonstrate texture usage. The programming guide has various topics concerning textures. There are also blogs that discuss various aspects of texture usage.

On a CUDA GPU the texture unit is a hardware unit that is principally a spatially-optimized cache. It requires explicit programming, both in host code and device code, to make use of it. In addition, for certain use cases, the texture unit can also do certain forms of interpolation. Depending on the use case, people may use the texture unit for one or both purposes: as a cache, or as a cache+interpolator. I don’t happen to know which is being referred to in this 13-15 year old thread or what exactly is in use in the linked repository.

spraesi · July 20, 2023, 9:51am

I want to potentially warn against using textures. It is not mentioned in the programming guide table of instruction throughputs, but in my (limited) experience using the texture units leads to poor performance.

These slides show the reason: The texture unit can only generate 4 outputs per cycle!!

Unless all 32 threads in your warp access the same address/position in your texture, just implement linear/cubic interpolation using normal floating point operations and linear arrays.

Note: For spatially optimizing the cache, simply use 32x4 rectangular thread blocks or similar to localize your 2D accesses.
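As a rough illustration of the "normal floating point operations and linear arrays" suggestion, here is a minimal host-side C++ sketch of bilinear interpolation on a row-major array. The function name, the clamping at the borders and the convention that sample (ix, iy) sits at coordinate (ix, iy) are assumptions made for this example; in a kernel the same arithmetic would go into a device function operating on a raw pointer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bilinear interpolation on a row-major linear array of width * height floats.
// Positions outside the image are clamped to the border.
float bilinear(const std::vector<float>& img, std::size_t width, std::size_t height,
               float x, float y) {
    assert(width >= 1 && height >= 1 && img.size() == width * height);
    x = std::fmin(std::fmax(x, 0.0f), static_cast<float>(width - 1));
    y = std::fmin(std::fmax(y, 0.0f), static_cast<float>(height - 1));

    const std::size_t x0 = static_cast<std::size_t>(x);
    const std::size_t y0 = static_cast<std::size_t>(y);
    const std::size_t x1 = (x0 + 1 < width) ? x0 + 1 : x0;    // clamp right neighbour
    const std::size_t y1 = (y0 + 1 < height) ? y0 + 1 : y0;   // clamp lower neighbour
    const float fx = x - static_cast<float>(x0);
    const float fy = y - static_cast<float>(y0);

    // Interpolate along x on both rows, then along y between the rows.
    const float top = (1.0f - fx) * img[y0 * width + x0] + fx * img[y0 * width + x1];
    const float bottom = (1.0f - fx) * img[y1 * width + x0] + fx * img[y1 * width + x1];
    return (1.0f - fy) * top + fy * bottom;
}
```

Whether this beats the texture unit for a given access pattern is something to measure rather than assume.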
