I am trying to speed up the digital signal processing of an optical coherence tomography setup using CUDA. I am quite new to CUDA,
so your advice would be greatly appreciated.
Basically, a CCD generates a 2D array (about 1024 x 640 values) at about 29 Hz. The values in each row (1024 rows of 512 elements each) need to be 1D interpolated and Fourier transformed, so you have 1024 identical series of operations to perform. My questions are:
-Is this a problem where CUDA could help?
-Should I let the GPU calculate the interpolation and FFT row by row, or all rows at once? (general programming strategy)
-Do you know of any decent GPU 1D interpolation routines?
Thanks in advance!
PS: If this matters, I am trying to implement this in LabVIEW.
You can use the cuFFT library (a CUDA library with FFT support). It comes with the CUDA toolkit.
I think you can launch a grid of 1024 blocks, with each block performing an FFT on a 1D array of 512 elements.
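In practice you don't even have to size that grid yourself: cuFFT's batch parameter runs all the rows in one call. A minimal sketch, assuming 512-point complex-to-complex transforms over 1024 rows with the data already on the GPU:

// Batched 1D FFT: 1024 rows of 512 complex samples, one plan, one call.
#include <cufft.h>
#include <cuda_runtime.h>

int main(void)
{
    const int n = 512;       // FFT length per row
    const int batch = 1024;  // number of rows

    cufftComplex *d_data;
    cudaMalloc((void **)&d_data, sizeof(cufftComplex) * n * batch);
    // ... copy the interpolated camera rows into d_data here ...

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, batch);            // one plan for all rows
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);  // in-place, batched

    cudaDeviceSynchronize();
    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}

If your camera samples are real-valued, a CUFFT_R2C plan takes real input and produces n/2 + 1 complex outputs per row, which halves the input storage.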
640 elements in a row => 640 * 8 B (assuming double precision) = 5120 B = 5 KB.
Since each input row will be interpolated and then FFT'ed, you will need roughly 3 buffers of 5 KB = 15 KB of storage per row in global memory (plus shared memory, as I'm pretty sure cuFFT uses shared memory internally for better performance).
For 1024 such rows (or blocks, in CUDA terms), that is 15 KB * 1024 = 15 MB of global memory.
Now, even a Quadro NVS 290 supports 16 KB of shared memory per block plus 256 MB of global memory!!! :)
That said, the choice of graphics card always boils down to the following 2 factors: your requirements (in terms of speed, memory, performance, etc.) and your budget.
The bigger players go fetch a Tesla, whereas those with more modest requirements and wallets settle for a GeForce 9xxx.
If you use texture memory, you get linear interpolation for free in 1D, 2D and 3D. Reading the value at a floating-point position costs no more time than reading at an integer position.
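To make that concrete, here is a minimal sketch of hardware-interpolated 1D resampling (640 input samples to 512 output points, matching the numbers above). It uses the texture-object API from newer CUDA releases; the older SDK samples use texture references instead, but the idea is the same:

// 1D linear interpolation done by the texture hardware.
#include <cuda_runtime.h>

__global__ void resample(cudaTextureObject_t tex, float *out,
                         int outN, float inN)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= outN) return;
    // Fractional coordinate: the texture unit interpolates between the
    // two neighbouring samples at no extra cost.
    float x = (i + 0.5f) * inN / outN;
    out[i] = tex1D<float>(tex, x);
}

int main(void)
{
    const int inN = 640, outN = 512;
    float h_in[640];
    for (int i = 0; i < inN; ++i) h_in[i] = (float)i;  // dummy ramp

    // Filtered 1D fetches require the data to live in a cudaArray.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, inN);
    cudaMemcpy2DToArray(arr, 0, 0, h_in, sizeof(h_in), sizeof(h_in), 1,
                        cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.filterMode = cudaFilterModeLinear;   // enables interpolation
    texDesc.readMode = cudaReadModeElementType;
    texDesc.addressMode[0] = cudaAddressModeClamp;

    cudaTextureObject_t tex;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

    float *d_out;
    cudaMalloc((void **)&d_out, outN * sizeof(float));
    resample<<<(outN + 255) / 256, 256>>>(tex, d_out, outN, (float)inN);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return 0;
}

The key settings are cudaFilterModeLinear (which turns the interpolation on) and the half-texel offset in the coordinate, which lines the fetch up with sample centres.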
OK, it seems like good advice to use texture memory for the interpolation; I read that everywhere. Is there a good SDK example of using texture memory?
And also, how would the memory transfers go, starting from the beginning? I have a camera which is currently used in LabVIEW (I guess the frames are loaded into CPU memory?). Then I would like to write the linear interpolation function on the GPU (so transfer to the GPU?), call this function from LabVIEW, and afterwards visualize the result on my screen (back to the CPU again?). Would the memory transfers work like this, and if so, wouldn't all these transfers take up too much time? Or is there a way to load the data from the camera directly into texture memory? A lot of questions, I know…
The NVIDIA CUDA SDK (with CUDA version >= 2.2) comes with some simple examples of using textures: simpleTexture, simpleTexture3D and simpleTextureDrv. Hopefully these are good starting points for you…
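As for the transfers: yes, the usual pattern is exactly what you describe: frame in host memory -> copy to the device -> run the kernels -> copy the result back. A rough sketch of the per-frame round trip (process_frame_on_gpu is a placeholder for your interpolation + FFT, and float data is an assumption):

#include <cuda_runtime.h>
#include <string.h>

#define ROWS 1024
#define COLS 640

void process_frame_on_gpu(float *d_frame);  // placeholder: interp + FFT

void handle_frame(const float *camera_frame, float *result)
{
    static float *d_frame = NULL;   // allocate once, reuse every frame
    static float *h_pinned = NULL;
    const size_t bytes = ROWS * COLS * sizeof(float);

    if (!d_frame) {
        cudaMalloc((void **)&d_frame, bytes);
        // Pinned (page-locked) host memory copies noticeably faster
        // and is required for asynchronous transfers.
        cudaMallocHost((void **)&h_pinned, bytes);
    }

    memcpy(h_pinned, camera_frame, bytes);
    cudaMemcpy(d_frame, h_pinned, bytes, cudaMemcpyHostToDevice);
    process_frame_on_gpu(d_frame);   // interpolation + FFT on the GPU
    cudaMemcpy(result, d_frame, bytes, cudaMemcpyDeviceToHost);
}

Back-of-the-envelope: 1024 x 640 floats is about 2.5 MB per frame, so at 29 Hz you are moving roughly 75 MB/s in each direction, which is far below PCIe bandwidth. The copies themselves shouldn't be your bottleneck.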
I am learning to integrate the GPU into LabVIEW and I have just run into similar problems. I don't know how to use the cuFFT functions: I have no idea how much space I have to allocate for such a calculation, and I don't know how to deal with the complex input and output. Could you help me with this? If it is OK, can you give me an example of how to use cuFFT from LabVIEW?