# Interpolation and FFT in LabVIEW: programming strategy for digital signal processing in a medical imaging setup

Hello all,

I am trying to speed up the digital signal processing of an optical coherence tomography setup using CUDA. I am quite new to CUDA.
Basically, a CCD generates a 2D array (about 1024 x 640) of values at about 29 Hz. The values in each row (1024 rows of 512 elements each) need to be 1D-interpolated and Fourier transformed, so there are 1024 identical series of operations to perform. My questions are:

-Is this a problem CUDA could help with?
-Should I let the GPU calculate the interpolation and FFT row by row, or all rows at once? (general programming strategy)
-Do you know of any decent 1D GPU interpolation routines?

PS: In case it matters, I am trying to implement this in LabVIEW.

You can use the cuFFT library (a CUDA library with FFT support). It comes with the CUDA toolkit.
I think you can launch a grid of 1024 blocks, with each block performing an FFT on a 1D array of 512 elements.
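Rather than 1024 separate launches, cuFFT can batch all the rows into a single plan. Here is a minimal host-side sketch, assuming single-precision complex data already sitting on the device (the buffer name `d_data` is made up for illustration):

```cuda
#include <cufft.h>
#include <cuda_runtime.h>

#define ROWS 1024
#define N    512   // FFT length per row

int main(void) {
    cufftComplex *d_data;
    cudaMalloc(&d_data, sizeof(cufftComplex) * ROWS * N);
    // ... fill d_data with the interpolated rows ...

    // One plan covering all 1024 rows: cuFFT batches the transforms
    // internally, which is much faster than 1024 separate plans/launches.
    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, ROWS);   // length 512, batch of 1024

    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);  // in-place FFT

    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}
```

If the camera data is real-valued, a `CUFFT_R2C` plan would halve the output size, but the C2C version above is the simplest starting point.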

Ok, and for the interpolation part, any thoughts on that?

Sorry, I am not sure whether there are any libraries for doing interpolation in CUDA…
So you could launch 2 kernels one after the other:

1. 1024 blocks, with each thread in a block computing the interpolated value at an intermediate location in its row.
2. 1024 blocks, with the threads in each block doing the FFT on the interpolated row.
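Step 1 could look something like this sketch, assuming simple linear interpolation from 640 input samples down to 512 output samples per row (the kernel name `interp_rows` and the sample counts are illustrative, not from the original post):

```cuda
// Hypothetical kernel for step 1: each block handles one row, each
// thread computes one output sample by linear interpolation.
__global__ void interp_rows(const float *in,  // ROWS x n_in samples
                            float *out,       // ROWS x n_out samples
                            int n_in, int n_out)
{
    int row = blockIdx.x;
    int i   = threadIdx.x;
    if (i >= n_out) return;

    // Map output index i to a fractional position in the input row.
    float x  = (float)i * (n_in - 1) / (n_out - 1);
    int   x0 = (int)x;
    int   x1 = min(x0 + 1, n_in - 1);
    float t  = x - x0;

    const float *src = in + row * n_in;
    out[row * n_out + i] = (1.0f - t) * src[x0] + t * src[x1];
}

// Launch: one block per row, one thread per output sample.
// interp_rows<<<1024, 512>>>(d_in, d_out, 640, 512);
```

In OCT processing the resampling positions are usually non-uniform (linearization in k-space), in which case the fractional position `x` would come from a precomputed lookup table instead of the linear mapping above.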

1024 blocks: what kind of graphics card would I need to cover that kind of size/memory?

Does anyone else know of interpolation libraries by any chance?

640 elements in a row => 640 * 8 B (assuming double precision) = 5120 B ≈ 5 KB.
Since you have an input row, which will be interpolated and then FFT'ed, you will need about 3 * 5 KB = 15 KB of storage per row in global memory (plus some shared memory, as I'm pretty sure cuFFT uses shared memory for better performance).
For 1024 such rows (one block each in CUDA), 15 KB * 1024 ≈ 15 MB of global memory.
Even a Quadro NVS 290 offers 16 KB of shared memory per block and 256 MB of global memory!!! :)

That said, the choice of graphics card always boils down to the following 2 factors: your requirements (in terms of speed, memory, performance, etc.) and your budget.
The big players go fetch a Tesla, whereas those with modest requirements and wallets settle for a GeForce 9xxx.

Hope this helps…

Thanks a lot for your help. The only thing left to do then is find more information about 1D interpolation techniques I guess…

Thanks again.

If you use texture memory, you get linear interpolation for free in 1D, 2D and 3D. Reading the value at a floating-point position costs no more time than reading at an integer position.
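A rough sketch of how that looks with the texture-reference API from that CUDA era (names like `rowTex` and `resample` are made up; with linear filtering enabled, `tex1D()` returns the hardware-interpolated value at any fractional coordinate):

```cuda
// Texture reference bound to one row of camera data.
texture<float, 1, cudaReadModeElementType> rowTex;

__global__ void resample(float *out, int n_out, float n_in)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_out) return;
    // Fractional source coordinate (+0.5f because texels are centered).
    float x = (i + 0.5f) * n_in / n_out;
    out[i] = tex1D(rowTex, x);   // hardware linear interpolation
}

// Host side, per row (sketch):
//   cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
//   cudaArray *arr; cudaMallocArray(&arr, &desc, 640);
//   cudaMemcpyToArray(arr, 0, 0, h_row, 640 * sizeof(float),
//                     cudaMemcpyHostToDevice);
//   rowTex.filterMode = cudaFilterModeLinear;   // enable interpolation
//   cudaBindTextureToArray(rowTex, arr);
//   resample<<<(512 + 255) / 256, 256>>>(d_out, 512, 640.0f);
```

One caveat: the hardware interpolation weights are stored in 9-bit fixed point, so if you need full float precision you are better off with a manual interpolation kernel.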

Ok, using texture memory for interpolation seems like good advice; I have read it everywhere. Is there a good SDK example of using texture memory?

Also, how would the memory transfers go, starting from the beginning? I have a camera that is currently used in LabVIEW (I guess the frames are loaded into CPU memory?). Then I would like to write the linear interpolation function for the GPU (so transfer to the GPU?), call this function from LabVIEW, and afterwards visualize the result on my screen (back to the CPU again?). Would the memory transfers work like this, and if so, wouldn't all these transfers take up too much time? Or is there a way to load the data from the camera directly into texture memory or something? Lots of questions, I know…

The NVIDIA CUDA SDK (with CUDA version >= 2.2) comes with some simple examples of using textures: simpleTexture, simpleTexture3D and simpleTextureDrv. Hope these are good starting points for you…
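As for the transfer question above: the per-frame flow would look roughly like this host-side sketch (the function name `process_frame` is hypothetical; LabVIEW would call it through a DLL). The key point is that only two host-device copies happen per frame, with all intermediate steps staying on the GPU:

```cuda
// Hypothetical per-frame pipeline called from LabVIEW via a DLL.
// The frame arrives in host (CPU) memory from the camera driver.
extern "C" void process_frame(const float *h_frame,  // 1024 x 640 input
                              float *h_result)       // 1024 x 512 output
{
    static float *d_in = NULL, *d_out = NULL;
    if (!d_in) {                       // allocate once, reuse every frame
        cudaMalloc(&d_in,  1024 * 640 * sizeof(float));
        cudaMalloc(&d_out, 1024 * 512 * sizeof(float));
    }

    // 1. Host -> device: one big copy per frame. ~2.6 MB at 29 Hz is a
    //    small fraction of PCIe bandwidth, so the transfers themselves
    //    should not be the bottleneck.
    cudaMemcpy(d_in, h_frame, 1024 * 640 * sizeof(float),
               cudaMemcpyHostToDevice);

    // 2. Interpolation + FFT run entirely on the GPU, with no
    //    intermediate copies back to the host:
    // interp_kernel<<<...>>>(d_in, d_out, ...);
    // cufftExecC2C(plan, ..., CUFFT_FORWARD);

    // 3. Device -> host: copy the processed frame back for display.
    cudaMemcpy(h_result, d_out, 1024 * 512 * sizeof(float),
               cudaMemcpyDeviceToHost);
}
```

Pinned (page-locked) host memory via `cudaMallocHost` would speed the copies up further, but even with ordinary pageable memory this data rate is comfortably achievable.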

Hi Eluri:

Have you solved this problem?

I am learning to integrate the GPU into LabVIEW and I have just run into problems similar to yours. I don't know how to use the cuFFT functions: I have no idea how much space I have to allocate for such a calculation, nor how to deal with the complex input and output. Could you help me with this? If it is OK, could you give me an example of how to use cuFFT from LabVIEW?

Thank you very much.