I am trying to speed up the digital signal processing of an optical coherence tomography setup using CUDA. I am quite new to CUDA,
so your advice would be greatly appreciated.
Basically, a CCD generates a 2D array (about 1024 x 640 values) at about 29 Hz. The values in each row (1024 rows of 512 elements each) need to be 1D interpolated and Fourier transformed, so you have 1024 identical series of operations to perform. My questions are:
-Is this a problem where CUDA could help?
-Should I let the GPU calculate the interpolation and FFT row by row, or all rows at once? (general programming strategy)
-Do you know of any decent GPU 1D interpolation routines?
Thanks in advance!
PS: If this matters, I am trying to implement this in LabVIEW.
You can use the cuFFT library (a CUDA library with FFT support). It comes with the CUDA toolkit.
I think you can launch a grid of 1024 blocks, with each block performing an FFT on a 1D array of 512 elements.
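In practice you don't even have to size that grid yourself: cuFFT's batch parameter runs all the rows in one call. A minimal sketch, assuming 512-point complex-to-complex transforms over 1024 rows with the data already on the GPU:

// Batched 1D FFT: 1024 rows of 512 complex samples, one plan, one call.
#include <cufft.h>
#include <cuda_runtime.h>

int main(void)
{
    const int n = 512;       // FFT length per row
    const int batch = 1024;  // number of rows

    cufftComplex *d_data;
    cudaMalloc((void **)&d_data, sizeof(cufftComplex) * n * batch);
    // ... copy the interpolated camera rows into d_data here ...

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, batch);            // one plan for all rows
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);  // in-place, batched

    cudaDeviceSynchronize();
    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}

If your camera samples are real-valued, a CUFFT_R2C plan takes real input and produces n/2 + 1 complex outputs per row, which halves the input storage.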
640 elements in a row => 640 * 8 B (assuming double precision) = 5120 B = 5 KB.
Since each input row will be interpolated and then FFT'ed, you will need roughly 3 buffers of 5 KB = 15 KB of storage per row in global memory (plus shared memory, as I'm pretty sure cuFFT uses shared memory internally for better performance).
For 1024 such rows (or blocks, in CUDA terms), that is 15 KB * 1024 = 15 MB of global memory.
Now, even a Quadro NVS 290 supports 16 KB of shared memory per block plus 256 MB of global memory!!! :)
That said, the choice of graphics card always boils down to the following 2 factors: your requirements (in terms of speed, memory, performance, etc.) and your budget.
The bigger players go fetch a Tesla, whereas those with more modest requirements and wallets settle for a GeForce 9xxx.
If you use texture memory, you get linear interpolation for free in 1D, 2D and 3D. Reading the value at a floating-point position costs no more time than reading at an integer position.
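To make that concrete, here is a minimal sketch of hardware-interpolated 1D resampling (640 input samples to 512 output points, matching the numbers above). It uses the texture-object API from newer CUDA releases; the older SDK samples use texture references instead, but the idea is the same:

// 1D linear interpolation done by the texture hardware.
#include <cuda_runtime.h>

__global__ void resample(cudaTextureObject_t tex, float *out,
                         int outN, float inN)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= outN) return;
    // Fractional coordinate: the texture unit interpolates between the
    // two neighbouring samples at no extra cost.
    float x = (i + 0.5f) * inN / outN;
    out[i] = tex1D<float>(tex, x);
}

int main(void)
{
    const int inN = 640, outN = 512;
    float h_in[640];
    for (int i = 0; i < inN; ++i) h_in[i] = (float)i;  // dummy ramp

    // Filtered 1D fetches require the data to live in a cudaArray.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, inN);
    cudaMemcpy2DToArray(arr, 0, 0, h_in, sizeof(h_in), sizeof(h_in), 1,
                        cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.filterMode = cudaFilterModeLinear;   // enables interpolation
    texDesc.readMode = cudaReadModeElementType;
    texDesc.addressMode[0] = cudaAddressModeClamp;

    cudaTextureObject_t tex;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

    float *d_out;
    cudaMalloc((void **)&d_out, outN * sizeof(float));
    resample<<<(outN + 255) / 256, 256>>>(tex, d_out, outN, (float)inN);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return 0;
}

The key settings are cudaFilterModeLinear (which turns the interpolation on) and the half-texel offset in the coordinate, which lines the fetch up with sample centres.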
OK, it seems like good advice to use texture memory for the interpolation; I read that everywhere. Is there a good SDK example of using texture memory?
And also, how would the memory transfers go, starting from the beginning? I have a camera which is currently used in LabVIEW (I guess the frames are loaded into CPU memory?). Then I would like to write the linear interpolation function on the GPU (so transfer to the GPU?), call this function from LabVIEW, and afterwards visualize the result on my screen (back to the CPU again?). Would the memory transfers work like this, and if so, wouldn't all these transfers take up too much time? Or is there a way to load the data from the camera directly into texture memory? A lot of questions, I know…
The NVIDIA CUDA SDK (with CUDA version >= 2.2) comes with some simple examples of using textures: simpleTexture, simpleTexture3D and simpleTextureDrv. Hopefully these are good starting points for you…
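As for the transfers: yes, the usual pattern is exactly what you describe: frame in host memory -> copy to the device -> run the kernels -> copy the result back. A rough sketch of the per-frame round trip (process_frame_on_gpu is a placeholder for your interpolation + FFT, and float data is an assumption):

#include <cuda_runtime.h>
#include <string.h>

#define ROWS 1024
#define COLS 640

void process_frame_on_gpu(float *d_frame);  // placeholder: interp + FFT

void handle_frame(const float *camera_frame, float *result)
{
    static float *d_frame = NULL;   // allocate once, reuse every frame
    static float *h_pinned = NULL;
    const size_t bytes = ROWS * COLS * sizeof(float);

    if (!d_frame) {
        cudaMalloc((void **)&d_frame, bytes);
        // Pinned (page-locked) host memory copies noticeably faster
        // and is required for asynchronous transfers.
        cudaMallocHost((void **)&h_pinned, bytes);
    }

    memcpy(h_pinned, camera_frame, bytes);
    cudaMemcpy(d_frame, h_pinned, bytes, cudaMemcpyHostToDevice);
    process_frame_on_gpu(d_frame);   // interpolation + FFT on the GPU
    cudaMemcpy(result, d_frame, bytes, cudaMemcpyDeviceToHost);
}

Back-of-the-envelope: 1024 x 640 floats is about 2.5 MB per frame, so at 29 Hz you are moving roughly 75 MB/s in each direction, which is far below PCIe bandwidth. The copies themselves shouldn't be your bottleneck.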
I am learning to integrate the GPU into LabVIEW and I have just run into similar problems. I don't know how to use the cuFFT functions: I have no idea how much space I have to allocate for such a calculation, and I don't know how to deal with the complex input and output. Could you help me with this? If it is OK, can you give me an example of how to use cuFFT from LabVIEW?