What is the data rate of the serial link you are using? A standard PC UART only goes to 14.4kB/s, which is such a modest data rate that just about anything should be able to compute the FFT in “real time”.
Yes, the data rate is around 14kbps. Could you please tell me how I should proceed? Basically I want to know if it is possible to read the data directly on the GPU, or whether I should first get the data into main memory, then use cudaMemcpy(, HostToDevice) to copy it into global memory, and then perform the FFT.
It depends on how big an FFT he wants to do. The time complexity of an FFT is O(N log N), while the data transfer complexity is O(N). There is some (possibly very large) value of N for which it will take longer to perform the FFT than it takes to copy the data. But yeah, in this case I would bet that you are right.
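To put rough numbers on the N log N versus N argument, here is a back-of-envelope sketch. The 14.4 kB/s link rate comes from this thread; everything else is an assumption made only for illustration: 2 bytes per sample, a CPU sustaining about 1 GFLOP/s on the FFT, and the conventional 5·N·log2(N) flop estimate for a complex FFT. The function names are made up for this example.

```python
import math

def serial_receive_seconds(n_samples, bytes_per_sample=2, link_bytes_per_s=14_400):
    """Time to receive n_samples over the serial link; grows linearly with N.
    bytes_per_sample is an assumption, the link rate is the figure from the thread."""
    return n_samples * bytes_per_sample / link_bytes_per_s

def fft_seconds(n_samples, flops_per_s=1e9):
    """Rough FFT time from the conventional 5*N*log2(N) flop estimate.
    flops_per_s is an assumed sustained rate, not a measurement."""
    return 5 * n_samples * math.log2(n_samples) / flops_per_s

if __name__ == "__main__":
    for n in (1024, 65536, 1 << 20):
        rx = serial_receive_seconds(n)
        fft = fft_seconds(n)
        print(f"N={n:>8}: receive {rx:10.3f} s, FFT {fft:.6f} s, ratio {rx / fft:,.0f}x")
```

Under these assumptions the time spent waiting on the serial port exceeds the FFT time by a factor in the thousands for any practical N. The crossover point would need log2(N) in the tens of thousands, which is why the link, not the FFT, is the limit here.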
Thanks for the valuable information. Well, I just want to know: suppose I perform the FFT (on the data that I am getting from the serial port) on the CPU and then on the GPU using CUDA. Will I not get any speedup? I am not able to understand why not, because the FFT is a very computationally expensive task, and applying CUDA should accelerate the process.
Getting back to this older post. I want to know if, after acquiring data at the serial port, I can find the FFT using the CUFFT library. At present I am not looking for much speedup, but just want to know if CUFFT can help me with this problem. The problem is:
1-Acquire data at serial port using microcontroller,
If you are receiving data and transmitting results using a standard UART, it probably won’t matter what you use to compute the FFT, because the serial link will be the bottleneck. FFTW will probably work out to be simpler and faster than using cuFFT in an application like that. It is extremely fast on CPUs with SSE/SSE2 support (which basically means any mainstream CPU manufactured in the last decade).
Given that you are sending the FFT data back to the microcontroller, another option is to drop the CPU entirely and go to a microcontroller with DSP features. (Wikipedia tells me that some vendors call these “digital signal controllers”.) Many of these should come with optimized FFT libraries that can easily keep up with your data rate as well.
But suppose I am NOT sending the data back to the microcontroller, then? (Let's not discuss for the time being what I am going to do with the data on my PC after processing.) I think I must get a speedup compared to FFTW. (The thing is, I HAVE to do the processing on the GPU, and thus MUST use CUDA or an optimized library for GPUs.)
That just doesn’t follow. 10 seconds with Google produces this, which seems to disprove it. There are even links to code you could use or adapt to test your hypothesis yourself on your own hardware. It seems to be a couple of years old, so maybe cuFFT performance has improved relative to FFTW since it was written, but I doubt the conclusions will be very different.
So I take it this entire thread is some sort of gedankenexperiment where you have committed yourself to doing a GPU-based project (school, I guess) and are now looking for a problem to work on?
I am assuming the same thing. And here is my advice. If it is a college class you are working on, first talk to the professor. Tell them that you do not think that using CUDA will give you a speedup. You could still implement it anyway, then compare the CUDA version to a CPU version and explain why the CUDA version is slower. One aspect of parallel computing is determining when something can or cannot be usefully parallelized. Also, later on you can easily change the code if you have a method of transferring the data other than through a serial port.
If you need to write a paper on this, you still could. There are plenty of things you can write about to show that you still learned something. But most importantly, talk to your professor. If this is for a work project, you probably shouldn’t be doing it.
But to answer your question: you will first receive the signal through the serial port and store the data in host memory on the CPU side. Then you will have to use cudaMemcpy to get the data onto the GPU.