Signal Processing

We record raw signal data for up to 30 to 60 minutes. Although we have live signal filters, once the recording is done we can view the data under more sophisticated (non-real-time) filters. However, that filtering can take minutes (it currently runs entirely on the CPU). I’d like to see if we can use CUDA. We basically have some low-pass, high-pass, and notch filters.

Our filter algorithms currently depend on previously filtered results, so we filter sequentially as time progresses. This makes it difficult to parallelize: we can’t just divide the data up and assign a thread block to each partition, because processing a partition requires that all the samples before it have already been filtered. This is partly because the algorithms update internal values as they process samples, and these updated values are used to filter subsequent samples.
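To make the dependency concrete, here is a minimal sketch in Python. It assumes a one-pole recursive low-pass filter, which is only a stand-in for our real filters; the names one_pole_lowpass and alpha are made up for illustration. It shows that filtering the second half of the data on its own gives a different result, because the running state is missing.

```python
def one_pole_lowpass(samples, alpha):
    """Hypothetical recursive filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    out = []
    prev = 0.0  # running state, updated on every sample
    for x in samples:
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


signal = [1.0] * 8

# Filtering the whole signal in order carries the state across all samples.
whole = one_pole_lowpass(signal, 0.5)

# Filtering each half independently restarts the state at zero for the
# second half, so its outputs no longer match the sequential result.
naive_split = one_pole_lowpass(signal[:4], 0.5) + one_pole_lowpass(signal[4:], 0.5)

print(whole)
print(naive_split)
```

The first four outputs agree, but from the fifth sample on the independently filtered half diverges, which is why a naive partition across thread blocks does not work for this kind of filter.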

I’m not a signal processing expert, but I’m wondering if this problem can be transformed. High-pass, low-pass, and notch filters basically remove different frequencies. It seems we could do an FFT (fast on GPU), apply the filters in frequency space, then do an inverse FFT. I’m assuming it would be easy to filter the desired frequencies in frequency space, and the algorithm wouldn’t depend on “previous” sample results being filtered.
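Roughly what I have in mind, as a sketch only: the code below uses a small pure-Python radix-2 FFT in place of a GPU FFT, and a crude "brick-wall" low-pass that simply zeroes every bin above a cutoff. The function names, the 64 Hz sample rate, and the 10 Hz cutoff are all assumptions for illustration. Zeroing bins abruptly like this can cause ringing on real data, so an actual filter would probably need a smoother frequency response.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT using the conjugation trick on the forward FFT."""
    n = len(spectrum)
    conj = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in conj]


def brickwall_lowpass(samples, sample_rate, cutoff_hz):
    """Zero every bin above cutoff_hz, then transform back to the time domain."""
    n = len(samples)
    spectrum = fft(samples)
    for k in range(n):
        # Bins above n/2 hold negative frequencies; use their magnitude.
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0j  # each bin is handled independently of the others
    return [v.real for v in ifft(spectrum)]


fs = 64.0  # assumed sample rate in Hz
n = 64
low = [math.sin(2 * math.pi * 2 * i / fs) for i in range(n)]    # 2 Hz component
high = [math.sin(2 * math.pi * 20 * i / fs) for i in range(n)]  # 20 Hz component
mixed = [a + b for a, b in zip(low, high)]

filtered = brickwall_lowpass(mixed, fs, cutoff_hz=10.0)
max_error = max(abs(f - l) for f, l in zip(filtered, low))
print(max_error)
```

The point is the loop over bins: no bin needs the result of any other bin, so that step should map naturally onto one GPU thread per bin.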

Hoping someone with signal processing experience and CUDA can provide advice.


An FFT implies block-wise processing of the input samples, unless you run a single big FFT over the entire signal (which may not be possible if you need to adapt your filters as you go).

The FFTs and their inverses are neatly parallelizable, even though you still have to step through the blocks sequentially, since you stated that each block’s filtering parameters may depend on the outcome of previous blocks.

The bigger the FFT window is, the slower the rate at which you can adjust your filter parameters. You may have to look for a balance between resolution in the frequency domain and the rate at which you can adapt your filters.
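The trade-off can be put into numbers. For a block of N samples at sample rate fs, the frequency bins are fs / N apart and the block covers N / fs seconds, so improving one worsens the other. A small sketch follows; the 1000 Hz sample rate and the block sizes are assumed values for illustration, not figures from your setup.

```python
def fft_block_tradeoff(sample_rate_hz, block_size):
    """Return (bin width in Hz, block duration in seconds) for one FFT block."""
    bin_width_hz = sample_rate_hz / block_size
    block_seconds = block_size / sample_rate_hz
    return bin_width_hz, block_seconds


SAMPLE_RATE_HZ = 1000.0  # assumed sample rate

for block_size in (256, 1024, 4096):
    width, seconds = fft_block_tradeoff(SAMPLE_RATE_HZ, block_size)
    # Filter parameters can change at most once per block, i.e. every `seconds`.
    print(f"N={block_size}: {width:.3f} Hz per bin, {seconds:.3f} s per block")
```

The product of the two numbers is always 1, so a narrow notch (fine bin width) forces long blocks and therefore slow adaptation.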

As an alternative to CUDA, consider multithreading or AVX-optimizing your CPU-based filters to maximize speed. A hybrid approach may also be feasible, where CUDA just does the FFT/IFFT and the CPU does the filtering in the frequency domain. Then basically all you need is the cuFFT library in batched mode.

Thanks for the reply. I was hoping that for a signal recording (where I have the entire signal up front) it would be possible to filter frequencies without having to adjust the filtering parameters sequentially, but I don’t know enough about signal processing. I’m not even sure what the point is of adjusting the parameters on the fly as we get more signal data; presumably the filter adapts or learns in some way. But if we have the entire signal up front, I’m not sure how important an adaptive algorithm would be. I will talk to some of our signal engineers and see.

A hybrid approach might work well. The signal data would be large enough to justify doing a GPU FFT and downloading the result for further CPU processing.