Applying an element-wise 1D multiplication to the result of a cufft fft

Essentially I am attempting to perform an element-wise (complex by complex) 1D multiplication on the result of two separate ffts(using cufft). The complex results of both ffts are of size (N/2)+1, and I naively applied the element wise multiplication to those complex results and then took the ifft of that back to the real space.(Essentially implementing the convolution in frequency space)

Of course I scale the end result based on the N (for real), but this result is way off (when compared to the respective MATLAB implementation).

If I do not do the element-wise complex-complex multiply and just forward fft, inverse fft and then scale, I get back the correct original real input data.

So I am wondering if there is an additional scaling step before/after the element wise complex multiply which is throwing off my result?

This may be an obvious question, but my background is more in discrete math, and I do no have the experience with all the nuance of ffts. There is not much documentation in Cufft about scaling and padding, and it seems to leave this to the user.

There is a cuda sample which does (I think) exactly this, which may be of interest:

I’m not an expert, but it seems that your general method is correct: transform both signals to be convolved, then do element wise multiplication, then inverse transform the result.

With CUFFT, there is a scaling that is needed to get the original data back from a FFT->IFFT sequence, but it seems you have already accounted for that.

Do your results appear to be off merely by a scaling factor (when compared with the matlab result)?

Thanks. That means there is a good chance I may be doing something incorrect in the element-wise multiplication. These are batched 1D ffts forward, multiply and back.

That referenced example uses complex-to-complex forward and back, while in my case it is real to complex and back.

Because the result of the forward fft is of size (N/2)+1, my uncertainly is about whether or not an element-wise multiplication using that (N/2)+1 complex representation is enough to create a result in the complex form which can be inverse fft(d) back to real form with the correct answer of N size.

In MATLAB the size of an fft(in terms of the number of elements) stays the same through both steps, so comparing the output to that of the cufft library has been tricky.

One of the challenges with batched FFTs may be getting your data layout correct. I think if you validate your code simply by doing FFT->IFFT you can have a misconception about data layout that will not trip up the validation. But if you then try to do an operation in the frequency domain, you may find that things go amiss.

A few suggestions:

  1. see if you can get things to work correctly (your convolution with R2C and C2R) using just a single plan/dataset rather than a batched plan.
  2. validate the results of your R2C operation (i.e. forward FFT) with matlab (for the non-redundant section of the result data).
  3. carefully validate your understanding of data layout parameters, and if you are using advanced data layout, perhaps review some examples of errors that can be made, e.g.:

Thanks for the help, but I still do have some unresolved naive questions about this process.


real array size N

const real filter array size N*2

temp real array size N*2

temp complex array size ((N*2)/2+1)


copy real values size N to temp array size N*2(padded with zeros on end N portion)

allocate temp complex array for fft result of size ( (N*2)/2 +1)

create fft plan R2C using size N*2

perform forward fft on temp real array size (N2) which has complex result ( (N2)/2 +1)

somehow apply real filter by complex element wise multiplications on that ( (N2)/2 +1) complex fft result(I have been applying the first ( (N2)/2 +1) values of the real filter element wise against the respective complex values)

create ifft plan C2R using size N*2

perform ifft to from complex array size (N2)/2 +1) back to temp real array size N2

chop off last N values of that real result and apply scaling by (N*2) to the N values

copy back those N values


somwhere in this proces I am going wrong, but not sure where. I am getting no error codes so this has to do with some other incorrect assumption I have made.

So my main questions relate to how I do apply an element wise multiplication of size n to a complex array in the cufft data form of n/2 +1? I know how to multiply real and complex numbers, but that temp storage format is throwing me off.

If I do have a n size real array and need to apply this element-wise mult against those n/2+1 values, do I just use the first n/2 +1 real values of the real array against the stored complex values?

Can you give more information on what you want to achieve? Is it just computing the convolution?
Then your first idea should work: FFT^-1( FFT(g(x))FFT(h(x)))
The product FFT(g(x))FFT(h(x)) should have all the properties that are needed for a c2r transform.

Or do you want to do something else? Because in your last post you speak of a multiplication of a real with a complex array…

Agreed. In my first posting, as well as the sample code I linked, it’s evident that it is necessary to “transform both signals to be convolved”.

If you are not doing this, I think it’s a problem. If you are doing this, both signals starting in the real domain having the same length, should result in corresponding signals in the complex domain that are also the same length. And you would do a complexcomplex multiply, not a realcomplex multiply.

Sorry if my statements were confusing.

I have two implementations of a fft convolution in MATLAB which I am adapting using cufft.

The first implemenation of the implementations does exactly this: FFT^-1( FFT(g(x))FFT(h(x)))

The other MATLAB implementation actually creates a real vector and then in a loop multiplies that real vector times the complex result of a fft then takes the ifft of that result(real).

I am guessing that MATLAB knows how to handle that real-complex multiplication while I am not sure hot to replicate that process using the different sized cufft complex fft result and the real filter vector.

Was pulling my hair out over the first implementation because I thought that should work as well, but just found out that the MATLAB implementation was incorrect and I wasted all that time( he used incorrect padding). Since I mainly work in discrete math this all was very confusing, but at least I know that the error was not from my code.

Can anyone recommend some link which has a some clear examples of how to padd for ffts? This aspect seems like a ‘black art’ to me.


I know it’s been a few years since your post. What exactly did you find here regarding the scaling? I’m new to frequency domain and finding exactly what you found - FFT^-1[FFT(x) * FFT(y)] is not what I expected but FFT^-1[FFT(x)]/N = x but scaling by 1/N after the fft-based convolution does not give me the same result as if I’d done the convolution in time domain.

Any suggestions based on what you found? link might be useful. In its implementation, it does not do the normalization(scaling by the reciprocal of the size of the data). You simply adapt the code in this link with the normalization.