CUFFT: Hermitian redundancy in real-to-complex / complex-to-real FFTs?

I am not sure whether this behavior is correct or whether it is caused by something else.

I filtered some real signals using the FFT and summed them (version 0.9 supports real FFTs).
I did the same thing with the Intel MKL FFT.

The CUFFT result seems to contain some low-frequency artifacts.

I am not sure why. My guess is that the CUFFT C2R transform does not take the Hermitian redundancy into account, so the negative-frequency part is ignored.

Any comments would be appreciated.

CUFFT returns only the non-redundant coefficients.
Can you be more specific about the transform you are doing? Is it 1D or 2D? What is the transform size? What is the dynamic range of your input signal?

What I am doing is filtering many 1D signals using a transform size of 2048. Here is some sample code:

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

cufftHandle cudaFFT_Plan_R2C_Once = 0;  // plan for the filter kernel
cufftHandle cudaFFT_Plan_R2C = 0;       // plan for the forward transforms
cufftHandle cudaFFT_Plan_C2R = 0;       // plan for the backward transforms

int filterWidth = 2048;                 // transform length
int NumOfTransform = 480;               // number of signals in the batch

cufftComplex *filter;                   // the filter kernel on the device
cufftComplex *data;                     // data on the device

int width = 640;                        // the signal length
float *realSignal;                      // real signals on the host (= new float[width*NumOfTransform])

// Prepare the device memory: each row holds filterWidth/2 + 1 complex values,
// i.e. the padded 2*(filterWidth/2 + 1) floats that an in-place R2C needs.
cudaMalloc((void **)&filter, sizeof(cufftComplex) * (filterWidth / 2 + 1));
cudaMalloc((void **)&data, sizeof(cufftComplex) * (filterWidth / 2 + 1) * NumOfTransform);
cudaMemset(data, 0, sizeof(cufftComplex) * (filterWidth / 2 + 1) * NumOfTransform);  // zero padding

// Prepare the filter kernel
// ... ...

// Transform the filter kernel (in place)
cufftPlan1d(&cudaFFT_Plan_R2C_Once, filterWidth, CUFFT_R2C, 1);
cufftExecR2C(cudaFFT_Plan_R2C_Once, (cufftReal *)filter, filter);

// Normalize the filter FFT coefficients here
// ... ...

// Prepare the batched CUFFT plans
cufftPlan1d(&cudaFFT_Plan_R2C, filterWidth, CUFFT_R2C, NumOfTransform);
cufftPlan1d(&cudaFFT_Plan_C2R, filterWidth, CUFFT_C2R, NumOfTransform);

// Prepare the real signal: copy each signal into the middle of its padded row
cudaMemcpy2D(((float *)data) + (filterWidth / 2) - (width / 2),  // destination
             sizeof(cufftComplex) * (filterWidth / 2 + 1),       // destination pitch (bytes)
             realSignal, width * sizeof(float),                  // source, source pitch
             width * sizeof(float), NumOfTransform,              // row width (bytes), row count
             cudaMemcpyHostToDevice);

// Forward transform
cufftExecR2C(cudaFFT_Plan_R2C, (cufftReal *)data, data);

// Filtering by point-wise complex multiplication
Filtration_Complex_Point_Wise_Mul_Cuda_Kernel<<<__dim_Grid__, __dimBlock__>>>(data, filter, (filterWidth / 2 + 1), NumOfTransform);

// Backward transform
cufftExecC2R(cudaFFT_Plan_C2R, data, (cufftReal *)data);
```
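The body of Filtration_Complex_Point_Wise_Mul_Cuda_Kernel is not shown, so the following is only a hypothetical CPU reference for what such a point-wise multiply presumably does; running it on a copy of the forward-transform output can help check the GPU kernel in isolation. The names Complex and pointwise_mul_ref are illustrative, and the struct merely mirrors the two-float (x = real, y = imaginary) layout of cufftComplex. It assumes the batched R2C output is stored as consecutive rows of len = filterWidth/2 + 1 coefficients.

```c
#include <stddef.h>

/* Mirrors cufftComplex's layout: two floats, x = real part, y = imaginary part. */
typedef struct { float x, y; } Complex;

/* Hypothetical CPU reference for the point-wise filtering step: each of the
 * `batch` rows holds `len` half-spectrum coefficients and is multiplied in
 * place by the same filter spectrum. */
void pointwise_mul_ref(Complex *data, const Complex *filter, int len, int batch)
{
    for (int b = 0; b < batch; b++) {
        Complex *row = data + (size_t)b * len;   /* start of row b */
        for (int k = 0; k < len; k++) {
            Complex a = row[k], f = filter[k];
            row[k].x = a.x * f.x - a.y * f.y;    /* Re(a * f) */
            row[k].y = a.x * f.y + a.y * f.x;    /* Im(a * f) */
        }
    }
}
```

With the sizes in the code above, each row has len = 1025 coefficients and there are batch = 480 rows. CUFFT transforms are unnormalized, so the C2R output is filterWidth times the filtered signal unless that factor is folded into the filter normalization step.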

What I am doing is simply filtering real signals with a real kernel.

My question is about cufftExecC2R: I wonder whether this call ignores the negative-frequency part.
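Written out for even N, and assuming a Hermitian input spectrum (so X_0 and X_{N/2} are real), an unnormalized complex-to-real inverse that is handed only X_0 through X_{N/2} still evaluates the full inverse DFT: the negative-frequency bins enter through the conjugate pairs. The second line shows why point-wise multiplying only the stored half-spectra is enough when filtering with a real kernel whose spectrum is H_k: the product of two Hermitian spectra is again Hermitian.

$$
x_n \;=\; \sum_{k=0}^{N-1} X_k\, e^{2\pi i k n/N}
\;=\; X_0 \;+\; (-1)^n X_{N/2}
\;+\; 2\sum_{k=1}^{N/2-1} \operatorname{Re}\!\left(X_k\, e^{2\pi i k n/N}\right),
\qquad X_{N-k} = \overline{X_k},
$$

$$
Y_k = X_k H_k \;\Longrightarrow\; Y_{N-k} = \overline{X_k}\,\overline{H_k} = \overline{Y_k}.
$$

Because the sum carries no 1/N factor, x_n is N times the inverse DFT of the spectrum.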

Thank you.