cuFFTDx: inverse FFT behaves like forward FFT

Hi,
I’m using cuFFTDx to perform a convolution.
I got some unreasonable results, so I tried to figure out where the problem comes from.
I commented out part of the code, simplified the process, and found the problem:
I have a data vector of 1024 complex floating-point elements.
I filled the vector with the same number, -40 + 0j, so I have 1024 copies of the same complex number.
After executing the forward FFT with cuFFTDx, I printed the data and found it is all zeros except the first element, which is -40960 + 0j, as expected.
But after executing the IFFT, I printed the data and found every element is -40960 + 0j instead of -40 + 0j as expected.
This looks like the result of applying the forward FFT twice instead of an FFT followed by an IFFT.
What can I do? Am I missing something?

Thank you in advance,
Ori

My code looks like this:

// Host Code:

static constexpr unsigned int fft_size1      = 1024; 

using FFT_base     = decltype(Block() + Size<fft_size1>() + Type<fft_type::c2c>() + Precision<float>() +
		                              ElementsPerThread<2>() + FFTsPerBlock<1>() + SM<750>());
using FFT          = decltype(FFT_base() + Direction<fft_direction::forward>());
using IFFT         = decltype(FFT_base() + Direction<fft_direction::inverse>());

cudaFuncSetAttribute(
    my_kernel<FFT, IFFT>,
    cudaFuncAttributeMaxDynamicSharedMemorySize,
    FFT::shared_memory_size);

my_kernel<FFT, IFFT><<<GridSizeKernel, FFT::block_dim, FFT::shared_memory_size>>>(data);

// Device Code:

template<class FFT, class IFFT>
__launch_bounds__(FFT::max_threads_per_block)
__global__ void my_kernel(complex *data){

   extern __shared__ complex shared_mem[];

   // load data into shared memory

   FFT().execute(shared_mem);
   __syncthreads();
   // print data

   IFFT().execute(shared_mem);
   __syncthreads();
   // print data
}




Now I see: maybe the IFFT in cuFFTDx is defined without the 1/N factor?

Correct. cuFFT doesn’t normalize FFTs. That is up to the user.