cuFFT result for a convolution filter differs from the OpenCV dft function

I have a problem with the cuFFT output for a Gaussian low-pass filter and the first-derivative filter [1; -1], which I use for FFT-based convolution.
I’m using a plain 2D double-complex to double-complex FFT, without the texture memory used in the CUDA Toolkit sample code.
However, the cuFFT result differs from that of the OpenCV ‘dft’ function, as shown in the figures below.

I tested the attached code (demo_cufft_conv_filter.zip, 224.0 KB) on Ubuntu 20.04 with CUDA/NVCC 10.1 and OpenCV 4.5.2.

For a Gaussian low-pass filter,

For the first-derivative filter [1; -1],

Why do the two results differ?
Please let me know what in my code causes the different FFT output.
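
For reference, here is a minimal sketch of a library-independent baseline that either output could be compared against on a small grid. It evaluates the 2D DFT directly from its definition, assuming the forward, unnormalized convention exp(-2*pi*i*(u*r/rows + v*c/cols)) and a row-major layout with the filter zero-padded from the top-left corner. The names naive_dft2d and padded_derivative_filter are hypothetical helpers for illustration only; they are not part of the attached code, cuFFT, or OpenCV.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Direct O(N^2) 2D DFT of a row-major rows x cols array.
// Assumed convention: forward transform, no scaling,
// kernel exp(-2*pi*i*(u*r/rows + v*c/cols)).
std::vector<cd> naive_dft2d(const std::vector<cd>& in, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    assert(in.size() == static_cast<std::size_t>(rows) * cols);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cd> out(in.size());
    for (int u = 0; u < rows; ++u) {
        for (int v = 0; v < cols; ++v) {
            cd sum(0.0, 0.0);
            for (int r = 0; r < rows; ++r) {
                for (int c = 0; c < cols; ++c) {
                    const double angle =
                        -two_pi * (static_cast<double>(u) * r / rows +
                                   static_cast<double>(v) * c / cols);
                    sum += in[r * cols + c] * std::polar(1.0, angle);
                }
            }
            out[u * cols + v] = sum;
        }
    }
    return out;
}

// Hypothetical helper: the column filter [1; -1] placed at the top-left
// corner of an otherwise zero rows x cols grid (an assumed padding scheme).
std::vector<cd> padded_derivative_filter(int rows, int cols) {
    assert(rows >= 2 && cols >= 1);
    std::vector<cd> h(static_cast<std::size_t>(rows) * cols, cd(0.0, 0.0));
    h[0 * cols + 0] = cd(1.0, 0.0);   // row 0, column 0
    h[1 * cols + 0] = cd(-1.0, 0.0);  // row 1, column 0
    return h;
}
```

Under these assumptions, naive_dft2d(padded_derivative_filter(4, 4), 4, 4) gives H(u, v) = 1 - exp(-2*pi*i*u/4), independent of v: 0, 1+i, 2, 1-i going down each column. If one library matches this on a small grid and the other does not, the mismatch is more likely in how the input is laid out or padded than in the transform itself.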

Thanks.