How to apply Scikit CUDA 2D FFT (PyCUDA)

I’m trying to apply a simple 2D FFT to an image array. This is my code:

import numpy as np 
import cv2
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from scikits.cuda.fft import fft, Plan

def get_cpu_fft(img):
    return np.fft.fft2(img)

def get_gpu_fft(img):
    shape = img.shape
    img_gpu = gpuarray.to_gpu(img.astype(np.float64))
    out_gpu = gpuarray.empty(shape, np.complex128)
    plan = Plan(shape, np.float64, np.complex128)
    fft(img_gpu, out_gpu, plan)
    return out_gpu.get()

def img_info(img):
    print img.shape
    print img.dtype

def main():

    img = cv2.imread("holo.tif")

    cpu = get_cpu_fft(img)
    gpu = get_gpu_fft(img)

    img_info(cpu)
    img_info(gpu)

    print np.allclose(cpu, gpu, atol=1e-6)

if __name__ == "__main__":
    exit(main())

But the code gives me the following result:

python testdft.py 
(512, 512, 3)
complex128
(512, 512, 3)
complex128
False

So I’m not getting the correct 2D GPU FFT. Am I setting up the Plan correctly? How should I use “scikits.cuda.fft” for 2D FFTs?

Thank you.

The mismatch comes from a difference between np.fft and the scikits.cuda fft. For real input with N samples along the last axis, np.fft.fft2 returns all N complex coefficients, while scikits.cuda’s fft performs a real-to-complex transform and returns only the N//2+1 non-redundant coefficients along that axis; the remaining ones are complex conjugates of those.
This tutorial explains how to use it and how to make the output compatible with np.fft: idtools.com.au/gpu-accelerated-fft-compatible-numpy/
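To illustrate the relationship described above, here is a minimal CPU-only sketch of how a half spectrum can be expanded into the full np.fft.fft2 layout. It uses np.fft.rfft2 as a stand-in for the GPU result, on the assumption that the real-to-complex output of scikits.cuda has the same (rows, cols//2+1) layout. The helper name full_spectrum_from_half is hypothetical and not part of either library. It also assumes a single-channel 2D image (for example one loaded with cv2.IMREAD_GRAYSCALE), whereas the array in the question has shape (512, 512, 3).

```python
import numpy as np

def full_spectrum_from_half(half, shape):
    """Rebuild the full 2D spectrum of a real image from its half spectrum.

    half  : complex array of shape (rows, cols // 2 + 1), laid out like
            the output of np.fft.rfft2 (assumed to match the GPU output)
    shape : (rows, cols) of the original real-valued image
    """
    rows, cols = shape
    n_half = cols // 2 + 1
    full = np.empty((rows, cols), dtype=half.dtype)

    # The first n_half columns are exactly the coefficients we already have.
    full[:, :n_half] = half

    # For real input the spectrum is Hermitian:
    #   F[k1, k2] == conj(F[-k1 mod rows, -k2 mod cols])
    # so the missing columns are conjugates of mirrored known columns.
    k1 = (-np.arange(rows)) % rows
    k2 = (-np.arange(n_half, cols)) % cols
    full[:, n_half:] = np.conj(half[np.ix_(k1, k2)])
    return full

# Usage: compare against the full CPU transform on a synthetic real image.
img = np.random.RandomState(0).rand(6, 8)
half = np.fft.rfft2(img)  # stand-in for the half spectrum returned by the GPU
assert np.allclose(full_spectrum_from_half(half, img.shape), np.fft.fft2(img))
```

With a layout like this, the GPU output buffer would be allocated with shape (rows, cols//2+1) rather than the full image shape, and the comparison against np.fft.fft2 would be made only after the expansion step.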