CUFFT with a number of elements that is not a power of two

Hi all, I'm new to CUDA programming. I need to use CUFFT v2.3 with a number of points that is not a power of two (e.g. 240). Do I need to pad the input array? If so, how?

This is my code:

[codebox]#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 240
#define NY 240

int main(int argc, char **argv)
{
    cufftHandle plan;
    cufftDoubleComplex *devPtr;
    /* static: NX*NY complex doubles (~900 KB) may not fit on the stack */
    static cufftDoubleComplex data[NX*NY];
    int i;

    /* source data creation: constant signal 1 + 1i */
    for (i = 0; i < NX*NY; i++) {
        data[i].x = 1;
        data[i].y = 1;
    }

    /* GPU memory allocation */
    cudaMalloc((void**)&devPtr, sizeof(cufftDoubleComplex)*NX*NY);

    /* transfer to GPU memory */
    cudaMemcpy(devPtr, data, sizeof(cufftDoubleComplex)*NX*NY, cudaMemcpyHostToDevice);

    /* creates 2D FFT plan (double-precision complex-to-complex) */
    if (cufftPlan2d(&plan, NX, NY, CUFFT_Z2Z) != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftPlan2d failed\n");
        cudaFree(devPtr);
        return 1;
    }

    /* executes the forward FFT in place */
    if (cufftExecZ2Z(plan, devPtr, devPtr, CUFFT_FORWARD) != CUFFT_SUCCESS)
        fprintf(stderr, "cufftExecZ2Z failed\n");

    /* transfer results from GPU memory */
    cudaMemcpy(data, devPtr, sizeof(cufftDoubleComplex)*NX*NY, cudaMemcpyDeviceToHost);

    /* deletes CUFFT plan */
    cufftDestroy(plan);

    /* frees GPU memory */
    cudaFree(devPtr);

    for (i = 0; i < 10; i++) {
        printf("data[%d] %f %f\n", i, data[i].x, data[i].y);
    }

    return 0;
}
[/codebox]


With this input I get this output:

[codebox]data[0] 57600.000000 57600.000000
data[1] 0.000000 -0.000000
data[2] -0.000000 -0.000000
data[3] -0.000000 -0.000000
data[4] -0.000000 -0.000000
data[5] -0.000000 -0.000000
data[6] 0.000000 -0.000000
data[7] -0.000000 -0.000000
data[8] -0.000000 -0.000000
data[9] -0.000000 -0.000000
[/codebox]


With NX=NY=128 I get this:

[codebox]data[0] 16384.000000 16384.000000
data[1] 0.000000 0.000000
data[2] 0.000000 0.000000
data[3] 0.000000 0.000000
data[4] 0.000000 0.000000
data[5] 0.000000 0.000000
data[6] 0.000000 0.000000
data[7] 0.000000 0.000000
data[8] 0.000000 0.000000
data[9] 0.000000 0.000000
[/codebox]


and it seems to work.


You don't need to pad the array; CUFFT has no restrictions on N.
A power-of-two transform (256) will be faster than 240 (3*5*16), but the result will be correct in both cases.
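To see where 3*5*16 comes from, here is a small hypothetical C helper (not part of CUFFT) that prints a transform length as a product of prime powers. FFT algorithms in general handle lengths built from small primes with mixed-radix passes, which is why 240 needs no padding; a pure power of two is typically just the fastest case. This is an illustrative sketch that assumes nothing about CUFFT internals:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of CUFFT): prints n as a product of
   prime powers, e.g. "240 = 2^4 * 3^1 * 5^1", and returns the largest
   prime factor. Plain trial division; fine for transform lengths. */
int print_factors(int n)
{
    const char *sep = " ";
    int p, e, largest = 1;

    assert(n >= 2);
    printf("%d =", n);
    for (p = 2; p * p <= n; p++) {
        for (e = 0; n % p == 0; e++)
            n /= p;                 /* divide out every factor p */
        if (e > 0) {
            printf("%s%d^%d", sep, p, e);
            sep = " * ";
            largest = p;
        }
    }
    if (n > 1) {                    /* what is left is a single prime */
        printf("%s%d^1", sep, n);
        largest = n;
    }
    printf("\n");
    return largest;
}
```

print_factors(240) prints "240 = 2^4 * 3^1 * 5^1" (largest factor 5), while print_factors(256) prints "256 = 2^8".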

I ask this because in my Fortran program I've replaced the Fortran FFT routines with the corresponding CUFFT calls, but the results aren't the same. Since you tell me that CUFFT doesn't need N to be a power of 2, the problem must be something else. I have no idea what it could be, though.

Is my implementation correct? If I've understood correctly, I can use the same functions regardless of whether the input size is a power of 2.

Thanks yet again.

If you are calling from Fortran, remember that CUFFT expects row-major (C) order, while Fortran stores arrays in column-major order.
So, if your Fortran array is a(NX,NY), when you set up the 2D plan the call should be:
cufftPlan2d(&plan, NY,NX, CUFFT_Z2Z);
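The reason for the swap: Fortran a(i,j) in an array declared a(NX,NY) sits at offset (i-1) + (j-1)*NX, which is exactly where a C array declared c[NY][NX] keeps c[j-1][i-1]. So the same buffer looks to CUFFT like a row-major NY x NX array, and in cufftPlan2d the last size argument is the fastest-varying (contiguous) dimension. The sketch below only checks that offset arithmetic; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Offset of Fortran element a(i,j) (1-based) in an array declared a(nx,ny):
   column-major, so the first index varies fastest. */
size_t fortran_offset(int i, int j, int nx)
{
    assert(i >= 1 && j >= 1);
    return (size_t)(i - 1) + (size_t)(j - 1) * (size_t)nx;
}

/* Offset of C element c[row][col] (0-based) in an array declared
   c[nrows][ncols]: row-major, so the last index varies fastest. */
size_t c_offset(int row, int col, int ncols)
{
    assert(row >= 0 && col >= 0);
    return (size_t)row * (size_t)ncols + (size_t)col;
}

/* Returns 1 if Fortran a(i,j) with dims (nx,ny) and C c[j-1][i-1] with
   dims [ny][nx] name the same memory location for every element. */
int layouts_match(int nx, int ny)
{
    int i, j;
    for (j = 1; j <= ny; j++)
        for (i = 1; i <= nx; i++)
            if (fortran_offset(i, j, nx) != c_offset(j - 1, i - 1, nx))
                return 0;
    return 1;
}
```

Note that when NX equals NY the two argument orders produce exactly the same call, so the swap only changes anything for non-square arrays.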

I've modified cufftPlan2d as you said, but I still get the same results as before.

Perhaps this happens because NX and NY have the same value?

I am still puzzled by the huge difference between the parameters taken by the Fortran FFT calls and the CUDA ones.

What is wrong with the results?

You are transforming a constant signal, so the only coefficient that should be nonzero is the one at zero wavenumber, and it contains the sum of the signal.

When NX=NY=240, NX*NY=57600; for NX=NY=128, NX*NY=16384.

These are the numbers reported in the output you posted.
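The arithmetic behind those numbers: in CUFFT's unnormalized forward transform, every twiddle factor of coefficient (0,0) is exp(0) = 1, so that coefficient is just the sum of all NX*NY input samples. A minimal CPU sketch of that sum, using a hypothetical dcomplex struct laid out like cufftDoubleComplex (two doubles, x = real, y = imaginary):

```c
#include <assert.h>

/* Hypothetical stand-in for cufftDoubleComplex: x = real, y = imaginary. */
typedef struct { double x, y; } dcomplex;

/* Coefficient (0,0) of an unnormalized forward 2D DFT: all twiddle
   factors are exp(-2*pi*i*0) = 1, so it is the plain sum of the samples. */
dcomplex dc_coefficient(const dcomplex *data, int nx, int ny)
{
    dcomplex sum = {0.0, 0.0};
    int k;
    for (k = 0; k < nx * ny; k++) {
        sum.x += data[k].x;
        sum.y += data[k].y;
    }
    return sum;
}
```

For the constant input 1 + 1i this gives 57600 + 57600i for 240 x 240 and 16384 + 16384i for 128 x 128. Every other coefficient of a constant signal is a sum of complex exponentials over whole periods, which cancels to zero, so the -0.000000 entries are only rounding noise.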

I have some volumes of size 160 x 64 x 224. Is a 3D FFT of 256 x 64 x 256 faster than one of 160 x 64 x 224? Does the order of the sizes matter; for example, is a 3D FFT of 64 x 160 x 224 faster?
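This one was not answered here, so only a rough way to think about it. All three sizes already factor into small primes: 160 = 2^5 * 5, 64 = 2^6 and 224 = 2^5 * 7. The cuFFT documentation for recent releases says its algorithms are highly optimized for sizes of the form 2^a * 3^b * 5^c * 7^d (CUFFT 2.3 may behave differently), so padding to 256 x 64 x 256 is not guaranteed to be faster, and zero padding computes a different transform (more output points on a finer frequency grid), not the same one. Timing both variants, including reordered dimensions, on the target GPU is the only reliable answer. If you do decide to pad, the hypothetical helper below (not a CUFFT function) picks the smallest size >= n whose only prime factors are 2, 3, 5 and 7:

```c
#include <assert.h>

/* Returns 1 if n has no prime factors other than 2, 3, 5 and 7. */
int is_2357_smooth(int n)
{
    static const int primes[] = {2, 3, 5, 7};
    int k;
    assert(n >= 1);
    for (k = 0; k < 4; k++)
        while (n % primes[k] == 0)
            n /= primes[k];     /* strip every factor of this prime */
    return n == 1;
}

/* Hypothetical helper: smallest m >= n that is 2,3,5,7-smooth. */
int next_fast_size(int n)
{
    int m = n;
    while (!is_2357_smooth(m))
        m++;
    return m;
}
```

For this volume next_fast_size returns 160, 64 and 224 unchanged; for an awkward length such as 241 (a prime) it returns 243 = 3^5.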