CUFFT not a power of two element

Fr0stY · February 23, 2010, 1:48pm

Hi all, I’m new to CUDA programming. I need to use CUFFT v2.3 with a number of points that is not a power of two (e.g. 240). Do I need to pad the input array? If yes, how?

This is my code:

[codebox]#include <stdio.h>
#include <math.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 240
#define NY 240

int main(int argc, char **argv)
{
    cufftHandle plan;
    cufftDoubleComplex *devPtr;
    cufftDoubleComplex data[NX*NY];
    int i;

    /* source data creation: constant signal 1 + 1i */
    for (i = 0; i < NX*NY; i++) {
        data[i].x = 1;
        data[i].y = 1;
    }

    /* GPU memory allocation */
    cudaMalloc((void**)&devPtr, sizeof(cufftDoubleComplex)*NX*NY);

    /* transfer to GPU memory */
    cudaMemcpy(devPtr, data, sizeof(cufftDoubleComplex)*NX*NY, cudaMemcpyHostToDevice);

    /* creates 2D FFT plan */
    cufftPlan2d(&plan, NX, NY, CUFFT_Z2Z);

    /* executes the FFT in place */
    cufftExecZ2Z(plan, (cufftDoubleComplex *)devPtr, (cufftDoubleComplex *)devPtr, CUFFT_FORWARD);

    /* transfer results from GPU memory */
    cudaMemcpy(data, (cufftDoubleComplex *)devPtr, sizeof(cufftDoubleComplex)*NX*NY, cudaMemcpyDeviceToHost);

    /* deletes CUFFT plan */
    cufftDestroy(plan);

    /* frees GPU memory */
    cudaFree(devPtr);

    /* print the first 10 coefficients */
    for (i = 0; i < 10; i++) {
        printf("data[%d] %f %f\n", i, data[i].x, data[i].y);
    }

    return 0;
}
[/codebox]

With this input I get this output:

[codebox]data[0] 57600.000000 57600.000000
data[1] 0.000000 -0.000000
data[2] -0.000000 -0.000000
data[3] -0.000000 -0.000000
data[4] -0.000000 -0.000000
data[5] -0.000000 -0.000000
data[6] 0.000000 -0.000000
data[7] -0.000000 -0.000000
data[8] -0.000000 -0.000000
data[9] -0.000000 -0.000000
[/codebox]

With NX=NY=128 I get this:

[codebox]data[0] 16384.000000 16384.000000
data[1] 0.000000 0.000000
data[2] 0.000000 0.000000
data[3] 0.000000 0.000000
data[4] 0.000000 0.000000
data[5] 0.000000 0.000000
data[6] 0.000000 0.000000
data[7] 0.000000 0.000000
data[8] 0.000000 0.000000
data[9] 0.000000 0.000000
[/codebox]

And it seems to work.

Thanks.

mfatica · February 23, 2010, 3:16pm

You don’t need to pad the array; CUFFT has no restrictions on N.
The power-of-2 transform (256) will be faster than 240 (3*5*16), but the result will be correct in both cases.

Fr0stY · February 23, 2010, 5:40pm

I ask this because in my Fortran program I’ve replaced the Fortran FFT routines with the corresponding CUFFT, but the results aren’t the same. Since you tell me that it doesn’t need N to be a power of 2, the problem must be something else. I’ve no idea of what it could be, though.

Is my implementation correct? If I’ve understood correctly, I can use the same function regardless of whether the input size is a power of 2.

Thanks yet again.

mfatica · February 23, 2010, 5:45pm

If you are calling from Fortran, remember that CUFFT is expecting row-major order.
So, if your Fortran array is a(NX,NY) when you set up the 2D plan, the call should be:
cufftPlan2d(&plan, NY,NX, CUFFT_Z2Z);

Fr0stY · February 24, 2010, 1:25pm

I’ve modified cufftPlan2d like you said, but I still get the same results as before.

Perhaps this happens because NX and NY have the same value?

I am still puzzled by the huge difference between the parameters taken by the Fortran FFT calls and the CUDA ones.

mfatica · February 24, 2010, 3:14pm

What is wrong with the results?

You are transforming a constant signal, so the zero wave-number (the only one that should have non zero coefficients) contains the sum of the signal.

When NX=NY=240, NX*NY=57600; for NX=NY=128, NX*NY=16384.

These are the numbers reported in the output you posted.

wanderine · February 27, 2010, 8:24pm

I have some volumes of size 160 x 64 x 224. Is a 3D FFT for 256 x 64 x 256 faster than for 160 x 64 x 224? Does the order of the sizes matter, i.e. is, for example, a 3D FFT of 64 x 160 x 224 faster?
