Dear all: I want to do 3-dimensional sine FFT via cuFFT,
the procedure is
1. compute 1-D FFT for dimension z with batch = n1*n2
2. transpose from (x,y,z) to (y,z,x)
3. compute 1-D FFT for dimension x with batch = n2*n3
4. transpose from (y,z,x) to (z,x,y)
5. compute 1-D FFT for dimension y with batch = n1*n3
6. transpose from (z,x,y) to (x,y,z)
Everything is O.K. except that I hit a problem at (nx,ny,nz) = (512,512,512).
The problem comes from the 1-D FFT, so I wrote a simple code to demonstrate it.
The following code does a batched 1-D R2C FFT of length n with batch transforms:
it transforms real d_u (of size batch * n) to complex d_u_hat (of size batch*(n/2+1)).
[codebox]#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include "global.h"
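void randomInit(doublereal* data, unsigned long long int size)
{
// the index has the same type as size, so the comparison is not signed/unsigned
for (unsigned long long int i = 0; i < size; ++i){ data[i] = (doublereal)rand() / (doublereal)RAND_MAX; }
}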
void test_1D_size_limit( void )
{
#ifdef DO_DOUBLE
int batch = 512 * 512 ;
int n = 256*2 ;
#else
int batch = 512 * 512 ;
int n = 511*2 ;
#endif
cufftResult flag ;
cufftHandle plan ;
doublereal *d_u ; // device memory
Complex *d_u_hat ; // device memory
doublereal *u ; // host memory
// step 1: random data
u = (doublereal *)malloc( sizeof(doublereal)*batch*n ) ;
assert( u ) ;
randomInit( u, batch*n ) ; // random data
// step 2: out-of-place forward FFT in device
cutilSafeCall( cudaMalloc((void**)&d_u, sizeof(doublereal)*batch*n) );
CUDA_SAFE_CALL(cudaMemcpy( d_u, u, sizeof(doublereal)*batch*n , cudaMemcpyHostToDevice) );
cutilSafeCall( cudaMalloc((void**)&d_u_hat, sizeof(Complex)*batch*((n>>1) + 1) ) );
#if defined (DO_DOUBLE)
flag = cufftPlan1d(&plan, n, CUFFT_D2Z, batch );
if ( CUFFT_SUCCESS != flag ){printf("Error: cufftPlan1d( CUFFT_D2Z ) fails\n"); }
#else
flag = cufftPlan1d(&plan, n, CUFFT_R2C, batch );
if ( CUFFT_SUCCESS != flag ){ printf("Error: cufftPlan1d( CUFFT_R2C ) fails\n"); }
#endif
#if defined (DO_DOUBLE)
flag = cufftExecD2Z( plan, (cufftDoubleReal *)d_u, d_u_hat );
#else
flag = cufftExecR2C( plan, (cufftReal *)d_u, d_u_hat );
#endif
if ( CUFFT_SUCCESS != flag ){
printf("Error (cufftExecR2C): %s \n",cudaGetErrorString (cudaGetLastError()));
printf("error code (cufft) = %d\n", flag);
}
}
[/codebox]
the content of file “global.h” is
[codebox]// “global.h”
#include <cufft.h>
#include <cutil_inline.h>
//#define DO_DOUBLE
#ifdef DO_DOUBLE
typedef double doublereal ;
typedef cufftDoubleComplex Complex;
#else
typedef float doublereal ;
typedef cufftComplex Complex;
#endif[/codebox]
- batch = 512 * 512, n = 511*2
the program is O.K.
- batch = 512 * 512, n = 512*2
output is
Error (cufftExecR2C): memory size or pointer value too large to fit in 32 bit
error code (cufft) = 6
After searching cufft.h, error code = 6 means “CUFFT_EXEC_FAILED”,
so this is not an out-of-memory problem.
Does this mean that the maximum size of a 1-D R2C FFT is 512*512*511*2, which leads to
2GB ( d_u + d_u_hat )?
ps: my platform is winxp pro64, vc2005, Tesla C1060, driver 190.38, cuda 2.3