Dear all,
I am trying to find the maximum problem size that cuFFT can handle for 3D data.
My platform: WinXP Pro 64, VC2005, GTX295 + Tesla C1060, driver 109.38, CUDA 2.3.
I use the Tesla C1060 as the compute device and focus on double precision.
First, with nx = ny = nz = 628, I can allocate 3.7 GB of device memory and create the plan for a real-to-complex FFT, as shown in code 1.
code 1
[codebox]
cufftResult ret ;
size_t nx = 628 ;
size_t ny = 628 ;
size_t nz = 628 ;
size_t N = nx * ny * nz ;

double *d_u ;      // source data on the device
double *d_u_hat ;  // frequency components of the source on the device
cufftHandle plan ; // plan of forward R2C transform

cutilSafeCall( cudaMalloc((void**)&d_u, sizeof(double)*N) );
cutilSafeCall( cudaMalloc((void**)&d_u_hat, sizeof(double)*N) );

ret = cufftPlan3d(&plan, nx, ny, nz, CUFFT_D2Z );
if ( CUFFT_SUCCESS != ret ){
    if ( CUFFT_ALLOC_FAILED == ret ){
        cout << "Error: Allocation of GPU resources for the plan failed" << endl ;
    }else{
        cout << "Error: cufftPlan3d fails for other reason" << endl ;
    }
}else{
    cout << "cufftPlan3d success" << endl ;
} // if ( CUFFT_SUCCESS != ret )
[/codebox]
Second, if I set up the same FFT for two data sets of smaller size,
nx = ny = nz = 480, an error occurs at
cufftPlan3d(&plan, nx, ny, nz, CUFFT_D2Z );
with error code CUFFT_ALLOC_FAILED (see code 2).
code 2
[codebox]
cufftResult ret ;
size_t nx = 240*2 ;
size_t ny = 240*2 ;
size_t nz = 240*2 ;
size_t N = nx * ny * nz ;

double *d_G_hat ;  // frequency components of the kernel on the device
double *d_u ;      // source data on the device
double *d_u_hat ;  // frequency components of the source on the device
double *d_G ;      // kernel on the device (d_G <-> h_G)
cufftHandle plan ; // plan of forward R2C transform

cutilSafeCall( cudaMalloc((void**)&d_G_hat, sizeof(double)*N) );
cutilSafeCall( cudaMalloc((void**)&d_u, sizeof(double)*N) );
cutilSafeCall( cudaMalloc((void**)&d_u_hat, sizeof(double)*N) );
cutilSafeCall( cudaMalloc((void**)&d_G, sizeof(double)*N) );

ret = cufftPlan3d(&plan, nx, ny, nz, CUFFT_D2Z );
if ( CUFFT_SUCCESS != ret ){
    if ( CUFFT_ALLOC_FAILED == ret ){
        cout << "Error: Allocation of GPU resources for the plan failed" << endl ;
    }else{
        cout << "Error: cufftPlan3d fails for other reason" << endl ;
    }
}else{
    cout << "cufftPlan3d success" << endl ;
} // if ( CUFFT_SUCCESS != ret )
[/codebox]
Moreover, if I interchange
cufftPlan3d(&plan, nx, ny, nz, CUFFT_D2Z );
and
cutilSafeCall( cudaMalloc((void**)&d_G, sizeof(double)*N) );
then the plan is created successfully, but the error moves to
cutilSafeCall( cudaMalloc((void**)&d_G, sizeof(double)*N) );
which reports "out of memory" (see code 3).
code 3
[codebox]
cufftResult ret ;
size_t nx = 240*2 ;
size_t ny = 240*2 ;
size_t nz = 240*2 ;
size_t N = nx * ny * nz ;

double *d_G_hat ;  // frequency components of the kernel on the device
double *d_u ;      // source data on the device
double *d_u_hat ;  // frequency components of the source on the device
double *d_G ;      // kernel on the device (d_G <-> h_G)
cufftHandle plan ; // plan of forward R2C transform

cutilSafeCall( cudaMalloc((void**)&d_G_hat, sizeof(double)*N) );
cutilSafeCall( cudaMalloc((void**)&d_u, sizeof(double)*N) );
cutilSafeCall( cudaMalloc((void**)&d_u_hat, sizeof(double)*N) );

ret = cufftPlan3d(&plan, nx, ny, nz, CUFFT_D2Z );
if ( CUFFT_SUCCESS != ret ){
    if ( CUFFT_ALLOC_FAILED == ret ){
        cout << "Error: Allocation of GPU resources for the plan failed" << endl ;
    }else{
        cout << "Error: cufftPlan3d fails for other reason" << endl ;
    }
}else{
    cout << "cufftPlan3d success" << endl ;
} // if ( CUFFT_SUCCESS != ret )

// allocation moved after plan creation; this call now fails
cutilSafeCall( cudaMalloc((void**)&d_G, sizeof(double)*N) );
[/codebox]
However, the four arrays in the code above use only 3.3 GB in total, less than the 3.7 GB allocated in code 1.
I cannot understand why "out of memory" occurs.