cufftPlan2d() returns CUFFT_INVALID_VALUE, which should be impossible


I have a problem with cufftPlan2d() from the CUFFT library: valgrind reports memory access errors inside it, and it returns CUFFT_INVALID_VALUE, which according to the documentation it should never return. I think these are genuine bugs in the library rather than in my code, but feel free to correct me!
I'm running Linux (Ubuntu 10.04) with CUDA 3.1.

Originally I posted it here:
but I'm beginning to suspect that I wasn't addressing the right crowd, so I'm trying my luck here now.

I can provide binaries if needed.

Best, and thanks in advance.

Same problem with CUDA 3.2 on Windows 7.

If I replace the library with version 3.0, it works fine. It is a bug. Where are the developers?

The documentation is incorrect here: cufftPlan2d() can indeed return CUFFT_INVALID_VALUE in some circumstances. The most likely explanation is that an invalid parameter was passed to cufftPlan2d(). We attempted to clean up the documentation for these kinds of situations in CUFFT 3.1 and CUFFT 3.2, but overlooked this one. We will fix the documentation in the next release.

Meanwhile, can you tell me the parameters you passed to cufftPlan2d() (or, better yet, can you send me a small stand-alone bit of code that reproduces the issue) so that the CUFFT engineers can take a look at this?



I used cufft32_32_16.dll.
I call it with the following parameters:

32-bit pointer for the handle (the platform is 32-bit); handle = 0, just in case
32-bit int = 512 (width)
32-bit int = 512 (height)
32-bit enum = CUFFT_R2C (type)
cufftResult is 32-bit too, and the function uses the stdcall convention.
The same call works with cufft32_30_14.dll.

I also tried an 8-bit type and/or an 8-bit cufftResult and/or the cdecl convention: same error.

Sorry, we are unable to reproduce this on our end based on this information. Would it be possible to post a self-contained program that reproduces the problem? You could also send me a personal message via the forum and attach the repro program to that, if you prefer. Thanks.

Okay. I have attached a simple test project. It can load different versions of the library.


Thanks for posting the code. Now I am mightily confused because the archive contains Pascal source code. To my knowledge NVIDIA does not provide Pascal bindings for CUFFT, or CUDA in general. I am afraid I am not equipped to follow up on this further.

What difference does the language make? The dynamic link library is loaded through the WinAPI. Have you checked it at all?

I understand that in C everyone uses static linking, but there are other languages too.

By the way, why is the dynamic library 27 MB while the static one is less than 5 KB? That is very inconvenient for distributing small demos.

Okay, find the three differences in your library:


033113F0 83EC08           sub esp,$08
033113F3 8B442410         mov eax,[esp+$10]
033113F7 8B542418         mov edx,[esp+$18]
033113FB 8B4C2414         mov ecx,[esp+$14]
033113FF 6A01             push $01
03311401 6AFF             push $ff
03311403 6A01             push $01
03311405 52               push edx
03311406 89442410         mov [esp+$10],eax
0331140A 8D442410         lea eax,[esp+$10]
0331140E 50               push eax
0331140F 894C2418         mov [esp+$18],ecx
03311413 8B4C2420         mov ecx,[esp+$20]
03311417 6A02             push $02
03311419 51               push ecx
0331141A E841FEFFFF       call $03311260
0331141F 83C424           add esp,$24
03311422 C21000           ret $0010

034B12F0 83EC08           sub esp,$08
034B12F3 8B442410         mov eax,[esp+$10]
034B12F7 8B542418         mov edx,[esp+$18]
034B12FB 8B4C2414         mov ecx,[esp+$14]
034B12FF 6A01             push $01
034B1301 52               push edx
034B1302 89442408         mov [esp+$08],eax
034B1306 8D442408         lea eax,[esp+$08]
034B130A 50               push eax
034B130B 894C2410         mov [esp+$10],ecx
034B130F 8B4C2418         mov ecx,[esp+$18]
034B1313 6A02             push $02
034B1315 51               push ecx
034B1316 E885FEFFFF       call $034b11a0
034B131B 83C41C           add esp,$1c
034B131E C21000           ret $0010

I get CUFFT_INVALID_VALUE from cufftPlan1d myself, for perfectly valid values of N and Nb. I have a loop that tries to determine the largest batch size for which my data and the FFT plan fit into memory. It seems that when I choose a batch size even slightly too large, cufftPlan1d returns CUFFT_INVALID_VALUE. In that case, it seems to me that it should return CUFFT_ALLOC_FAILED instead.
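The search described above can be sketched as a binary search over the batch size. This is a minimal C sketch under stated assumptions, not CUFFT code: plan_fits_fn, largest_fitting_batch and the mock_device memory model are hypothetical names, and the search assumes that if a batch fits, every smaller batch fits too. Because cufftPlan1d() was observed here to return CUFFT_INVALID_VALUE rather than CUFFT_ALLOC_FAILED when the batch is too large, a real predicate would treat any result other than CUFFT_SUCCESS as "does not fit" and would release successful plans with cufftDestroy().

```c
#include <assert.h>
#include <stddef.h>

/* Predicate: nonzero if a plan with this batch size can be created.
   Hypothetical placeholder. A real version would call cufftPlan1d(), treat any
   result other than CUFFT_SUCCESS (including CUFFT_INVALID_VALUE) as failure,
   and call cufftDestroy() on the plan when creation succeeded. */
typedef int (*plan_fits_fn)(size_t batch, void *ctx);

/* Largest batch in [0, max_batch] for which fits() succeeds (0 if none).
   Assumes monotonicity: if a batch fits, every smaller batch fits too. */
size_t largest_fitting_batch(size_t max_batch, plan_fits_fn fits, void *ctx)
{
    size_t lo = 0;         /* largest batch known to fit */
    size_t hi = max_batch; /* no batch above hi is still a candidate */
    assert(fits != NULL);
    while (lo < hi) {
        /* mid lies in (lo, hi], so every step shrinks the range without overflow */
        size_t mid = lo + (hi - lo) / 2 + 1;
        if (fits(mid, ctx))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Hypothetical memory model, used only to exercise the search. */
struct mock_device {
    size_t free_bytes;      /* pretend free device memory */
    size_t bytes_per_batch; /* pretend cost of one transform (must be nonzero) */
};

int mock_plan_fits(size_t batch, void *ctx)
{
    const struct mock_device *dev = ctx;
    /* Divide instead of multiplying so large batches cannot overflow size_t. */
    return batch <= dev->free_bytes / dev->bytes_per_batch;
}
```

Only about log2(max_batch) plan attempts are needed, so a generous upper bound costs little.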

Hi guys,

I am new to CUDA and I'm having a problem with this code, which I found by searching online:

#include <stdio.h>
#include <math.h>

#include <cuda.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define N 128

int main()
{
    // Allocate arrays on the host
    float *kx,*ky,*r;
    float scale;
    kx =(float*)malloc(sizeof(float)*N);
    ky =(float*)malloc(sizeof(float)*N);
    r = (float*)malloc(sizeof(float)*N*N);

    // Allocate arrays on the GPU with cudaMalloc
    float *kx_d, *ky_d, *r_d;
    cudaMalloc((void **)&kx_d, sizeof(cufftComplex)*N);
    cudaMalloc((void **)&ky_d, sizeof(cufftComplex)*N);
    cudaMalloc((void **)&r_d, sizeof(cufftComplex)*N*N);
    cufftComplex *r_complex_d;
    cudaMalloc((void **)&r_complex_d, sizeof(cufftComplex)*N*N);

    // initialize r, kx and ky on the host

    // Transfer data from host to device with cudaMemcpy(target,source,size,direction)
    cudaMemcpy(kx_d,kx,sizeof(float)*N , cudaMemcpyHostToDevice);
    cudaMemcpy(ky_d,ky,sizeof(float)*N , cudaMemcpyHostToDevice);
    cudaMemcpy(r_d,r,sizeof(float)*N*N , cudaMemcpyHostToDevice);

    // Create plan for CUDA FFT
    cufftHandle plan;

    /* compute the execution configuration NB:
       block_size_x*block_size_y = number of threads */
    dim3 dimBlock(N, N);
    dim3 dimGrid(N/dimBlock.x, N/dimBlock.y);
    // Handle N not multiple of block_size_x or block_size_y
    if (N%N !=0) dimGrid.x+=1;
    if (N%N !=0) dimGrid.y+=1;

    // Copy real data to complex data
    __global__ void real2complex(float *a, cufftComplex *c, int N) {
        // compute idx and idy, the location of the element in the original NxN array
        int idx = blockId.x*blockDim.x+threadIdx.x;
        int idy = blockId.y*blockDim.y+threadIdx.y;
        if (idx<N && idy<N)
        {
            int index = idx + idy*N;
            c[index].x = a[index];
            c[index].y = 0.f;
        }
    }

    // Transform real input to complex input

    // Compute in place forward FFT
    cufftExecC2C(plan,r_complex_d, r_complex_d,CUFFT_FORWARD);

    __global__ void solve_poission(cufftComplex *c, float *kx, float *ky, int N) {
        // compute idx and idy, the location of the element in the original NxN array
        int idx = blockId.x*blockDim.x+threadIdx.x;
        int idy = blockId.y*blockDim.y+threadIdx.y;
        if (idx <N && idy<N)
        {
            int index = idx + idy*N;
            float scale = - (kx[idx]*kx[idx] + ky[idy]*ky[idy]);
            if (idx == 0 && idy == 0) scale = 1.f;
            scale = 1.f/scale;
        }
    }

    // solve poisson equation in Fourier space

    // Compute in place inverse FFT
    cufftExecC2C(plan,r_complex_d,r_complex_d, CUFFT_INVERSE);

    // complex2real_scaled copies the real part of complex data into a real array and applies scaling
    __global__ void complex2real_scaled(cufftComplex *c, float *a, int N, float scale) {
        // compute idx and idy, the location of the element in the original NxN array
        int idx = blockId.x*blockDim.x+threadIdx.x;
        int idy = blockId.y*blockDim.x+threadIdx.y;
        if (idx <N && idy <N)
        {
            int index = idx + idy*N;
            a[index] = scale*c[index].x;
        }
    }

    /* copy the solution to a real array and apply scaling (an FFT followed
       by an iFFT will give you back the same array times the length of the transform) */
    scale = 1.f/((float)N * (float)N);
    complex2real_scaled<<<dimGrid, dimBlock>>>(r_d,r_complex_d,N,scale);

    /* Transfer data from device to host with cudaMemcpy(target,source,size,direction) */
    cudaMemcpy(r,r_d,sizeof(float)*N*N, cudaMemcpyDeviceToHost);

    // destroy plan and clean up memory on device
}

On running this code, I get these errors:

error: expected a ")"
error: expected a ";"
warning: parsing restarts here after previous syntax error
error: expected a ")"
error: expected a ";"
warning: parsing restarts here after previous syntax error
error: expected a ")"
error: expected a ";"
warning: parsing restarts here after previous syntax error
error: argument of type "float *" is incompatible with parameter of type "cufftComplex *"
error: argument of type "cufftComplex *" is incompatible with parameter of type "float *"
error: too many arguments in function call
warning: variable "scale" is used before its value is set

9 errors detected in the compilation of "/tmp/tmpxft_0000037a_00000000-4_exm_lavi.cpp1.ii".

What am I missing in the code?

I'm using Linux x86_64 with driver 260.19.29 and CUDA Toolkit 3.2, on a GeForce 9800 GT.

Please help.

Hi Naveeniisc,

The problem you’re having is off-topic for this particular discussion thread. But just so you know, the problem is that you can’t define a function inside of another function in either C or CUDA C. The compiler errors tell you the line number in the code (line 54, for example) where you have attempted to do this. You need to move the definitions of the CUDA C kernels (aka __global__ functions) outside of main(). That said, I recommend that you study up on C/C++ before attempting to learn CUDA C/C++. For C, a good book to read is “The C Programming Language” by Kernighan and Ritchie.
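Beyond the nesting problem described above, the execution configuration in the posted code also needs attention once it compiles: dim3 dimBlock(N, N) requests 128 x 128 = 16384 threads per block, while compute capability 1.x devices such as the GeForce 9800 GT allow at most 512, and N%N is always 0, so the round-up lines never run. A common approach is a fixed block size such as 16 x 16 plus a ceiling division for the grid. The helpers below (blocks_for and launch_config are hypothetical names) are a host-side C sketch of that arithmetic; in the CUDA program the two grid values would feed dim3 dimGrid(...) alongside dim3 dimBlock(16, 16).

```c
#include <assert.h>

/* Per-block thread limit on compute capability 1.x GPUs (e.g. GeForce 9800 GT).
   Later architectures allow more; real code can read maxThreadsPerBlock from
   cudaGetDeviceProperties() instead of hard-coding it. */
#define MAX_THREADS_PER_BLOCK 512u

/* Number of blocks of size `block` needed to cover `n` elements (ceiling
   division). Replaces n/block followed by "if (n % block != 0) grid += 1". */
static unsigned blocks_for(unsigned n, unsigned block)
{
    assert(block > 0);
    return (n + block - 1) / block;
}

/* 2-D launch configuration for an n x n array with bx x by thread blocks.
   Returns 0 if the block exceeds the per-block thread limit, 1 otherwise. */
static int launch_config(unsigned n, unsigned bx, unsigned by,
                         unsigned *grid_x, unsigned *grid_y)
{
    if (bx * by > MAX_THREADS_PER_BLOCK)
        return 0;
    *grid_x = blocks_for(n, bx);
    *grid_y = blocks_for(n, by);
    return 1;
}
```

Because the grid is rounded up, the last row and column of blocks can contain threads past the edge of the array, which is what the if (idx < N && idy < N) guards inside the kernels are for.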

Hope this helps,


Hi Cliff Woolley,

Thank you, it helps a lot.
I'm extremely sorry for posting off-topic in this thread; I'm also new to the forum. Thank you for the book suggestion, I will buy it.