cufftPlan2d() returns CUFFT_INVALID_VALUE, should be impossible

wateenellende · August 3, 2010, 11:29am

Hi,

I have a problem with cufftPlan2d() from the CUFFT library: it shows memory access errors (says valgrind) and returns CUFFT_INVALID_VALUE (says me). I think those are really bugs that are not mine, but feel free to correct me!
Running Linux (Ubuntu 10.04), CUDA 3.1.

Originally I posted it here: The Official NVIDIA Forums | NVIDIA
but I’m beginning to suspect that I’m not addressing the right crowd … trying my luck here now.

I can provide binaries if needed.

Best, and thanks in advance.
wateenellende

YarUnderoaker · January 12, 2011, 6:50pm

Same problem with CUDA 3.2 on Windows 7.

Replacing the library with version 3.0 makes it work fine. It is a bug. Where are the developers?

Cliff_Woolley · January 13, 2011, 6:50pm

The documentation is incorrect here: cufftPlan2d() can indeed return CUFFT_INVALID_VALUE in some circumstances. The most likely explanation is that you might have passed an invalid parameter to cufftPlan2d(). We attempted to clean up the documentation for these kinds of situations in CUFFT 3.1 and CUFFT 3.2, but overlooked this one. We will fix the documentation for the next release.

Meanwhile, can you tell me the parameters you passed to cufftPlan2d() (or, better yet, can you send me a small stand-alone bit of code that reproduces the issue) so that the CUFFT engineers can take a look at this?

Thanks,

Cliff

YarUnderoaker · January 13, 2011, 8:44pm

I used cufft32_32_16.dll.
I call it with the following parameters:

32-bit pointer for handle (platform is 32 bit), handle = 0 just in case
32-bit int = 512 (width)
32-bit int = 512 (height)
32-bit enum = CUFFT_R2C (type)
cufftResult is 32-bit too
the function uses the stdcall convention
The same call works with cufft32_30_14.dll.

Also I tried 8-bit type and/or 8-bit cufftResult and/or cdecl convention - same error.
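
For reference, the call described above can be sketched as a stand-alone C program against cufft.h. The 512x512 size and the CUFFT_R2C type are taken from the post; the program structure and the helper cufftResultName are illustrative assumptions, not the code from the attached project. The result is printed as a number as well as a name, so nothing depends on the name table being complete.

```cuda
#include <stdio.h>
#include <cufft.h>

/* Hypothetical helper (not part of CUFFT): maps common cufftResult codes
   to readable names. Codes not listed here fall through to the default. */
static const char *cufftResultName(cufftResult r)
{
    switch (r) {
    case CUFFT_SUCCESS:        return "CUFFT_SUCCESS";
    case CUFFT_INVALID_PLAN:   return "CUFFT_INVALID_PLAN";
    case CUFFT_ALLOC_FAILED:   return "CUFFT_ALLOC_FAILED";
    case CUFFT_INVALID_TYPE:   return "CUFFT_INVALID_TYPE";
    case CUFFT_INVALID_VALUE:  return "CUFFT_INVALID_VALUE";
    case CUFFT_INTERNAL_ERROR: return "CUFFT_INTERNAL_ERROR";
    case CUFFT_EXEC_FAILED:    return "CUFFT_EXEC_FAILED";
    case CUFFT_SETUP_FAILED:   return "CUFFT_SETUP_FAILED";
    case CUFFT_INVALID_SIZE:   return "CUFFT_INVALID_SIZE";
    default:                   return "(other cufftResult)";
    }
}

int main(void)
{
    cufftHandle plan = 0;  /* initialized to 0 "just in case", as in the post */

    /* Same parameters as reported: 512 x 512, real-to-complex. */
    cufftResult res = cufftPlan2d(&plan, 512, 512, CUFFT_R2C);
    printf("cufftPlan2d returned %d (%s)\n", (int)res, cufftResultName(res));

    if (res != CUFFT_SUCCESS)
        return 1;

    cufftDestroy(plan);
    return 0;
}
```

Built with something like `nvcc repro.cu -lcufft` (the file name is a placeholder), a program of this shape is the kind of self-contained repro that can be run unchanged against different CUFFT versions.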

njuffa · January 14, 2011, 6:43pm

Sorry, we are unable to repro based on this information on our end. Would it be possible to post a self-contained program that reproduces the problem? You could also send me a personal message via the forum and attach the repro program to that, if you prefer. Thanks.

YarUnderoaker · January 15, 2011, 2:03pm

Okay. I attached a simple test project. It can load different versions of the library.

YarUnderoaker · January 15, 2011, 2:06pm

[attachment=19543:CUFFTTEST.RAR]

njuffa · January 16, 2011, 9:35am

Thanks for posting the code. Now I am mightily confused because the archive contains Pascal source code. To my knowledge NVIDIA does not provide Pascal bindings for CUFFT, or CUDA in general. I am afraid I am not equipped to follow up on this further.

YarUnderoaker · January 16, 2011, 12:01pm

What difference does it make which language is used? The dynamic link library is loaded through the WinAPI. Have you checked it at all?

I understand that in C everyone uses static linking, but there are other languages too.

By the way, why is the dynamic library 27 MB while the static one is less than 5 KB? It is very inconvenient for the distribution of small demos.

YarUnderoaker · January 16, 2011, 12:10pm

Okay, I found three differences in your library:

cufft32_32_16.cufftPlan2d:

033113F0 83EC08           sub esp,$08
033113F3 8B442410         mov eax,[esp+$10]
033113F7 8B542418         mov edx,[esp+$18]
033113FB 8B4C2414         mov ecx,[esp+$14]
033113FF 6A01             push $01
03311401 6AFF             push $ff
03311403 6A01             push $01
03311405 52               push edx
03311406 89442410         mov [esp+$10],eax
0331140A 8D442410         lea eax,[esp+$10]
0331140E 50               push eax
0331140F 894C2418         mov [esp+$18],ecx
03311413 8B4C2420         mov ecx,[esp+$20]
03311417 6A02             push $02
03311419 51               push ecx
0331141A E841FEFFFF       call $03311260
0331141F 83C424           add esp,$24
03311422 C21000           ret $0010

cufft32_30_14.cufftPlan2d:

034B12F0 83EC08           sub esp,$08
034B12F3 8B442410         mov eax,[esp+$10]
034B12F7 8B542418         mov edx,[esp+$18]
034B12FB 8B4C2414         mov ecx,[esp+$14]
034B12FF 6A01             push $01
034B1301 52               push edx
034B1302 89442408         mov [esp+$08],eax
034B1306 8D442408         lea eax,[esp+$08]
034B130A 50               push eax
034B130B 894C2410         mov [esp+$10],ecx
034B130F 8B4C2418         mov ecx,[esp+$18]
034B1313 6A02             push $02
034B1315 51               push ecx
034B1316 E885FEFFFF       call $034b11a0
034B131B 83C41C           add esp,$1c
034B131E C21000           ret $0010

mcg · January 20, 2011, 2:49pm

I get CUFFT_INVALID_VALUE returns myself, for perfectly valid values of N and Nb in cufftPlan1d. I have a loop that tries to determine the largest batch size that will fit my data needs and FFT plan into memory. It seems that when I choose a batch size even slightly too large, it returns CUFFT_INVALID_VALUE. But in that case, it seems to me that it should return CUFFT_ALLOC_FAILED.
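
A loop of the kind described above might look like the following sketch. It is not mcg's code: the helper findBatch, the halving strategy, and the values of n and maxBatch are all assumptions made for illustration. Treating CUFFT_INVALID_VALUE as a possible "too large" signal alongside CUFFT_ALLOC_FAILED reflects only the behavior reported in this thread, not documented CUFFT semantics.

```cuda
#include <stdio.h>
#include <cufft.h>

/* Hypothetical helper: tries batch counts maxBatch, maxBatch/2, maxBatch/4, ...
   and returns the first one for which cufftPlan1d succeeds (0 if none does).
   On success, *plan holds the created plan. */
static int findBatch(cufftHandle *plan, int n, int maxBatch)
{
    for (int batch = maxBatch; batch >= 1; batch /= 2) {
        cufftResult res = cufftPlan1d(plan, n, CUFFT_C2C, batch);
        if (res == CUFFT_SUCCESS)
            return batch;

        /* Assumption based on the report above: both codes may mean
           "this batch does not fit", so keep shrinking. Any other code
           is treated as a real error and stops the search. */
        if (res != CUFFT_ALLOC_FAILED && res != CUFFT_INVALID_VALUE) {
            fprintf(stderr, "cufftPlan1d failed with code %d\n", (int)res);
            return 0;
        }
    }
    return 0;
}

int main(void)
{
    const int n = 4096;            /* hypothetical transform length */
    const int maxBatch = 1 << 16;  /* hypothetical upper bound on the batch count */

    cufftHandle plan;
    int batch = findBatch(&plan, n, maxBatch);
    if (batch == 0) {
        fprintf(stderr, "no batch size could be planned\n");
        return 1;
    }

    printf("planned %d transforms of length %d\n", batch, n);
    cufftDestroy(plan);
    return 0;
}
```

Halving only finds a workable batch count, not the largest possible one; a bisection between the last failing and the first succeeding value would narrow it further.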

naveeniisc · January 24, 2011, 4:02pm

Hi guys,

I am new to CUDA. I am having a problem with this code, which I found by googling:

#include <stdio.h>
#include <math.h>
#include<complex.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define N 128

int main()
{
    //Allocate arrays on the host
    float *kx,*ky,*r;
    float scale;
    kx =(float*)malloc(sizeof(float)*N);
    ky =(float*)malloc(sizeof(float)*N);
    r = (float*)malloc(sizeof(float)*N*N);

    // Allocate arrays on the GPU with cudaMalloc
    float *kx_d, *ky_d, *r_d;
    cudaMalloc((void **)&kx_d, sizeof(cufftComplex)*N);
    cudaMalloc((void **)&ky_d, sizeof(cufftComplex)*N);
    cudaMalloc((void **)&r_d, sizeof(cufftComplex)*N*N);
    cufftComplex *r_complex_d;
    cudaMalloc((void **)&r_complex_d, sizeof(cufftComplex)*N*N);

    // initialize r, kx and ky on the host

    //Transfer data from host to device with cudaMemcpy(target,source,size,direction)
    cudaMemcpy(kx_d,kx,sizeof(float)*N , cudaMemcpyHostToDevice);
    cudaMemcpy(ky_d,ky,sizeof(float)*N , cudaMemcpyHostToDevice);
    cudaMemcpy(r_d,r,sizeof(float)*N*N , cudaMemcpyHostToDevice);

    //Create plan for CUDA FFT
    cufftHandle plan;
    cufftPlan2d(&plan,N,N,CUFFT_C2C);

    /* compute the execution configuration NB:
       block_size_x*block_size_y = number of threads */
    dim3 dimBlock(N, N);
    dim3 dimGrid(N/dimBlock.x, N/dimBlock.y);

    // Handle N not multiple of block_size_x or block_size_y
    if (N%N !=0) dimGrid.x+=1;
    if (N%N !=0) dimGrid.y+=1;

    // Copy real data to complex data
    __global__ void real2complex(float *a, cufftComplex *c, int N) {
        //compute idx and idy, the location of the element in the original NxN array
        int idx = blockId.x*blockDim.x+threadIdx.x;
        int idy = blockId.y*blockDim.y+threadIdx.y;
        if (idx<N && idy<N)
        {
            int index = idx + idy*N;
            c[index].x = a[index];
            c[index].y = 0.f;
        }
    }/

    // Transform real input to complex input
    real2complex<<<dimGrid,dimBlock>>>(r_d,r_complex_d,N);

    // Compute in place forward FFT
    cufftExecC2C(plan,r_complex_d, r_complex_d,CUFFT_FORWARD);

    //solve_poisson
    __global__ void solve_poission(cufftComplex *c, float *kx, float *ky, int N) {
        // compute idx and idy, the location of the element in the original NxN array
        int idx = blockId.x*blockDim.x+threadIdx.x;
        int idy = blockId.y*blockDim.y+threadIdx.y;
        if (idx <N && idy<N)
        {
            int index = idx + idy*N;
            float scale = - (kx[idx]*kx[idx] + ky[idy]*ky[idy]);
            if (idx == 0 && idy == 0) scale = 1.f;
            scale = 1.f/scale;
            c[index].x*=scale;
            c[index].y*=scale;
        }
    }

    //solve poisson equation in Fourier space
    solve_possion<<<dimGrid,dimBlock>>>(r_complex_d,kx_d,ky_d,N);

    //Compute in place inverse FFT
    cufftExecC2C(plan,r_complex_d,r_complex_d, CUFFT_INVERSE);

    // complex2real_scaled copy real part of complex data into real array and apply scaling
    __global__ void complex2real_scaled(cufftComplex *c, float *a, int N, float scale) {
        // compute idx and idy, the location of the element in the original NxN array
        int idx = blockId.x*blockDim.x+threadIdx.x;
        int idy = blockId.y*blockDim.x+threadIdx.y;
        if (idx <N && idy <N)
        {
            int index = idx + idy*N;
            a[index] = scale*c[index].x;
        }
    }

    /*copy the solution to a real array and apply scaling (an FFT followed
      by iFFT will give you back the same array times the length of the transform)*/
    scale = 1.f/((float)N * (float)N);
    complex2real_scaled<<<dimGrid, dimBlock>>>(r_d,r_complex_d,N,scale);

    /*Transfer data from device to host with cudaMemcpy(target,source,size,direction)*/
    cudaMemcpy(r,r_d,sizeof(float)*N*N, cudaMemcpyDeviceToHost);

    // destroy plan and clean up memory on device
    cufftDestroy(plan);
    cudaFree(r_complex_d);
    cudaFree(kx_d);
    cudaFree(r_d);
    cudaFree(ky_d);
}

When I compile this code, it gives these errors:

exm_lavi.cu(54): error: expected a ")"
exm_lavi.cu(54): error: expected a ";"
exm_lavi.cu(69): warning: parsing restarts here after previous syntax error
exm_lavi.cu(75): error: expected a ")"
exm_lavi.cu(75): error: expected a ";"
exm_lavi.cu(92): warning: parsing restarts here after previous syntax error
exm_lavi.cu(100): error: expected a ")"
exm_lavi.cu(100): error: expected a ";"
exm_lavi.cu(114): warning: parsing restarts here after previous syntax error
exm_lavi.cu(115): error: argument of type "float *" is incompatible with parameter of type "cufftComplex *"
exm_lavi.cu(115): error: argument of type "cufftComplex *" is incompatible with parameter of type "float *"
exm_lavi.cu(115): error: too many arguments in function call
exm_lavi.cu(115): warning: variable "scale" is used before its value is set

9 errors detected in the compilation of "/tmp/tmpxft_0000037a_00000000-4_exm_lavi.cpp1.ii".

What am I missing in the code?

I am using: OS Linux x86_64, driver 260.19.29, CUDA toolkit 3.2.

NVIDIA card: GeForce 9800 GT

Please help.

Cliff_Woolley · January 24, 2011, 4:20pm

Hi Naveeniisc,

The problem you’re having is off-topic for this particular discussion thread. But just so you know, the problem is that you can’t define a function inside of another function in either C or CUDA C. The compiler errors tell you the line number in the code (line 54, for example) where you have attempted to do this. You need to move the definitions of the CUDA C kernels (aka __global__ functions) outside of main(). That said, I recommend that you study up on C/C++ before attempting to learn CUDA C/C++. For C, a good book to read is “The C Programming Language” by Kernighan and Ritchie.

Hope this helps,

Cliff
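
The restructuring suggested in the reply above can be sketched as follows. This is a reduced illustration, not a full port of the posted program. The kernels are defined at file scope. The size parameter is named n, because the macro N would otherwise be expanded inside the kernel's parameter list. The built-in index variable is blockIdx (the post wrote blockId). The arguments to complex2real_scaled are passed in the order the kernel declares them. The 16x16 block size (BLOCK), the placeholder input data, and the omission of the Poisson solve step are assumptions made to keep the example short; the 128x128 block in the post would exceed the per-block thread limit.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define N 128      /* grid size, as in the original post */
#define BLOCK 16   /* assumed tile size: 16x16 = 256 threads per block */

/* Kernels are defined at file scope, outside main(). */
__global__ void real2complex(const float *a, cufftComplex *c, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    if (idx < n && idy < n) {
        int index = idx + idy * n;
        c[index].x = a[index];
        c[index].y = 0.f;
    }
}

__global__ void complex2real_scaled(const cufftComplex *c, float *a, int n, float scale)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    if (idx < n && idy < n) {
        int index = idx + idy * n;
        a[index] = scale * c[index].x;
    }
}

int main(void)
{
    const size_t count = (size_t)N * N;
    float *r = (float *)malloc(sizeof(float) * count);
    if (r == NULL) return 1;
    for (size_t i = 0; i < count; ++i) r[i] = (float)(i % 7);  /* placeholder data */

    float *r_d = NULL;
    cufftComplex *r_complex_d = NULL;
    if (cudaMalloc((void **)&r_d, sizeof(float) * count) != cudaSuccess ||
        cudaMalloc((void **)&r_complex_d, sizeof(cufftComplex) * count) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(r_d, r, sizeof(float) * count, cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftResult res = cufftPlan2d(&plan, N, N, CUFFT_C2C);
    if (res != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftPlan2d failed with code %d\n", (int)res);
        return 1;
    }

    /* Round the grid up so that every element is covered by a thread. */
    dim3 dimBlock(BLOCK, BLOCK);
    dim3 dimGrid((N + BLOCK - 1) / BLOCK, (N + BLOCK - 1) / BLOCK);

    real2complex<<<dimGrid, dimBlock>>>(r_d, r_complex_d, N);
    cufftExecC2C(plan, r_complex_d, r_complex_d, CUFFT_FORWARD);
    /* The Poisson solve in Fourier space from the original post would go here. */
    cufftExecC2C(plan, r_complex_d, r_complex_d, CUFFT_INVERSE);
    complex2real_scaled<<<dimGrid, dimBlock>>>(r_complex_d, r_d, N,
                                               1.f / ((float)N * (float)N));

    cudaMemcpy(r, r_d, sizeof(float) * count, cudaMemcpyDeviceToHost);
    printf("r[1] after round trip: %f\n", r[1]);

    cufftDestroy(plan);
    cudaFree(r_complex_d);
    cudaFree(r_d);
    free(r);
    return 0;
}
```

With the solve step left out, the forward and inverse transforms followed by the 1/(N*N) scaling should approximately reproduce the input, which makes the skeleton easy to sanity-check before the solver kernel is added back.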

naveeniisc · January 24, 2011, 4:42pm

Hi Cliff Woolley,

Thank you, it helps a lot.
I am extremely sorry for being off-topic in this thread; I am new to the forum as well. Thank you for the book suggestion. I will buy it.
