Simple complex function: Error on imaginary component

Hello and thanks for reading this!

This might be a pretty easy one to solve for some of you but somehow I just can’t find the fix.

I have a function, which does the inverse scaling after IFFT computation, which looks like this:

static __global__ void  inverseFFTscaling(cufftDoubleComplex *c_in, int N)


	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;

	c_in[idx].x = c_in[idx].x / N; // real

	c_in[idx].y = c_in[idx].y / N; // imaginary


The result that I obtain is that the real part, i.e. the .x part, is correct. The imaginary part, however, is a clone of the result for the real part and not the imaginary part divided by N.

If I comment out the real part then I get the correct result for the imaginary part - of course this is not a solution since I would also like to have the result for the real part.

The same function for single precision works fine (of course with data type cufftComplex).

Any help would be appreciated!


Are you using the flag for double precision (-arch sm_13 ) ?


Yes, I am already using it. I have added

CFLAGS='-O3 -arch sm_13 -Xcompiler "-fPIC -D_GNU_SOURCE -pthread -fexceptions -m64"'

in This is a mex-File for Matlab.

Without the “-arch sm_13” I get a warning as expected:

ptxas /tmp/tmpxft_00001ca3_00000000-2_performNFFTsonvectorX_cudadbl.ptx, line 70; warning : Double is not supported. Demoting to float

This is part of a mex-Function for Matlab, which just performs FFT/IFFT/Scaling. Since I measured the error of the outcome to estimate the accuracy for my application I first only looked at RMSE between expected and obtained result. The RMSE was huge. When I then had a look at the real/imaginary part I found out that after the scaling function (see function in previous post) the real part is cloned onto the imaginary part.

I have installed the latest version of the CUDA SDK, the latest NVIDIA Driver, and the latest CUDA toolkit (hence double prec. FFT - thanks for that by the way!). My card is a GTX 285.

Any ideas?


Can you post the full source?
Which OS?

The OS is OpenSuse 11.1 x64 [Fujitsu Siemens Workstation M460, Core2 Quad Q9650, 8GB RAM].

Tried to upload but suffix is not allowed -> So just inline now…

Thanks for your help!


#include "mex.h"

#include <cufft.h>

#include <cutil_inline.h>

/* input arguments */

#define A0_in	   prhs[0] /* input vector */

#define Nt_in	   prhs[1] /* How many forward and inverse transforms?  */

#define inputs	   2

/* output arguments */

#define AL_out	  plhs[0]

#define outputs	  1	 /* output vector */



/* MATLAB stores complex numbers in separate arrays for the real and

   imaginary parts.  The following functions take the data in

   this format and pack it into a complex work array, or

   unpack it, respectively.  

   We are using cufftComplex defined in cufft.h to  handle complex on Windows and Linux


void pack_c2dc(cufftDoubleComplex  *output, 

			  double *input_re, 

			  double *input_im, 

			  int Ntot)


	int n;



	   output[n].x =  input_re[n];

	   output[n].y =  input_im[n];



void unpack_c2dc(cufftDoubleComplex  *input, 

				double *output_re, 

				double *output_im,  

				int Ntot)


	int n;



	   output_re[n] = input[n].x;

	   output_im[n] = input[n].y;



/* ------------------------------------------------------------------------

 * The following function performs the scaling after IFFT computation

 * ------------------------------------------------------------------------


static __global__ void  inverseFFTscaling(cufftDoubleComplex *c_in, int N)


	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;

	c_in[idx].x = c_in[idx].x / N;

	c_in[idx].y = c_in[idx].y / N;


/* ------------------------------------------------------------------------

 * Start of the main function

 * ------------------------------------------------------------------------


void mexFunction( int nlhs, mxArray *plhs[],

				  int nrhs, const mxArray *prhs[])


	/* Declaration of input variables */

	cufftDoubleComplex *A0_dcmplxptr, *AL_dcmplxptr;	

	int Ntransforms;


	/* Declaration of working variables */

	cufftDoubleComplex *AzX_GPU_dcmplxptr;

	cufftHandle p_fwd;

	int k, N, block_size;


		mexErrMsgTxt("usage: [AL] = performNFFTsonvectorX_cudadbl(A0,Nt)");

	else if(nrhs!=inputs) mexErrMsgTxt("This function requires 2 input arguments");

	else if(nlhs!=outputs) mexErrMsgTxt("This function requires 1 output arguments");

	/* Check input for dimension */

	if((mxGetN(A0_in)==1)&&(mxGetM(A0_in)==1)) mexErrMsgTxt("The input A0_dcmplxptr cannot be a scalar!\n");


		if((mxGetN(A0_in)>1)&&(mxGetM(A0_in)==1)) N=mxGetN(A0_in);

		else N=mxGetM(A0_in);


	Ntransforms = (int) mxGetScalar(Nt_in);

	/* Reserve some space */

	A0_dcmplxptr = (cufftDoubleComplex*) mxMalloc(sizeof(cufftDoubleComplex) * N);

	AL_dcmplxptr = (cufftDoubleComplex*) mxMalloc(sizeof(cufftDoubleComplex) * N);



	 * Assign the input to working variables


	/* Matlab Re/Im into cufftComplex array */


	/* Initialize variables to the graphics card / Reserve Space */

	cutilSafeCall(cudaMalloc( (void **) &AzX_GPU_dcmplxptr,sizeof(cufftDoubleComplex)*N));

	/* Compute the execution configuration */


	dim3 dimBlock(block_size,1);

	dim3 dimGrid ( (N/dimBlock.x) + (!(N%dimBlock.x)?0:1) , 1);


	 * Initialize CUFFT


	/* Find the forward fft-plan */

	cufftSafeCall(cufftPlan1d(&p_fwd, N, CUFFT_Z2Z,1));

	cutilSafeCall(cudaMemcpy( AzX_GPU_dcmplxptr, A0_dcmplxptr, sizeof(cufftDoubleComplex)*N, cudaMemcpyHostToDevice));



		cufftSafeCall(cufftExecZ2Z(p_fwd, AzX_GPU_dcmplxptr, AzX_GPU_dcmplxptr, CUFFT_FORWARD));

		cufftSafeCall(cufftExecZ2Z(p_fwd, AzX_GPU_dcmplxptr, AzX_GPU_dcmplxptr, CUFFT_INVERSE));

		inverseFFTscaling<<<dimGrid,dimBlock>>>(AzX_GPU_dcmplxptr, N);



	/*cudaMemcpy( AL_fcmplxptr, AzX_GPU_fcmplxptr, sizeof(cufftComplex)*N, cudaMemcpyDeviceToHost);*/

	cutilSafeCall(cudaMemcpy( AL_dcmplxptr, AzX_GPU_dcmplxptr, sizeof(cufftDoubleComplex)*N, cudaMemcpyDeviceToHost));


	/* Outout from cufftComplex array into Matlab Re/Im format */


	/* Free resources */







One other thing I’ve tried is to use two separate functions for scaling the real/imaginary parts of the complex vector.

This does lead to the correct result.

/* Scaling both real/imag */

static __global__ void  inverseFFTscaling(cufftDoubleComplex *c_in, int N)


	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;	

	c_in[idx].x = c_in[idx].x / N;

	c_in[idx].y = c_in[idx].y / N;


/* Scaling real */

static __global__ void  inverseFFTscalingRe(cufftDoubleComplex *c_in, int N)


	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;

	c_in[idx].x = c_in[idx].x / N;


/* Scaling imag */

static __global__ void  inverseFFTscalingIm(cufftDoubleComplex *c_in, int N)


	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;

	c_in[idx].y = c_in[idx].y / N;


There must be someone who has encountered this problem?

Any ideas would be appreciated.


It seems like a compiler bug, I have replicated in a standalone application.
Thanks for reporting the problem.

A quick workaround is to change your code from
c_in[idx].x = c_in[idx].x /N ; // real
c_in[idx].y = c_in[idx].y /N; // imaginary

double scale=1./N;
c_in[idx].x = c_in[idx].x *scale ; // real
c_in[idx].y = c_in[idx].y *scale; // imaginary

Thanks for the hint. I have just included your suggested modification and it works.

Best wishes,


Hi both of them,

I am a beginner in cuda. I want to test how the double precision works.

I have written a code similiar like the code before. It only makes the FFT of a signal and then scales it.

To make the FFT, I use a vector of cufftDoubleComplex which read the arguments from a file where the values are in ascii:

cufftDoubleComplex* x_t;

FILE * fp;

x_t = (cufftDoubleComplex*)malloc(iMuestras*sizeof(cufftDoubleComp


fp = fopen("…/datos_entrada/senal_pruebaB.txt", “r” );

while (fgets(c, 40, fp)!=NULL)






I reserve a memory in device…

… then…

// CUFFT plan

cufftHandle plan;

cufftSafeCall(cufftPlan1d(&plan, new_size, CUFFT_Z2Z, 1));

cufftSafeCall(cufftExecZ2Z(plan, (cufftDoubleComplex *)d_signal, (cufftDoubleComplex *)d_signal, CUFFT_FORWARD));

I write the Makefile like:

Add source files here


CUDA source files (compiled with cudacc)


FLAGS para compilar en doble precision

CFLAGS=’-arch sm_13’

C/C++ source files (compiled with gcc / c++)


Additional libraries needed by the project


I compile, with MAKE, and it compiles perfectly and the applications makes FFT perfectly.

HOWEVER, when I intruce the function scale:


int block_size=128;

dim3 dimBlock(block_size,1);

dim3 dimGrid ( (new_size/dimBlock.x) + (!(new_size%dimBlock.x)?0:1) , 1);

// Scale the result

ComplexPointwiseMulAndScale<<<dimGrid, dimBlock>>>(d_signal, 1.0f/new_size);


// Scale FFT values

static global void ComplexPointwiseMulAndScale(cufftDoubleComplex* a, double scale)


const int threadID = blockIdx.x * blockDim.x + threadIdx.x;

a[threadID].x =  a[threadID].x *scale;

a[threadID].y =  a[threadID].y *scale;


When I compile, The compiler says:

ptxas /tmp/tmpxft_00004bcb_00000000-2_convolutionOVS.ptx, line 70; warning : Double is not supported. Demoting to float

and the application does not work.

I am using a TESLA C0160 with Capacity 1.3, CUDA 2.3, It should work, shouldn’t it ???. What should I change?

Thank you very much,