Simple complex function: Error on imaginary component

brightyboy · August 5, 2009, 6:13pm

Hello and thanks for reading this!

This might be a pretty easy one to solve for some of you but somehow I just can’t find the fix.

I have a function, which does the inverse scaling after IFFT computation, which looks like this:

static __global__ void  inverseFFTscaling(cufftDoubleComplex *c_in, int N)

{

	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;

	c_in[idx].x = c_in[idx].x / N; // real

	c_in[idx].y = c_in[idx].y / N; // imaginary

}

The result that I obtain is that the real part, i.e. the .x part, is correct. The imaginary part, however, is a clone of the result for the real part and not the imaginary part divided by N.

If I comment out the real part then I get the correct result for the imaginary part - of course this is not a solution since I would also like to have the result for the real part.

The same function for single precision works fine (of course with data type cufftComplex).

Any help would be appreciated!

Steve

mfatica · August 5, 2009, 10:17pm

Are you using the flag for double precision (-arch sm_13 ) ?

brightyboy · August 6, 2009, 7:04am

Hi,

Yes, I am already using it. I have added

CFLAGS='-O3 -arch sm_13 -Xcompiler "-fPIC -D_GNU_SOURCE -pthread -fexceptions -m64"'

in nvopts.sh. This is a mex-File for Matlab.

Without the “-arch sm_13” I get a warning as expected:

ptxas /tmp/tmpxft_00001ca3_00000000-2_performNFFTsonvectorX_cudadbl.ptx, line 70; warning : Double is not supported. Demoting to float

This is part of a mex-Function for Matlab, which just performs FFT/IFFT/Scaling. Since I measured the error of the outcome to estimate the accuracy for my application I first only looked at RMSE between expected and obtained result. The RMSE was huge. When I then had a look at the real/imaginary part I found out that after the scaling function (see function in previous post) the real part is cloned onto the imaginary part.

I have installed the latest version of the CUDA SDK, the latest NVIDIA Driver, and the latest CUDA toolkit (hence double prec. FFT - thanks for that by the way!). My card is a GTX 285.

Any ideas?

Steve

mfatica · August 6, 2009, 2:58pm

Can you post the full source?
Which OS?

brightyboy · August 6, 2009, 5:43pm

The OS is OpenSuse 11.1 x64 [Fujitsu Siemens Workstation M460, Core2 Quad Q9650, 8GB RAM].

Tried to upload but suffix is not allowed → So just inline now…

Thanks for your help!

Steve

#include "mex.h"

#include <cufft.h>

#include <cutil_inline.h>

/* input arguments */

#define A0_in	   prhs[0] /* input vector */

#define Nt_in	   prhs[1] /* How many forward and inverse transforms?  */

#define inputs	   2

/* output arguments */

#define AL_out	  plhs[0]

#define outputs	  1	 /* output vector */

/************************************************************

**************/

/* MATLAB stores complex numbers in separate arrays for the real and

   imaginary parts.  The following functions take the data in

   this format and pack it into a complex work array, or

   unpack it, respectively.  

   We are using cufftComplex defined in cufft.h to  handle complex on Windows and Linux

*/

void pack_c2dc(cufftDoubleComplex  *output, 

			  double *input_re, 

			  double *input_im, 

			  int Ntot)

{

	int n;

	for(n=0;n<Ntot;n++) 

	{

	   output[n].x =  input_re[n];

	   output[n].y =  input_im[n];

	}

}

void unpack_c2dc(cufftDoubleComplex  *input, 

				double *output_re, 

				double *output_im,  

				int Ntot)

{

	int n;

	for(n=0;n<Ntot;n++) 

	{

	   output_re[n] = input[n].x;

	   output_im[n] = input[n].y;

	}

}

/* ------------------------------------------------------------------------

 * The following function performs the scaling after IFFT computation

 * ------------------------------------------------------------------------

 */

static __global__ void  inverseFFTscaling(cufftDoubleComplex *c_in, int N)

{

	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;

	c_in[idx].x = c_in[idx].x / N;

	c_in[idx].y = c_in[idx].y / N;

}

/* ------------------------------------------------------------------------

 * Start of the main function

 * ------------------------------------------------------------------------

 */

void mexFunction( int nlhs, mxArray *plhs[],

				  int nrhs, const mxArray *prhs[])

{

	/* Declaration of input variables */

	cufftDoubleComplex *A0_dcmplxptr, *AL_dcmplxptr;	

	int Ntransforms;

	

	/* Declaration of working variables */

	cufftDoubleComplex *AzX_GPU_dcmplxptr;

	cufftHandle p_fwd;

	int k, N, block_size;

	if(nrhs==0)

		mexErrMsgTxt("usage: [AL] = performNFFTsonvectorX_cudadbl(A0,Nt)");

	else if(nrhs!=inputs) mexErrMsgTxt("This function requires 2 input arguments");

	else if(nlhs!=outputs) mexErrMsgTxt("This function requires 1 output arguments");

	/* Check input for dimension */

	if((mxGetN(A0_in)==1)&&(mxGetM(A0_in)==1)) mexErrMsgTxt("The input A0_dcmplxptr cannot be a scalar!\n");

	else{

		if((mxGetN(A0_in)>1)&&(mxGetM(A0_in)==1)) N=mxGetN(A0_in);

		else N=mxGetM(A0_in);

	}

	Ntransforms = (int) mxGetScalar(Nt_in);

	/* Reserve some space */

	A0_dcmplxptr = (cufftDoubleComplex*) mxMalloc(sizeof(cufftDoubleComplex) * N);

	AL_dcmplxptr = (cufftDoubleComplex*) mxMalloc(sizeof(cufftDoubleComplex) * N);

	AL_out=mxCreateDoubleMatrix(N,1,mxCOMPLEX);

	/*

	 * Assign the input to working variables

	 */

	/* Matlab Re/Im into cufftComplex array */

	pack_c2dc(A0_dcmplxptr,mxGetPr(A0_in),mxGetPi(A0_in),N);

	/* Initialize variables to the graphics card / Reserve Space */

	cutilSafeCall(cudaMalloc( (void **) &AzX_GPU_dcmplxptr,sizeof(cufftDoubleComplex)*N));

	/* Compute the execution configuration */

	block_size=128;

	dim3 dimBlock(block_size,1);

	dim3 dimGrid ( (N/dimBlock.x) + (!(N%dimBlock.x)?0:1) , 1);

	/*

	 * Initialize CUFFT

	 */

	/* Find the forward fft-plan */

	cufftSafeCall(cufftPlan1d(&p_fwd, N, CUFFT_Z2Z,1));

	cutilSafeCall(cudaMemcpy( AzX_GPU_dcmplxptr, A0_dcmplxptr, sizeof(cufftDoubleComplex)*N, cudaMemcpyHostToDevice));

	for(k=0;k<Ntransforms;k++)

	{

		cufftSafeCall(cufftExecZ2Z(p_fwd, AzX_GPU_dcmplxptr, AzX_GPU_dcmplxptr, CUFFT_FORWARD));

		cufftSafeCall(cufftExecZ2Z(p_fwd, AzX_GPU_dcmplxptr, AzX_GPU_dcmplxptr, CUFFT_INVERSE));

		inverseFFTscaling<<<dimGrid,dimBlock>>>(AzX_GPU_dcmplxptr, N);

	}

	

	/*cudaMemcpy( AL_fcmplxptr, AzX_GPU_fcmplxptr, sizeof(cufftComplex)*N, cudaMemcpyDeviceToHost);*/

	cutilSafeCall(cudaMemcpy( AL_dcmplxptr, AzX_GPU_dcmplxptr, sizeof(cufftDoubleComplex)*N, cudaMemcpyDeviceToHost));

	

	/* Outout from cufftComplex array into Matlab Re/Im format */

	unpack_c2dc(AL_dcmplxptr,mxGetPr(AL_out),mxGetPi(AL_out),N);

	/* Free resources */

	cufftSafeCall(cufftDestroy(p_fwd));

	cutilSafeCall(cudaFree(AzX_GPU_dcmplxptr));

	mxFree(A0_dcmplxptr);

	mxFree(AL_dcmplxptr);

	return;

}

brightyboy · August 18, 2009, 8:47am

One other thing I’ve tried is to use two separate functions for scaling the real/imaginary parts of the complex vector.

This does lead to the correct result.

/* Scaling both real/imag */

static __global__ void  inverseFFTscaling(cufftDoubleComplex *c_in, int N)

{

	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;	

	c_in[idx].x = c_in[idx].x / N;

	c_in[idx].y = c_in[idx].y / N;

}

/* Scaling real */

static __global__ void  inverseFFTscalingRe(cufftDoubleComplex *c_in, int N)

{

	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;

	c_in[idx].x = c_in[idx].x / N;

}

/* Scaling imag */

static __global__ void  inverseFFTscalingIm(cufftDoubleComplex *c_in, int N)

{

	const int idx   = threadIdx.x + blockDim.x * blockIdx.x;

	c_in[idx].y = c_in[idx].y / N;

}

There must be someone who has encountered this problem?

Any ideas would be appreciated.

Steve

mfatica · August 19, 2009, 12:27am

It seems like a compiler bug, I have replicated in a standalone application.
Thanks for reporting the problem.

A quick workaround is to change your code from
c_in[idx].x = c_in[idx].x /N ; // real
c_in[idx].y = c_in[idx].y /N; // imaginary

to:
double scale=1./N;
c_in[idx].x = c_in[idx].x *scale ; // real
c_in[idx].y = c_in[idx].y *scale; // imaginary

brightyboy · August 19, 2009, 8:40am

It seems like a compiler bug, I have replicated in a standalone application.

Thanks for reporting the problem.

A quick workaround is to change your code from
c_in[idx].x = c_in[idx].x /N ; // real

c_in[idx].y = c_in[idx].y /N; // imaginary
to:

double scale=1./N;

c_in[idx].x = c_in[idx].x *scale ; // real

c_in[idx].y = c_in[idx].y *scale; // imaginary

Thanks for the hint. I have just included your suggested modification and it works.

Best wishes,

Steve

jabelloch · March 29, 2010, 4:34pm

Hi both of them,

I am a beginner in cuda. I want to test how the double precision works.

I have written a code similiar like the code before. It only makes the FFT of a signal and then scales it.

To make the FFT, I use a vector of cufftDoubleComplex which read the arguments from a file where the values are in ascii:

cufftDoubleComplex* x_t;

FILE * fp;

x_t = (cufftDoubleComplex*)malloc(iMuestras*sizeof(cufftDoubleComp

lex));

fp = fopen(“…/datos_entrada/senal_pruebaB.txt”, “r” );

while (fgets(c, 40, fp)!=NULL)

{

       h_t[iReadValues].x=atof©;

       h_t[iReadValues].y=0;

       iReadValues++;

}

I reserve a memory in device…

… then…

// CUFFT plan

cufftHandle plan;

cufftSafeCall(cufftPlan1d(&plan, new_size, CUFFT_Z2Z, 1));

cufftSafeCall(cufftExecZ2Z(plan, (cufftDoubleComplex *)d_signal, (cufftDoubleComplex *)d_signal, CUFFT_FORWARD));

I write the Makefile like:

Add source files here

EXECUTABLE := test

CUDA source files (compiled with cudacc)

CUFILES := test.cu

FLAGS para compilar en doble precision

CFLAGS=‘-arch sm_13’

C/C++ source files (compiled with gcc / c++)

CCFILES :=

Additional libraries needed by the project

USECUFFT := 1

I compile, with MAKE, and it compiles perfectly and the applications makes FFT perfectly.

HOWEVER, when I intruce the function scale:

main:

int block_size=128;

dim3 dimBlock(block_size,1);

dim3 dimGrid ( (new_size/dimBlock.x) + (!(new_size%dimBlock.x)?0:1) , 1);

// Scale the result

ComplexPointwiseMulAndScale<<<dimGrid, dimBlock>>>(d_signal, 1.0f/new_size);

Function:

// Scale FFT values

static global void ComplexPointwiseMulAndScale(cufftDoubleComplex* a, double scale)

{

const int threadID = blockIdx.x * blockDim.x + threadIdx.x;

a[threadID].x =  a[threadID].x *scale;

a[threadID].y =  a[threadID].y *scale;

}

When I compile, The compiler says:

ptxas /tmp/tmpxft_00004bcb_00000000-2_convolutionOVS.ptx, line 70; warning : Double is not supported. Demoting to float

and the application does not work.

I am using a TESLA C0160 with Capacity 1.3, CUDA 2.3, It should work, shouldn’t it ???. What should I change?

Thank you very much,

jabelloch

Topic		Replies	Views
Questions about cuFFT for 3D matrix, arrayFire GPU-Accelerated Libraries	5	1670	October 12, 2021
accuracy of CUFFT under double precision CUDA Programming and Performance	9	4150	September 18, 2009
cuFFT and fftw CUDA Programming and Performance	10	4182	August 25, 2010
cufft question CUDA Programming and Performance	6	8664	March 9, 2009
floats to Cufft complex data type CUDA Programming and Performance	13	11928	November 12, 2014
Image filtering in frequency domain GPU-Accelerated Libraries	6	3641	October 3, 2014
CUDA FFT different from Matlab FFT CUDA Programming and Performance	32	9349	March 29, 2011
Bad Performance of CUFFT library? compilation flags for optimizing fft performance CUDA Programming and Performance	11	13511	February 17, 2012
CUFFT run wrong CUDA Programming and Performance	16	2819	May 23, 2013
2D CUFFT wrong result GPU-Accelerated Libraries cufft	8	3109	November 7, 2023

Simple complex function: Error on imaginary component

Add source files here

CUDA source files (compiled with cudacc)

FLAGS para compilar en doble precision

C/C++ source files (compiled with gcc / c++)

Additional libraries needed by the project

Related topics