floats to Cufft complex data type

jaisingla · November 11, 2014, 5:29pm

cufft complex data type

I have two data sets, real and imaginary, both of type float.

1. I want to assign these to cufftComplex. How do I do that?
2. How do I access the real part and the imaginary part of cufftComplex data? data.x and data.y did not work for me.
3. If I define a struct complex with float real and float img members and try to assign it to cufftComplex, will it work?
4. What is the relation between cufftComplex and float2?

cbuchner1 · November 11, 2014, 5:44pm

This Stack Overflow question seems related:
http://stackoverflow.com/questions/13535182/copying-data-to-cufftcomplex-data-struct

njuffa · November 11, 2014, 5:53pm

Referring to the header files that come with CUDA shows that cufftComplex is a float2:

In cufft.h:

// cufftComplex is a single-precision, floating-point complex data type that
// consists of interleaved real and imaginary components.
// cufftDoubleComplex is the double-precision equivalent.
typedef cuComplex cufftComplex;
typedef cuDoubleComplex cufftDoubleComplex;

In cuComplex.h:

typedef float2 cuFloatComplex;
typedef cuFloatComplex cuComplex;
typedef double2 cuDoubleComplex;

As for your question (2), the following works without issues for me:

#include <stdio.h>
#include <stdlib.h>
#include "cufft.h"

int main (void)
{
    cufftComplex foo = make_cuComplex (0.0f, 1.0f);
    printf ("real part = %15.8e  imag part = %15.8e\n", foo.x, foo.y);
    return EXIT_SUCCESS;
}

jaisingla · November 11, 2014, 6:06pm

Hello,

int N = 10000;
float *real;
real = (float *) malloc(sizeof(float) * N);
float *img;
img = (float *) malloc(sizeof(float) * N);

cufftComplex *data;
cudaMalloc((void **) &data, sizeof(cufftComplex) * N);

for (int ii = 0; ii < N; ii++) {
    data[ii].x = real[ii];
    data[ii].y = img[ii];
}

It did not work for me.
Can you help me put the real part into data.x and the imaginary part into data.y?

njuffa · November 11, 2014, 6:10pm

“data” is allocated on the GPU. “real” and “imag” are allocated on the host. You need to copy the data from the host to the GPU. In #2 above cbuchner1 gave a relevant link, at which one possible approach is shown.

jaisingla · November 11, 2014, 6:14pm

Hi Njuffa,
that I understand.

I am calling cufftPlan1d and cufftExecC2C after this, which will take care of the data transfer, I guess.

What I need to do first is put the real and imaginary parts into the complex type, because my data comes in two separate sets.

Please help me in this direction.

njuffa · November 11, 2014, 6:20pm

I have not used CUFFT, but by analogy with other CUDA-based libraries, the programmer is responsible for moving all relevant data to the GPU prior to invoking library functions.

I understand that you have two separate arrays on the host, one containing the real component and the other containing the imaginary component (SOA = structure of arrays layout). On the device you need the real and imaginary components interleaved (AOS = array of structures layout) for CUFFT.

The code at the link given by cbuchner1 addresses exactly this scenario: The data is copied from SOA layout on the host to AOS layout on the device.
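An alternative to interleaving during the transfer is to interleave on the host first and then do one contiguous copy. The following is only a minimal host-side sketch in plain C: complex_f is a hypothetical stand-in for cufftComplex (float2) so that it builds without the CUDA headers, and interleave_soa_to_aos is an illustrative name, not a CUFFT function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cufftComplex (float2):
   real part in x, imaginary part in y. */
typedef struct {
    float x;
    float y;
} complex_f;

/* Interleave separate real/imag arrays (SOA) into one array of
   complex values (AOS). All three arrays must hold n elements. */
void interleave_soa_to_aos(const float *real, const float *imag,
                           complex_f *out, size_t n)
{
    assert(real != NULL && imag != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i].x = real[i];  /* real component */
        out[i].y = imag[i];  /* imaginary component */
    }
}
```

With the real CUDA types, the packed host array could then be sent to the device with a single cudaMemcpy of sizeof(cufftComplex) * n bytes using cudaMemcpyHostToDevice.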

jaisingla · November 11, 2014, 6:33pm

One more thing:

The code at the link uses cudaMemcpy2D.

Do I need a 2D memcpy?

Can't I use cudaMemcpy?

njuffa · November 11, 2014, 6:40pm

In the example code on Stackoverflow, cudaMemcpy2D() provides a strided copy on the destination side. The two offset, strided, copies thus interleave the real and imaginary parts in GPU memory. Simple cudaMemcpy() provides a contiguous, non-strided, copy operation.
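To make the strided-copy idea concrete, here is a host-only sketch in plain C that mimics the row and pitch addressing with memcpy. The names strided_copy and interleave_with_strided_copies are hypothetical helpers, not CUDA functions. The sketch assumes that pitches and widths are given in bytes, as they are for cudaMemcpy2D, and it illustrates only the addressing, not the host-to-device transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-only emulation of a pitched 2D copy: copy `height` rows of
   `width` bytes each, advancing dst by `dpitch` bytes and src by
   `spitch` bytes from one row to the next. */
void strided_copy(void *dst, size_t dpitch,
                  const void *src, size_t spitch,
                  size_t width, size_t height)
{
    assert(width <= dpitch && width <= spitch);
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t row = 0; row < height; ++row) {
        memcpy(d + row * dpitch, s + row * spitch, width);
    }
}

/* Interleave n real and n imaginary floats into out[2 * n]
   using two offset, strided copies. */
void interleave_with_strided_copies(const float *real, const float *imag,
                                    float *out, size_t n)
{
    /* Real parts: destination offset 0, stride of two floats. */
    strided_copy(out, 2 * sizeof(float),
                 real, sizeof(float), sizeof(float), n);
    /* Imaginary parts: destination offset of one float, same stride. */
    strided_copy(out + 1, 2 * sizeof(float),
                 imag, sizeof(float), sizeof(float), n);
}
```

Each "row" here is a single float, so the destination pitch of two floats leaves a gap that the second copy fills. That is how two copies produce the interleaved layout.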

jaisingla · November 11, 2014, 6:43pm

OK, thank you all.

I will try it in the morning.

I hope it will work well for me.

jaisingla · November 12, 2014, 4:46am

Hi all,

I have tried using that, but it was not fruitful for me.

int N = 10;
float *real;
float *imag;
real = (float *) malloc(sizeof(float) * N);
imag = (float *) malloc(sizeof(float) * N);

for (int ii = 0; ii < N; ii++) {
    real[ii] = (float) ii * ii;
    imag[ii] = (float) ii + ii;
}

cufftComplex *data;

cudaMalloc((void **) &data, sizeof(cufftComplex) * N);

// Now, can anyone help me assign the real and imag values to the complex data type?

jaisingla · November 12, 2014, 6:17am

#include<cuda_runtime.h>
#include<cufft.h>
#include<cufftw.h>
#include<stdlib.h>
#include<stdio.h>
#include<cublas_v2.h>
main()
{

int N=10000;

float * real_vec;       // host vector, real part
float * imag_vec;       // host vector, imaginary part
float * resultReal;

real_vec= (float *) malloc ( sizeof(float) * N ) ;
imag_vec= (float *) malloc ( sizeof(float) * N ) ;
resultReal= (float *) malloc ( sizeof(float) * N ) ;


for(int ii=0; ii<N ; ii++){
	real_vec[ii]= (float) ii*ii;
	imag_vec[ii]= (float) ii+ii;
}


float2 * complex_vec_d; // device vector, single-precision complex

cudaMalloc((void **) &complex_vec_d, sizeof(float2) * N);


	if (cudaGetLastError() != cudaSuccess){
		fprintf(stderr, "Cuda error: Failed to allocate\n");
	return;
	}


cudaMemcpy2D (complex_vec_d, 2 * sizeof(complex_vec_d), 
                         real_vec, 1 * sizeof(real_vec),
                         sizeof(real_vec), N, cudaMemcpyHostToDevice);
cudaMemcpy2D (complex_vec_d + 1, 2 * sizeof(complex_vec_d),
                         imag_vec, 1 * sizeof(imag_vec),
                         sizeof(imag_vec), N, cudaMemcpyHostToDevice);

	cufftHandle plan;

	if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) != CUFFT_SUCCESS){
		fprintf(stderr, "CUFFT error: Plan creation failed");
		return;
	}
	
	
	if (cufftExecC2C(plan, complex_vec_d, complex_vec_d, CUFFT_FORWARD) != CUFFT_SUCCESS){
		fprintf(stderr, "CUFFT error: ExecC2C Forward failed");
		return;
	}	

	if (cudaThreadSynchronize() != cudaSuccess){
		fprintf(stderr, "Cuda error: Failed to synchronize\n");
	return;
	}
	
	cudaMemcpy2D (resultReal, 1 * sizeof(resultReal),
                         complex_vec_d, 2 * sizeof(complex_vec_d),
                         sizeof(complex_vec_d), N, cudaMemcpyDeviceToHost);

	for(int ii=0; ii<5 ; ii++)
	printf ( " Org val %f \t fftval:  %f\n", real_vec[ii], resultReal[ii]);

}

Can anyone help me figure out the problem? I need to compute the FFT using CUFFT.

hadschi118 · November 12, 2014, 10:21am

I propose two solutions to your problem and I recommend the first one if there is no good reason to choose the second one.

The first one uses a cufftComplex type already on the host. Then copying to device is easy and you will also easily understand what you are doing…

Solution 1:

#include <cuda_runtime.h>
#include <cufft.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
	int N=10;

	cufftComplex* data;
	data = (cufftComplex *) malloc ( sizeof(cufftComplex) * N ) ;

	cufftComplex* dData;
	cudaMalloc((void **) &dData, sizeof(cufftComplex) * N);
	if (cudaGetLastError() != cudaSuccess)
	{
		fprintf(stderr, "Cuda error: Failed to allocate\n");
		return -1;
	}

	for(int ii=0; ii < N ; ii++)
	{
		data[ii].x= sinpi( .9*(float)ii/(float)N);
		data[ii].y= cospi( (float)ii/(float)N);
	}

	printf( "Org vals: \n");
	for(int ii=0; ii<N ; ii++)
	{
		printf ( "%f+i*%f\n", data[ii].x,data[ii].y );
	}

	cudaMemcpy( dData, data, sizeof(cufftComplex)*N, cudaMemcpyHostToDevice );

	cufftHandle plan;

	if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) != CUFFT_SUCCESS){
		fprintf(stderr, "CUFFT error: Plan creation failed\n");
		return -1;
	}

	if (cufftExecC2C(plan, dData, dData, CUFFT_FORWARD) != CUFFT_SUCCESS){
		fprintf(stderr, "CUFFT error: ExecC2C Forward failed\n");
		return -1;
	}

	// cudaDeviceSynchronize() replaces the deprecated cudaThreadSynchronize()
	if (cudaDeviceSynchronize() != cudaSuccess){
		fprintf(stderr, "Cuda error: Failed to synchronize\n");
		return -1;
	}

	cudaMemcpy( data, dData, sizeof(cufftComplex)*N, cudaMemcpyDeviceToHost );

	printf( "fft vals: \n");
	for(int ii=0; ii<N ; ii++)
	{
		printf ( "%f+i*%f\n", data[ii].x,data[ii].y );
	}
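
	// Release the plan and the device and host buffers
	cufftDestroy(plan);
	cudaFree(dData);
	free(data);
	return 0;
}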

The second solution uses your original code and fixes your memcpys. I do not recommend using it because, to me, it looks more like a hack. I like code that is easy to understand.

Solution 2:

#include <cuda_runtime.h>
#include <cufft.h>
#include <stdio.h>
#include <stdlib.h>

int main() {

	int N = 10;

	float * real_vec; // host vector, real part
	float * imag_vec; // host vector, imaginary part
	float * resultReal;

	real_vec = (float *) malloc(sizeof(float) * N);
	imag_vec = (float *) malloc(sizeof(float) * N);
	resultReal = (float *) malloc(sizeof(float) * N);

	for (int ii = 0; ii < N; ii++)
	{
		real_vec[ii] = sinpi( .9*(float)ii/(float)N);
		imag_vec[ii] = cospi( (float)ii/(float)N);
	}

	float2 * complex_vec_d; // device vector, single-precision complex

	cudaMalloc((void **) &complex_vec_d, sizeof(float2) * N);

	if (cudaGetLastError() != cudaSuccess)
	{
		fprintf(stderr, "Cuda error: Failed to allocate\n");
		return -1;
	}

	cudaMemcpy2D(complex_vec_d, 2 * sizeof(float), real_vec, 1 * sizeof(float), sizeof(float), N, cudaMemcpyHostToDevice);
	cudaMemcpy2D(&complex_vec_d[0].y, 2 * sizeof(float), imag_vec, 1 * sizeof(float), sizeof(float), N, cudaMemcpyHostToDevice);

	cufftHandle plan;

	if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
		fprintf(stderr, "CUFFT error: Plan creation failed\n");
		return -1;
	}

	if (cufftExecC2C(plan, complex_vec_d, complex_vec_d, CUFFT_FORWARD)
			!= CUFFT_SUCCESS)
	{
		fprintf(stderr, "CUFFT error: ExecC2C Forward failed\n");
		return -1;
	}

	// cudaDeviceSynchronize() replaces the deprecated cudaThreadSynchronize()
	if (cudaDeviceSynchronize() != cudaSuccess)
	{
		fprintf(stderr, "Cuda error: Failed to synchronize\n");
		return -1;
	}

	cudaMemcpy2D(resultReal, 1 * sizeof(float), complex_vec_d,
			2 * sizeof(float), sizeof(float), N,
			cudaMemcpyDeviceToHost);

	printf("fftvals:\n");
	for (int ii = 0; ii < N; ii++)
	{
		printf("%f\n", resultReal[ii]);
	}
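
	// Release the plan and the device and host buffers
	cufftDestroy(plan);
	cudaFree(complex_vec_d);
	free(real_vec);
	free(imag_vec);
	free(resultReal);
	return 0;
}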

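Basically, it fixes the cudaMemcpy2D operations:

“complex_vec_d + 1” points to the next float2, not to the imaginary part of the first element; &complex_vec_d[0].y is the address of that imaginary part.
You passed the size of pointers (for example sizeof(real_vec)) where you should pass the size of a float.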
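The pointer arithmetic behind the first fix can be checked on the host without any CUDA calls. This is only an illustrative sketch: complex_f is a hypothetical stand-in for float2, and the two helper functions are made-up names that report byte offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for float2 / cufftComplex. */
typedef struct {
    float x;
    float y;
} complex_f;

/* Byte distance from v to (v + 1): pointer arithmetic on a complex_f
   pointer advances by one whole element, i.e. past both x and y. */
size_t offset_of_next_element(complex_f *v)
{
    assert(v != NULL);
    return (size_t)((unsigned char *)(v + 1) - (unsigned char *)v);
}

/* Byte distance from v to the imaginary part of element 0,
   which is where the second strided copy has to start. */
size_t offset_of_first_imag(complex_f *v)
{
    assert(v != NULL);
    return (size_t)((unsigned char *)&v[0].y - (unsigned char *)v);
}
```

On typical platforms the first offset is 8 bytes and the second is 4. The second fix is the same kind of mix-up with sizeof: sizeof(real_vec) is the size of a float pointer (commonly 8 bytes on 64-bit systems), whereas the pitch and width arguments need sizeof(float).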

jaisingla · November 12, 2014, 5:08pm

Thanks for the reply.

I will try out these approaches as well.

I used typedef cufftComplex float2

and tried the first approach earlier. It also worked for me.