Floats to cuFFT complex data type

cufft complex data type

I have two data sets, real and imaginary, both of type float.

I want to assign these to cufftComplex. How do I do that?

  1. How do I access the real part and the imaginary part of cufftComplex data? data.x and data.y did not work for me.

  2. If I define a struct complex with float real and float img members and try to assign it to cufftComplex, will it work?

  3. What is the relation between cufftComplex and float2?

this stackoverflow question seems related

Referring to the header files that come with CUDA shows that cufftComplex is a float2:

In cufft.h:

// cufftComplex is a single-precision, floating-point complex data type that
// consists of interleaved real and imaginary components.
// cufftDoubleComplex is the double-precision equivalent.
typedef cuComplex cufftComplex;
typedef cuDoubleComplex cufftDoubleComplex;

In cuComplex.h:

typedef float2 cuFloatComplex;
typedef cuFloatComplex cuComplex;
typedef double2 cuDoubleComplex;

As for your question (2), the following works without issues for me:

#include <stdio.h>
#include <stdlib.h>
#include "cufft.h"

int main (void)
{
    cufftComplex foo = make_cuComplex (0.0f, 1.0f);
    printf ("real part = %15.8e  imag part = %15.8e\n", foo.x, foo.y);
    return EXIT_SUCCESS;
}

Hello

int N=10000;

float *real ;

real= (float *) malloc(sizeof(float)*N);

float *img;

img = (float *) malloc(sizeof(float)*N);

cufftComplex *data;

cudaMalloc((void**)&data, sizeof(cufftComplex) * N);

for(int ii=0; ii<N; ii++) {
data[ii].x= real[ii];
data[ii].y= img[ii];
}

It did not work for me.
Can you help me put the real part into data.x and the imaginary part into data.y?

“data” is allocated on the GPU. “real” and “imag” are allocated on the host. You need to copy the data from the host to the GPU. In #2 above cbuchner1 gave a relevant link, at which one possible approach is shown.

Hi Njuffa,
that I understand.

I am calling cufftPlan1d and cufftExecC2C after this, which will take care of the data transfer, I guess.

What I need to do first is put the real and imaginary parts into the complex type, because my data comes in two separate sets.

Please help me in this direction.

I have not used CUFFT, but by analogy with other CUDA-based libraries, the programmer is responsible for moving all relevant data to the GPU prior to invoking library functions.

I understand that you have two separate arrays on the host, one containing the real component and the other containing the imaginary component (SOA = structure of arrays layout). On the device you need the real and imaginary components interleaved (AOS = array of structures layout) for CUFFT.

The code at the link given by cbuchner1 addresses exactly this scenario: The data is copied from SOA layout on the host to AOS layout on the device.

One more thing:

The code at the link uses cudaMemcpy2D.

Do I need a 2D memcpy?

Can't I use cudaMemcpy?

In the example code on Stackoverflow, cudaMemcpy2D() provides a strided copy on the destination side. The two offset, strided, copies thus interleave the real and imaginary parts in GPU memory. Simple cudaMemcpy() provides a contiguous, non-strided, copy operation.

OK, thank you all.

I will try it in the morning…

Hope it will work well for me…

Hi all,

I have tried using that, but it was not fruitful for me…

int N = 10;
float * real;
float * imag;
real = (float *) malloc (sizeof(float) * N);
imag = (float *) malloc (sizeof(float) * N);

for (int ii=0; ii<N; ii++){
real[ii] = (float) ii*ii;
imag[ii] = (float) ii+ii;
}

cufftComplex *data;

cudaMalloc((void **) &data, sizeof(cufftComplex) * N);

Now can anyone help me assign the real and imag values, respectively, to the complex data type?

#include<cuda_runtime.h>
#include<cufft.h>
#include<cufftw.h>
#include<stdlib.h>
#include<stdio.h>
#include<cublas_v2.h>
main()
{

int N=10000;

float * real_vec;       // host vector, real part
float * imag_vec;       // host vector, imaginary part
float * resultReal;

real_vec= (float *) malloc ( sizeof(float) * N ) ;
imag_vec= (float *) malloc ( sizeof(float) * N ) ;
resultReal= (float *) malloc ( sizeof(float) * N ) ;


for(int ii=1; ii<=N ; ii++){
	real_vec[ii]= (float) ii*ii;
	imag_vec[ii]= (float) ii+ii;
}


float2 * complex_vec_d; // device vector, single-precision complex

cudaMalloc((void **) &complex_vec_d, sizeof(float2) * N);


	if (cudaGetLastError() != cudaSuccess){
		fprintf(stderr, "Cuda error: Failed to allocate\n");
	return;
	}


cudaMemcpy2D (complex_vec_d, 2 * sizeof(complex_vec_d), 
                         real_vec, 1 * sizeof(real_vec),
                         sizeof(real_vec), N, cudaMemcpyHostToDevice);
cudaMemcpy2D (complex_vec_d + 1, 2 * sizeof(complex_vec_d),
                         imag_vec, 1 * sizeof(imag_vec),
                         sizeof(imag_vec), N, cudaMemcpyHostToDevice);

	cufftHandle plan;

	if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) != CUFFT_SUCCESS){
		fprintf(stderr, "CUFFT error: Plan creation failed");
		return;
	}
	
	
	if (cufftExecC2C(plan, complex_vec_d, complex_vec_d, CUFFT_FORWARD) != CUFFT_SUCCESS){
		fprintf(stderr, "CUFFT error: ExecC2C Forward failed");
		return;
	}	

	if (cudaThreadSynchronize() != cudaSuccess){
		fprintf(stderr, "Cuda error: Failed to synchronize\n");
	return;
	}
	
	cudaMemcpy2D (resultReal, 1 * sizeof(resultReal),
                         complex_vec_d, 2 * sizeof(complex_vec_d),
                         sizeof(complex_vec_d), N, cudaMemcpyDeviceToHost);

	for(int ii=0; ii<5 ; ii++)
	printf ( " Org val %f \t fftval:  %f\n", real_vec[ii], resultReal[ii]);

}

Can anyone help me figure out the problem? I need to compute the FFT using cuFFT.

I propose two solutions to your problem and I recommend the first one if there is no good reason to choose the second one.

The first one uses a cufftComplex type already on the host. Then copying to device is easy and you will also easily understand what you are doing…

Solution 1:

#include <cuda_runtime.h>
#include <cufft.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
	int N=10;

	cufftComplex* data;
	data = (cufftComplex *) malloc ( sizeof(cufftComplex) * N ) ;

	cufftComplex* dData;
	cudaMalloc((void **) &dData, sizeof(cufftComplex) * N);
	if (cudaGetLastError() != cudaSuccess)
	{
		fprintf(stderr, "Cuda error: Failed to allocate\n");
		return -1;
	}

	for(int ii=0; ii < N ; ii++)
	{
		data[ii].x= sinpi( .9*(float)ii/(float)N);
		data[ii].y= cospi( (float)ii/(float)N);
	}

	printf( "Org vals: \n");
	for(int ii=0; ii<N ; ii++)
	{
		printf ( "%f+i*%f\n", data[ii].x,data[ii].y );
	}

	cudaMemcpy( dData, data, sizeof(cufftComplex)*N, cudaMemcpyHostToDevice );

	cufftHandle plan;

	if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) != CUFFT_SUCCESS){
	fprintf(stderr, "CUFFT error: Plan creation failed");
	return -1;
	}

	if (cufftExecC2C(plan, dData, dData, CUFFT_FORWARD) != CUFFT_SUCCESS){
	fprintf(stderr, "CUFFT error: ExecC2C Forward failed");
	return -1;
	}

	// cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() replaces it
	if (cudaDeviceSynchronize() != cudaSuccess){
		fprintf(stderr, "Cuda error: Failed to synchronize\n");
		return -1;
	}

	cudaMemcpy( data, dData, sizeof(cufftComplex)*N, cudaMemcpyDeviceToHost );

	printf( "fft vals: \n");
	for(int ii=0; ii<N ; ii++)
	{
		printf ( "%f+i*%f\n", data[ii].x,data[ii].y );
	}
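
	// release the plan and the device and host buffers
	cufftDestroy(plan);
	cudaFree(dData);
	free(data);
	return 0;
}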

The second solution uses your original code and fixes your memcpys. I do not recommend using this, because to me it looks more like a hack… I like code that is easy to understand…

Solution 2:

#include <cuda_runtime.h>
#include <cufft.h>
#include <stdio.h>
#include <stdlib.h>

int main() {

	int N = 10;

	float * real_vec; // host vector, real part
	float * imag_vec; // host vector, imaginary part
	float * resultReal;

	real_vec = (float *) malloc(sizeof(float) * N);
	imag_vec = (float *) malloc(sizeof(float) * N);
	resultReal = (float *) malloc(sizeof(float) * N);

	for (int ii = 0; ii < N; ii++)
	{
		real_vec[ii] = sinpi( .9*(float)ii/(float)N);
		imag_vec[ii] = cospi( (float)ii/(float)N);
	}

	float2 * complex_vec_d; // device vector, single-precision complex

	cudaMalloc((void **) &complex_vec_d, sizeof(float2) * N);

	if (cudaGetLastError() != cudaSuccess)
	{
		fprintf(stderr, "Cuda error: Failed to allocate\n");
		return -1;
	}

	cudaMemcpy2D(complex_vec_d, 2 * sizeof(float), real_vec, 1 * sizeof(float), sizeof(float), N, cudaMemcpyHostToDevice);
	cudaMemcpy2D(&complex_vec_d[0].y, 2 * sizeof(float), imag_vec, 1 * sizeof(float), sizeof(float), N, cudaMemcpyHostToDevice);

	cufftHandle plan;

	if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
		fprintf(stderr, "CUFFT error: Plan creation failed");
		return -1;
	}

	if (cufftExecC2C(plan, complex_vec_d, complex_vec_d, CUFFT_FORWARD)
			!= CUFFT_SUCCESS)
	{
		fprintf(stderr, "CUFFT error: ExecC2C Forward failed");
		return -1;
	}

	// cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() replaces it
	if (cudaDeviceSynchronize() != cudaSuccess)
	{
		fprintf(stderr, "Cuda error: Failed to synchronize\n");
		return -1;
	}

	cudaMemcpy2D(resultReal, 1 * sizeof(float), complex_vec_d,
			2 * sizeof(float), sizeof(float), N,
			cudaMemcpyDeviceToHost);

	printf("fftvals:\n");
	for (int ii = 0; ii < N; ii++)
	{
		printf("%f\n", resultReal[ii]);
	}
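
	// release the plan and the device and host buffers
	cufftDestroy(plan);
	cudaFree(complex_vec_d);
	free(real_vec);
	free(imag_vec);
	free(resultReal);
	return 0;
}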

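Basically, it fixes the cudaMemcpy2D operations:

  1. “complex_vec_d + 1” points to the next float2, not to the imaginary part of the first element; “&complex_vec_d[0].y” is the address you want.
  2. You passed the size of pointers (e.g. sizeof(real_vec)) where you should pass the size of a float.

It also changes the fill loop to run from 0 to N-1; your loop ran from 1 to N, which writes one element past the end of each host array.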

Thanks for the reply…

I will try out these approaches as well.

I used typedef cufftComplex float2

with the first approach earlier, and it also worked for me.