CUFFT Newbie Question

gpuguy · May 3, 2010, 2:41pm

Hello

I wrote the following code, which uses the CUFFT library to compute the FFT of 256 numbers in 10 batches. I have the following questions about it:

1- Am I measuring the elapsed time of cufftExecC2C correctly? (I am using CUDA events.)

2- To measure the execution time on the CPU, I simply run the program in emulation mode (-deviceemu). With this timing method I see no benefit from performing the FFT on the GPU. I varied the number of batches from 10 to 6000, and the benefit is almost negligible. Am I doing something wrong when measuring the time?

(I am using CUFFT for the first time, and I am relatively new to CUDA as well, so please help me understand.)

Thanks in advance.

#include <stdio.h>
#include <math.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define NX      256
#define BATCH   10

int main()
{
    cufftHandle plan;
    cufftComplex *devPtr;
    cufftComplex data[NX*BATCH];
    int i;

    /* source data creation */
    for(i = 0; i < NX*BATCH; i++){
        data[i].x = 1.0f;
        data[i].y = 1.0f;
    }

    cudaEvent_t start, stop;
    float time;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    /* GPU memory allocation */
    cudaMalloc((void**)&devPtr, sizeof(cufftComplex)*NX*BATCH);

    /* transfer to GPU memory */
    cudaMemcpy(devPtr, data, sizeof(cufftComplex)*NX*BATCH, cudaMemcpyHostToDevice);

    /* creates 1D FFT plan */
    cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);

    /* Timing Calculations */
    cudaEventRecord( start, 0 );

    /* executes FFT processes */
    cufftExecC2C(plan, devPtr, devPtr, CUFFT_FORWARD);
    cudaThreadSynchronize();

    cudaEventRecord( stop, 0 );
    cudaEventSynchronize( stop );

    float elapsedTime;
    cudaEventElapsedTime( &elapsedTime, start, stop );
    printf("Processing time=%f(ms)\n", elapsedTime);

    cudaEventDestroy( start );
    cudaEventDestroy( stop );

    /* transfer results from GPU memory */
    cudaMemcpy(data, devPtr, sizeof(cufftComplex)*NX*BATCH, cudaMemcpyDeviceToHost);

    /* deletes CUFFT plan */
    cufftDestroy(plan);

    /* frees GPU memory */
    cudaFree(devPtr);

    /*for(i = 0; i < NX*BATCH; i++){
        printf("data[%d] %f %f\n", i, data[i].x, data[i].y);
    }*/

    return 0;
}

ONeill · May 4, 2010, 9:28am

Hi!

Using cudaEvents for measuring time is OK.
A size of 256 elements per FFT is quite small for testing the GPU's performance. Try running tests with N ranging from 256 up to 128k. For comparison with the CPU I use the FFTW library (CUFFT uses a model very similar to FFTW's, so writing code for FFTW isn't very different). Running CUDA code in emulation mode isn't a good idea, because all CUDA threads are then emulated by host threads. I see some improvement of CUFFT over FFTW from N=32k upward, or at smaller N when the batch is set to 20 or higher.
