Time loss...

Komtuveuh · March 28, 2012, 6:50am

Hello,

My boss asked me to benchmark an FFT running on a CUDA GPU, to find out whether it can be used for a real-time application (latency < 1 ms).

I did, and found with the NVIDIA Visual Profiler that there is a large gap in the timeline that I can’t identify.
This gap lasts 100 µs, while the FFT itself (including the memory copies) takes only 50 µs.

The gap occurs only after the host-to-device memory copy and before the call to the FFT function (cufftExecC2C).

Here’s a screenshot of the timeline: (image not available)

PS: I suspect this is due to the call to an “external” library function, but I can’t be sure.

Thank you

Komtuveuh · March 29, 2012, 10:50am

Maybe I wasn’t clear enough.

I’m currently looking for a way to reduce this gap (if such a way exists).

Here’s what happens during this time:

- cudaPeekAtLastError
- cudaSetupArgument
- cudaLaunch

The last call takes the longest.

Is there any way to reduce the processing time of this FFT?

tera · March 29, 2012, 10:59am

Before you invest too much time tackling this problem, make sure the gap still exists when your program is run without the profiler…

Komtuveuh · March 29, 2012, 12:06pm

I’ve tested it without the profiler (using cudaEvent and other timing functions) and I see the same “issue” (maybe it’s normal, but I’d like to be sure)…

In fact, I only turned to the profiler when I found that the processing time was higher than expected (maybe I expected too much).

To be clearer, here’s the simple sequence I measure:

- cudaMemcpy
- cufftExecC2C
- cudaMemcpy

Thank you for the answer.
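For anyone who wants to reproduce the measurement without the profiler, here is a minimal sketch of timing that three-call sequence with CUDA events. The FFT size (1024 points), the single-batch 1D plan, the pinned host buffers, the warm-up pass and the iteration count are all assumptions made for illustration; they are not taken from the original program, so the numbers it prints need not match the ones quoted above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error: %s (line %d)\n",            \
                    cudaGetErrorString(err_), __LINE__);             \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Abort with a message if a cuFFT call fails.
#define CUFFT_CHECK(call)                                            \
    do {                                                             \
        cufftResult res_ = (call);                                   \
        if (res_ != CUFFT_SUCCESS) {                                 \
            fprintf(stderr, "cuFFT error %d (line %d)\n",            \
                    (int)res_, __LINE__);                            \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main() {
    const int N = 1024;    // assumed FFT length, not from the original post
    const int RUNS = 100;  // assumed number of timed iterations
    const size_t bytes = N * sizeof(cufftComplex);

    // Pinned host buffers (an assumption; pageable memory also works).
    cufftComplex *h_in, *h_out, *d_data;
    CUDA_CHECK(cudaMallocHost((void**)&h_in, bytes));
    CUDA_CHECK(cudaMallocHost((void**)&h_out, bytes));
    CUDA_CHECK(cudaMalloc((void**)&d_data, bytes));
    for (int i = 0; i < N; ++i) { h_in[i].x = 1.0f; h_in[i].y = 0.0f; }

    // Create the plan once, outside the timed region.
    cufftHandle plan;
    CUFFT_CHECK(cufftPlan1d(&plan, N, CUFFT_C2C, 1));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    // Warm-up pass so one-time initialization is not counted.
    CUDA_CHECK(cudaMemcpy(d_data, h_in, bytes, cudaMemcpyHostToDevice));
    CUFFT_CHECK(cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD));
    CUDA_CHECK(cudaMemcpy(h_out, d_data, bytes, cudaMemcpyDeviceToHost));

    float total_ms = 0.0f;
    for (int r = 0; r < RUNS; ++r) {
        CUDA_CHECK(cudaEventRecord(start, 0));
        CUDA_CHECK(cudaMemcpy(d_data, h_in, bytes, cudaMemcpyHostToDevice));
        CUFFT_CHECK(cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD));
        CUDA_CHECK(cudaMemcpy(h_out, d_data, bytes, cudaMemcpyDeviceToHost));
        CUDA_CHECK(cudaEventRecord(stop, 0));
        CUDA_CHECK(cudaEventSynchronize(stop));

        float ms = 0.0f;
        CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));
        total_ms += ms;
    }
    printf("Average copy + FFT + copy time: %.1f us over %d runs\n",
           1000.0f * total_ms / RUNS, RUNS);

    CUFFT_CHECK(cufftDestroy(plan));
    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_data));
    CUDA_CHECK(cudaFreeHost(h_in));
    CUDA_CHECK(cudaFreeHost(h_out));
    return 0;
}
```

This should build with something like `nvcc fft_timing.cu -lcufft` (the file name is arbitrary). Because the two events bracket the whole sequence, any idle time between the first copy and the FFT kernel is included in the reported figure, which is what matters for the latency budget.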
