Negative timer values in 0.9

Hello,

I am running into a strange problem with the 0.9 release of CUDA that I haven't seen in 0.8. I am using a GeForce 8800 GTX with driver version 162.01 and Visual Studio 2003 under Windows XP.

I am timing a number of FFTs and IFFTs from the CUFFT library with the code below. Using a simple batch file, this is repeated for several data sizes (2x2, ..., 1024x1024) to compare CUFFT with the FFTW library.

In the 0.8 release of CUDA and CUFFT everything worked fine, but with the 0.9 release I occasionally get negative timing values! Could this be a bug in the new CUDA release, or am I doing something wrong?

// Start actual calculation
CUT_SAFE_CALL(cutStartTimer(timer));

// Allocate device memory for signal
float* d_signal;
float* d_signal_f;
CUDA_SAFE_CALL(cudaMalloc((void**)&d_signal, mem_size));
CUDA_SAFE_CALL(cudaMalloc((void**)&d_signal_f, mem_size));

// Copy host memory to device
CUDA_SAFE_CALL(cudaMemcpy(d_signal_f, h_signal, mem_size_f, cudaMemcpyHostToDevice));

if (number >= 1)
{
    for (unsigned int i = 0; i < number; ++i)
    {
        // Transform signal and kernel
        CUFFT_SAFE_CALL(cufftExecR2C(plan_r2c, (cufftReal *)d_signal_f, (cufftComplex *)d_signal));

        // Transform signal back
        CUFFT_SAFE_CALL(cufftExecC2R(plan_c2r, (cufftComplex *)d_signal, (cufftReal *)d_signal_f));
    }
}

// Copy device memory to host
CUDA_SAFE_CALL(cudaMemcpy(h_signal, d_signal_f, mem_size, cudaMemcpyDeviceToHost));

CUT_SAFE_CALL(cutStopTimer(timer));

printf("%d FFT of %d processing time : %f (ms)\n", number, usedw, cutGetTimerValue(timer));

Thanks for the help,

Arno

Edit: added type of video card, driver version, development environment and OS

What is probably causing your problem is that NVIDIA has snuck a left shift of the timer register value into the code generated by ptxas in 0.9 and above, so the clock now wraps around twice as fast as it used to. The wrap period was close to 6 seconds and is now 2.9 seconds, so negative values occur for any measurement above about 1.4 seconds.
Eric

OK, but I'm reading values in the range 0.3 to 500 ms, so I'm not even close to 1.4 seconds.

Some typical measurements:

number of FFT's   computation time (ms)
100               0.69
100               0.70
100               0.71
100               0.70
100               0.70
100               0.71
100               0.70
100               0.71
100               0.71
100               0.71
101               1.07
101               1.06
101               1.08
101               1.08
101               1.08
101               1.06
101               1.06
101               1.08
101               1.08
101               1.08
102               0.73
102               0.73
102               0.72
102               0.72
102               0.73
102               0.74
102               0.73
102               0.73
102               0.74
102               0.73
103               1.09
103               1.09
103               1.09
103               1.09
103               1.10
103               1.09
103               -32.04
103               1.09
103               1.10
103               1.09
104               0.72
104               -32.64
104               0.71
104               0.72
104               0.72
104               0.71
104               0.70
104               -32.62
104               0.71
104               0.71
105               0.72
105               0.73
105               34.13
105               0.73
105               0.72
105               0.72
105               0.72
105               0.73
105               0.72
105               0.72

Update: under Windows, CUDA uses QueryPerformanceCounter. This doesn't work flawlessly on multi-core systems (see http://support.microsoft.com//kb/896256 ). The update provided on that page helped for me: it reduced the problem significantly, but didn't eliminate it completely.