GTX590 latency issues

Hi everyone,

I am currently working with a GTX590 on Linux CentOS 5, and I have latency issues with my CUDA programs. First I will explain the problem, then I will provide a simple example that shows it. If any GTX590 owner has had this kind of problem and found a cause and/or a solution, I would be very interested to hear about it.

1) The Problem :

While measuring the processing time of one of my programs (to put it simply: it computes the mean value of all the pixels of an image), I found that the CUDA processing sometimes took much longer than usual. These latency spikes appear at seemingly random points in the measurements.

The first thing I did was to try the same program on other cards (GTX460, GTX480, GTX560Ti), and there wasn’t any problem at all. So far I only have this issue with the GTX590.

I also know this issue occurred with another GTX590.

2) My config :

OS : Linux CentOS 5 (2.6.18-274.el5)

CPU: Core i7 950 (@3.3GHz)

MB : Asus Rampage 3

PSU: Corsair 850W

Graphic card : GTX590

GPU0 BIOS : 70.10.42.00.02

GPU1 BIOS : 70.10.42.00.02

NVIDIA driver : 280.13

CUDA v4.0

3) A simple example :

My issue can be seen with a modified version of the reduction program from the NVIDIA SDK.

The ‘reduction’ program is quite similar to my own program, which makes it a good basis. However, keep in mind that the problem occurred with every program I have tried so far (my own convolution program, the NPP convolution example, etc.).

For my example I made some quick modifications to the NVIDIA reduction file :

The NVIDIA reduction program performs 100 iterations of the kernel <EDIT: ‘reduction’ algorithm>, measures the total time, and then divides it by the number of iterations to get the mean processing time. This averaging hides the latency of individual runs …

… so I made a modified version of the program (see attached .cpp files) which measures the time of every single call to the kernel <EDIT: ‘reduction’ algorithm> (the maximum and minimum times are checked and recorded after each call, and displayed at the end). My loop is 1000 iterations long to make sure the latency occurs (as I said, it happens randomly).
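A minimal host-side sketch of the per-call timing idea described above, not the code from the attached files. It assumes the work being timed is wrapped in a callable that only returns once the work has really finished (for GPU code, that means after a synchronize call); `TimingStats`, `timeEachCall` and `work` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>

// Summary of per-call timings, in milliseconds.
struct TimingStats {
    double minMs;
    double maxMs;
    double meanMs;
};

// Times every call to `work` separately instead of timing the whole loop once.
// `work` is a hypothetical stand-in for one complete, synchronized run of the
// algorithm under test; it must not return before the work is done.
TimingStats timeEachCall(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;

    double minMs = 0.0;
    double maxMs = 0.0;
    double totalMs = 0.0;

    for (int i = 0; i < iterations; ++i) {
        const auto start = Clock::now();
        work();
        const auto stop = Clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();

        if (i == 0) {
            // Seed min and max with the first sample.
            minMs = ms;
            maxMs = ms;
        }
        minMs = std::min(minMs, ms);
        maxMs = std::max(maxMs, ms);
        totalMs += ms;
    }

    // The mean is what a single "time the whole loop, then divide" run reports;
    // min and max are what expose occasional slow calls.
    return {minMs, maxMs, totalMs / iterations};
}
```

Calling `timeEachCall(runOnce, 1000)` with a hypothetical `runOnce` callable returns all three values, so the mean can be compared directly against the extremes.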

I have attached a quick summary (.png file) of the NVIDIA SDK reduction program and my modified version.

Here are some of the results I get :

-GTX590 : minTime = 0.524 ms

        maxTime = 2.307 ms

-GTX560Ti : minTime = 0.665 ms

        maxTime = 0.697 ms

In a nutshell, with the GTX590, 90~95% of the results are roughly equal to the minimum time (very slight differences), and the remaining results are ~2ms higher than the minimum time. Those ‘peaks’ appear at seemingly random points during the run.
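To put numbers on a distribution like this, a sketch along the following lines counts how many samples exceed a threshold and computes the mean. The helper names `spikeFraction` and `meanMs` and the choice of threshold are assumptions for illustration, not part of the attached programs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of samples strictly above thresholdMs (0.0 to 1.0).
double spikeFraction(const std::vector<double>& samplesMs, double thresholdMs) {
    assert(!samplesMs.empty());
    std::size_t spikes = 0;
    for (double ms : samplesMs) {
        if (ms > thresholdMs) {
            ++spikes;
        }
    }
    return static_cast<double>(spikes) / static_cast<double>(samplesMs.size());
}

// Arithmetic mean of the samples, i.e. what an averaged benchmark would report.
double meanMs(const std::vector<double>& samplesMs) {
    assert(!samplesMs.empty());
    double total = 0.0;
    for (double ms : samplesMs) {
        total += ms;
    }
    return total / static_cast<double>(samplesMs.size());
}
```

As a purely synthetic illustration (not measured data), 1000 samples made of 950 values at 0.524 ms and 50 values at 2.307 ms, i.e. a hypothetical 5% spike rate built from the two GTX590 figures above, give a spike fraction of 0.05 with a 1.0 ms threshold and a mean of about 0.61 ms. An averaged measurement in that range would give no hint that some calls took more than 2 ms.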

4) My questions :

Has anyone else had this issue? If so, did you find a solution or a cause?

Is there a link with LPC latency? I read something about people having trouble watching videos on Windows, which “seemed” to be caused by LPC latency.

Is this a known problem that can’t be fixed ?

5) Attachments :

  • reduction_mod.png : a simple schematic showing the differences between the NVIDIA ‘reduction’ program and my custom version.

  • reduction.cpp : the original ‘reduction’ program by NVIDIA, freshly taken from the SDK.

  • reduction_Modified.cpp : my modification of the NVIDIA program, which highlights the issue I have with the GTX590. I put ‘MODIFICATION’ labels across the file to simplify comparison with the original file.

Thanks in advance to anyone who takes the time to read this and/or answer.
reduction_mod.png
reduction.cpp (16.9 KB)
reduction_Modified.cpp (18 KB)

I haven’t looked at your source code yet, but are you aware that CUDA will batch kernel launches?

To time individual kernel launches you may need a cudaThreadSynchronize() in strategic places of the timing code.

Hi,

First of all, thank you for your quick answer.

I think I misspoke when I said “every single call to the kernel”; I should have said “every single call to the ‘reduction’ algorithm”.

I will try to be clearer:

  • I am measuring the time of the whole ‘reduction’ algorithm a thousand times, then I display the longest and the shortest execution times.

  • The NVIDIA program measures the average time taken by a hundred calls to the ‘reduction’ algorithm.

I hope this clarifies.

Now, after looking at the code, I see some cutilDeviceSynchronize() calls at the beginning and end of the reduction subroutine.

You could try removing the memory transfers from and to GPU to see if it’s the PCI-E memory transfers that are causing the latencies or not.

Furthermore, consider running the benchmark on a non-display GPU.

I haven’t used CUDA events yet, so I cannot really comment on whether you are using them correctly to time the algorithm as a whole.

Christian

Thank you for your advice. I did test “empty” programs (no parameters transferred, no constant copies, no memcpy …) and they show the same anomalies (i.e. some executions are ~4x longer).

For the non-display GPU testing, I must admit I didn’t run my programs on a dedicated graphics card (the same card handles both computing and display); however, I mainly work through ssh or VNC, so there isn’t any real “active” display on my test computer. On the other hand, I use the GTX460 the same way and it doesn’t show the problem.

Nevertheless, I will give it a shot; I can’t be sure until I have tried.

The core of my issue is that all my tests work just fine on the GTX460, but on the GTX590 I observe random latencies. If anyone has had the same issue, please let me know.

Hi,
I finally solved my problem; in fact, the GTX590 wasn’t the issue. I performed my tests on two computers with the same components (except for the graphics cards) … and I saw that the motherboard BIOS versions were different.
The PC that gives correct timings has an older BIOS, while the PC with the newer BIOS and the GTX590 gives me bad timings.

After many tests involving flashing different BIOS versions and swapping cards, I reached the conclusion that only the older BIOS (BIOS 501 for those interested) gives me the right results (with all graphics cards) in my tests.

Thanks to Christian, who spent some time on the issue and gave me leads. I should now head to the ASUS forums.