CUDA Performance problems depending on a system

stvad · June 28, 2016, 3:25pm

Hi, I have two machines with identical hardware.
One runs Debian; the other runs a custom Linux image built with Buildroot.

The problem I’m experiencing is this:
my application runs roughly 2x faster on the Debian machine than on the machine with the custom image.
I’m trying to understand what could cause this difference.
The CUDA version is 7.5.
The driver versions are:
Debian: 361.28
Custom: 367.27

Some information that may be relevant:
If I run deviceQuery (from the CUDA samples) on both machines, the results are almost, but not exactly, the same.
The first difference is in this line.
This is for Debian:
Total amount of global memory: 3069 MBytes
And this is for Custom:
Total amount of global memory: 3008 MBytes
(Though I doubt that this difference can cause the mentioned difference in performance.)
The other difference is:
Debian:
Run time limit on kernels: Yes
Custom:
Run time limit on kernels: No
…
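
A comparison like the one above can be automated rather than read by eye. The following is a minimal Python sketch (not from the original post) that diffs two saved deviceQuery outputs. The function names are hypothetical, and it assumes each property sits on a single `key: value` line; lines without a colon are ignored, so some deviceQuery lines will be skipped. The sample strings contain only the values quoted in this post.

```python
def parse_device_query(text):
    """Collect 'key: value' lines from deviceQuery output into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Keep only lines that have a colon and a non-empty value.
        if sep and value.strip():
            props[key.strip()] = value.strip()
    return props


def diff_device_query(text_a, text_b):
    """Return {key: (value_a, value_b)} for every property that differs."""
    a = parse_device_query(text_a)
    b = parse_device_query(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Only the lines quoted in the post; real output has many more properties.
debian = """
  Total amount of global memory: 3069 MBytes
  Run time limit on kernels: Yes
"""
custom = """
  Total amount of global memory: 3008 MBytes
  Run time limit on kernels: No
"""

for key, (deb, cus) in diff_device_query(debian, custom).items():
    print(f"{key}: Debian={deb}, Custom={cus}")
```

In practice the two strings would come from files captured on each machine, for example by redirecting deviceQuery's output to a text file and reading it back.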

If I run bandwidthTest from the samples, the results are more or less the same for “Host to Device Bandwidth, 1 Device(s)” and “Device to Host Bandwidth, 1 Device(s)”, but can differ significantly for “Device to Device Bandwidth, 1 Device(s)”.
For that last entry, the values on Debian are around 100k MB/s (though they sometimes drop to around 63k).
On Custom, they are consistently around 63k.

I would be glad if you could advise me on what I should investigate further and what the problem could be.
Thank you!

Robert_Crovella · June 28, 2016, 3:40pm

Based on the differing “Run time limit on kernels” values, it seems like in one case the X server is configured to use that GPU and in the other case it is not.

I don’t really know if that would explain anything, but it’s possible that it could produce a machine-to-machine difference in performance. However, I would expect the machine with X configured to be slower, not faster.

stvad · June 29, 2016, 8:22am

Yes, that would be my expectation too. I’ve tried configuring X on the Custom system to use nvidia, but ran into some problems: X.org can’t load the GLX module. Maybe it’s related…

stvad · June 29, 2016, 9:04am

No, it seems the failure to load GLX was an Xorg configuration problem.
Xorg still won’t start, but now for some other reason.

stvad · June 29, 2016, 11:39am

Interesting fact: if I start my application while X is in this half-started state, performance drops even further (more than 4x slower than Debian in total).

stvad · June 29, 2016, 3:30pm

I’ve just tried updating to CUDA 8 on the Custom machine; it hasn’t changed the situation described above.

stvad · June 29, 2016, 4:00pm

CUDA-Z (http://cuda-z.sourceforge.net/) shows a 2x or greater performance reduction on the Custom system for every type of operation it can measure
(and almost 2x on device-to-device memory copy).

Robert_Crovella · June 29, 2016, 4:29pm

CUDA is really only tested to work correctly on the configurations that are listed in the install guide. It’s not expected to work correctly on any possible custom configuration.

stvad · June 30, 2016, 7:41am

Well, it’s not like there is something magical about those systems. If one can be configured to work correctly with CUDA, so can the other. The question is which key difference is causing the problems.

stvad · June 30, 2016, 12:21pm

OK, what finally helped was changing the way the NVIDIA driver is installed (in the Buildroot .mk) and running the application after starting the X server.
This, though, remains the same:
Run time limit on kernels: No
