Gnome and bandwidthTest: Underperforming Desktop Environment


I am still pretty new to CUDA and trying to figure out what kinds of application it can be used for. What I would like to do is realtime audio synthesis under MIDI control, so I started out by studying what kind of timeframe would be possible. Ideally kernels should run for at most a millisecond at a time, to keep latencies reasonably low and leave enough room for graphics updates as well.

At first this didn’t look too good. With a plain vanilla desktop kernel, about 20% of that millisecond was spent on uploading a small fragment of data representative of the keyboard state. Upgrading to a realtime (RT) kernel doubled the throughput, but most importantly, jumping out of Gnome then quadrupled it. Modifying bandwidthTest to ask for high-priority realtime scheduling shaved off yet another 20%, so now the test can upload some 10 KB of data in just 0.02 milliseconds rather than the original 0.2 milliseconds. That should be good enough, I think …
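The post does not show how bandwidthTest was modified. As a rough sketch of one alternative that leaves the source untouched, a small wrapper can request the SCHED_FIFO realtime policy for itself and then exec the benchmark, since Linux preserves the scheduling policy across exec. The names `clamp_priority`, `request_realtime` and `exec_realtime`, and the default priority of 50, are made up for this illustration; the request only succeeds as root (or with CAP_SYS_NICE).

```python
import os
import sys


def clamp_priority(priority, lowest, highest):
    """Keep a requested priority inside the range the policy allows."""
    return max(lowest, min(highest, priority))


def request_realtime(priority=50):
    """Try to move this process to SCHED_FIFO; return True on success.

    Hypothetical helper: the default of 50 is an arbitrary mid-range
    value, not something taken from the original post.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # platform without the POSIX scheduling API
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    prio = clamp_priority(priority, lowest, highest)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(prio))
    except OSError:
        return False  # typically a permission error when not root
    return True


def exec_realtime(command):
    """Replace this process with `command`, at realtime priority if possible."""
    if not request_realtime():
        print("warning: no SCHED_FIFO, running at normal priority",
              file=sys.stderr)
    os.execvp(command[0], command)
```

Calling `exec_realtime(["./bandwidthTest", "--memory=pinned", "--htod"])` as root would then run the unmodified test under the realtime policy. Whether this gives the same 20% improvement as patching the test itself is not something the thread establishes.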

But what I wonder is: What is up with Gnome? X by itself isn’t degrading performance, nor is a lightweight environment like IceWM.

mvh // Jens M Andreasen
[using a silent 8400GS/g98]

Very interesting post, thanks for sharing.

Regarding the slowdown -
Could it be that your Gnome is using OpenGL to draw windows? This is on by default in some distros nowadays. The setting is in the Appearance control panel IIRC.

No. I don’t think so. I just tried out a few more window managers, and surprisingly my all time favourite WindowMaker shows the same symptom as Gnome, whereas blackbox/fluxbox are as invisible as working directly on the console.

Xfce has its own way of (not) working: the first run performs well, but on the second run, or no later than the third, performance has degraded by 90%, back to 0.2 milliseconds per minimal transfer.

It sounds like the common factor here is that window managers with compositing support are slower, while those without compositing support are faster. This is basically what kristleifur suggested was the problem.

Does the slowdown persist if you explicitly disable the Composite extension in your X configuration?

You mean:

Section "Extensions"
    Option         "Composite" "Disable"
EndSection

No, the result is the same. Restarting with the vesa driver more or less does the trick, but then leaves the screen in a sorry state when the CUDA program exits.

Do the window managers show the same difference with a non-RT kernel?

If the RT kernel benefits your machine, the WMs may be calling some OS functions that are not RT friendly … ?

Kristleifur, it may not be the WM alone that is playing …

Opening up with only FluxBox and an xterm, the machine will let me have the front seat and pretty much run it as if I owned it. Starting OpenOffice.Draw and letting that one quietly hang around is also fine. Hovering with the mouse over its drawing area, which updates two positional indicators (left and top), immediately brings back the 10x penalty on host-to-device transfers that I experienced in Gnome as well as WindowMaker.

The gotcha: All the little nifty toys one might have showing time, CPU, network usage and whatnot.

While I was at it, it occurred to me that FlightGear has a built-in benchmark showing frames rendered per second. Wouldn’t it be nice to know not only how much we are being disturbed, but also how much we are disturbing others? With all settings at their first-time-user defaults I had a healthy 50 fps. Running the CUDA bandwidthTest in parallel dropped that by 20% to 40 fps, and again left me with 5 minimal htod transfers per millisecond, indicative of 20% GPU usage. I think this is mostly good: all clock cycles are more or less accounted for and nothing gets swallowed in some unknown void.


Edit: No wait, I am confused and that conclusion is wrong. The 20% performance drop in the competing application relates to 5 htod’s of which only one would be needed in a real application. So perhaps an 8% penalty altogether for getting data both in and out, once per millisecond?
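The corrected estimate can be spelled out as a few lines of arithmetic. This is only a sketch of one reading of the edit: it assumes the frame-rate penalty scales linearly with the number of transfers per millisecond, and that a device-to-host transfer costs about the same as a host-to-device one. Neither assumption was measured in the thread, and the function name `fps_penalty` is made up for the example.

```python
def fps_penalty(fps_alone, fps_shared, transfers_measured, transfers_needed):
    """Estimate the frame-rate penalty for a lighter transfer load.

    Assumes the penalty is proportional to the number of transfers
    per millisecond (an assumption, not a measured fact).
    """
    measured_drop = (fps_alone - fps_shared) / fps_alone
    per_transfer = measured_drop / transfers_measured
    return per_transfer * transfers_needed


# Figures from the post: 50 fps alone, 40 fps with 5 htod per millisecond.
# A real application would need one transfer in and one out: 2 per millisecond.
penalty = fps_penalty(50, 40, transfers_measured=5, transfers_needed=2)
print(f"estimated penalty: {penalty:.0%}")
```

With the numbers above, each transfer accounts for 4% of the frame rate, so one upload plus one download per millisecond comes to the 8% mentioned in the edit.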


# The command in question. From the console, running as root with high priority:

./bandwidthTest --memory=pinned --htod --mode=range --start=1024 --end=10240 --increment=1024 

Range Mode
Host to Device Bandwidth for Pinned memory

Transfer Size (Bytes)   Bandwidth(MB/s)
     1024               54.6
     2048               111.6
     3072               161.9
     4096               213.5
     5120               253.0
     6144               295.9
     7168               322.4
     8192               383.0
     9216               424.6
    10240               454.2

# The same from Gnome, with lots of little applets running as well:

Transfer Size (Bytes)   Bandwidth(MB/s)
     1024               4.6
     2048               9.2
     3072               13.8
     4096               18.4
     5120               22.8
     6144               27.4
     7168               32.2
     8192               36.6
     9216               41.1
    10240               45.4
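To relate these tables to the 0.02 versus 0.2 millisecond figures quoted earlier, the bandwidth numbers can be converted back into time per transfer. The sketch below does that for every row of both tables. It assumes 1 MB means 10^6 bytes; if bandwidthTest counts a megabyte as 2^20 bytes instead, every time comes out about 5% shorter and the ratios are unchanged. The function name `transfer_time_us` is made up for the example.

```python
# Bandwidth in MB/s by transfer size in bytes, copied from the two tables.
CONSOLE = {1024: 54.6, 2048: 111.6, 3072: 161.9, 4096: 213.5, 5120: 253.0,
           6144: 295.9, 7168: 322.4, 8192: 383.0, 9216: 424.6, 10240: 454.2}
GNOME = {1024: 4.6, 2048: 9.2, 3072: 13.8, 4096: 18.4, 5120: 22.8,
         6144: 27.4, 7168: 32.2, 8192: 36.6, 9216: 41.1, 10240: 45.4}

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def transfer_time_us(size_bytes, bandwidth_mb_s, bytes_per_mb=BYTES_PER_MB):
    """Time for one transfer of size_bytes, in microseconds."""
    return size_bytes / (bandwidth_mb_s * bytes_per_mb) * 1e6


for size in sorted(CONSOLE):
    fast = transfer_time_us(size, CONSOLE[size])
    slow = transfer_time_us(size, GNOME[size])
    print(f"{size:6d} bytes: console {fast:6.1f} us, "
          f"Gnome {slow:6.1f} us, ratio {slow / fast:4.1f}x")
```

Under that assumption, each transfer takes roughly 18 to 23 microseconds from the console and about 223 to 226 microseconds under Gnome, whatever the size. The time barely changes between 1 KB and 10 KB, which suggests these small transfers are dominated by fixed per-transfer overhead rather than by bus bandwidth.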

If anybody here has (the slightly better) 8500 or 9400GT, could they perhaps post some numbers for comparison?