Slow performance on TX1 with Ubuntu UI disabled

Hi All,

Sorry if this is a basic question; I’ve tried googling around and searching this forum but can’t find what I’m looking for.

I’ve got a TX1 plugged into a monitor; running the nbody sample as a benchmark yields the following:

Compute 5.3 CUDA device: [GM20B]
number of bodies = 32768
32768 bodies, total time for 10 iterations: 7966.188 ms
= 10.671 billion interactions per second
= 213.411 single-precision GFLOP/s at 20 flops per interaction

However, if I ssh into the device I see the following:

Compute 5.3 CUDA device: [GM20B]
number of bodies = 32768
32768 bodies, total time for 10 iterations: 7966.188 ms
= 1.348 billion interactions per second
= 26.957 single-precision GFLOP/s at 20 flops per interaction
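For anyone comparing runs, the derived figures in this output are consistent with a simple all-pairs calculation: bodies squared, times iterations, divided by elapsed time. A minimal sketch of that arithmetic follows; the function name `nbody_throughput` is made up for illustration and is not part of the CUDA sample.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions/s, GFLOP/s) for an all-pairs n-body run.

    Hypothetical helper: assumes every body interacts with every body
    once per iteration, which matches the figures printed above.
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    giga_interactions_per_s = interactions / seconds / 1e9
    gflops = giga_interactions_per_s * flops_per_interaction
    return giga_interactions_per_s, gflops


if __name__ == "__main__":
    gips, gflops = nbody_throughput(32768, 10, 7966.188)
    print(f"{gips:.3f} billion interactions per second")
    print(f"{gflops:.3f} single-precision GFLOP/s")
```

Plugging in 32768 bodies, 10 iterations and 7966.188 ms reproduces the 1.348 billion interactions per second and 26.957 GFLOP/s reported for the ssh run.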

I see the same lower performance if, while using the TX1 directly, I disable lightdm with

sudo service lightdm stop

So the issue seems to be related to the user interface being disabled. I recall that there were some performance settings on the TK1 that had to be adjusted manually when no GUI was present, but I don’t know if that applies to the TX1. I also remember that running the benchmark a few times on the TK1 could “warm it up”, after which the performance would be as it was with the UI enabled, but that doesn’t seem to be the case here.

Again, I’m sorry if this is covered in some literature, but are there any performance tweaks I should make when using the device without a UI / remoting in? My intent is to run some benchmarks, and I thought things would be faster if there was no GUI to display.

Updated the title of the post; I initially thought the issue was related to SSH, but I see poor performance with lightdm disabled. Sorry for any confusion I caused.

I would normally work via ssh to the Jetson (right now my desktop is dead, so I’m working directly on the Jetson). Remote display could be having unintended side effects (for example, if you use ssh -Y or -X). Remote display uses your desktop host to run the graphics for something originating on the Jetson. Unfortunately, I think there are times when GPU operations are mistaken for graphics that should run on the remote display. Under some circumstances I believe CUDA may unintentionally end up migrating to your desktop host, on the assumption that anything using the GPU must be graphics. I have not tested this in a long time, so I do not know if this is still the case. If it is, then a lack of CUDA (or of a compatible CUDA version) on your host might result in hardware acceleration being dropped; conversely, if your host CUDA is compatible, I can see the host CUDA taking over without you knowing it and giving very high performance.

What is the exact ssh command you used? Can you verify if your desktop has CUDA as well, and if so, the version?

Interesting. So you’re saying that the remote display handler may be treating CUDA calls as graphics operations, and attempting to run them on the system I’m logging in from?

I’m logging into the TX1 from my Windows computer with Cygwin:

ssh john@

No extra flags or anything. I’m running a benchmark right now, but I’ll try adding -Y to see if it gives any performance benefit.

My Windows computer does have a 980 Ti installed and is CUDA capable, but I also tried from a MacBook Air (with Intel integrated graphics) and got the same lower performance. I will try adding -Y on that system as well.

I actually don’t think this is ssh related. After turning off Ubuntu’s UI with

sudo service lightdm stop

directly on the Jetson and running the benchmark, I still get lower performance than I did running it with the GUI visible. To clarify, I wasn’t logging in over the network for that test; disabling the UI alone gave me lower performance.

I just wanted to be sure that remote display was not changing where the GPU hardware was being driven from. A login via “ssh john@” with no forwarding options shows that this is indeed not a remote display issue. Performance changing with whether lightdm is active is similar to what happens when remote display is enabled or disabled. Leaving the environment variable “DISPLAY” empty, or pointing it somewhere else, is in essence equivalent to enabling or disabling lightdm.
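To check what a given session actually sees, you can inspect DISPLAY before running the benchmark. Below is a rough sketch; the function name `describe_display` and its three labels are my own, and the classification is only a heuristic (a value such as ":0" usually means a local X server, while ssh X11 forwarding typically sets something like "localhost:10.0").

```python
import os


def describe_display(value):
    """Roughly classify a DISPLAY value. Heuristic only, not authoritative."""
    if not value:
        return "unset"  # no X display visible to this session
    # Everything before the last ':' is the host part of the display name.
    host = value.rsplit(":", 1)[0]
    if host in ("", "unix"):
        return "local"  # e.g. ":0", served by an X server on this machine
    return "remote-or-forwarded"  # e.g. "localhost:10.0" under ssh -X / -Y


if __name__ == "__main__":
    print(describe_display(os.environ.get("DISPLAY")))
```

Running this once on the console with the GUI up, once over plain ssh, and once after stopping lightdm shows which of the three situations each benchmark run was in.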

When running any benchmarks, do maximise CPU, EMC and GPU clocks.
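Before changing any clocks, it can help to record what the CPU side is currently doing in each scenario (GUI on, GUI off, ssh). The sketch below only reads the standard Linux cpufreq sysfs nodes and changes nothing; the helper name `read_cpufreq` is hypothetical. The GPU and EMC clock nodes are platform-specific and are deliberately not covered here, since I can't confirm their paths on the TX1.

```python
from pathlib import Path

# Standard Linux cpufreq nodes; frequency values are reported in kHz.
CPUFREQ_FILES = ("scaling_governor", "scaling_cur_freq", "scaling_max_freq")


def read_cpufreq(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {node_name: value}} for each cpuN/cpufreq directory."""
    result = {}
    for cpufreq_dir in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq")):
        values = {}
        for name in CPUFREQ_FILES:
            try:
                values[name] = (cpufreq_dir / name).read_text().strip()
            except OSError:
                values[name] = None  # node missing or unreadable (e.g. core offline)
        result[cpufreq_dir.parent.name] = values
    return result


if __name__ == "__main__":
    for cpu, values in read_cpufreq().items():
        print(cpu, values)
```

If scaling_cur_freq sits well below scaling_max_freq while the benchmark is running with the GUI disabled, that would point at clock scaling rather than at ssh or the display.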

The following instructions are for the TK1, but I think they apply to the TX1 as well; just the available frequencies might be different:

Thanks, I’ll try that. I was hesitant to at first, because as you say that guide is for the TK1, and I was hoping someone could confirm that these things still apply.

I’ll post again with results tonight.