Performance Issues on headless server

leodemarco · November 8, 2012, 7:11pm

Hello. I am getting very poor performance when running CUDA code on a headless server.

When I was first learning CUDA I ran some basic examples on my personal laptop, which has a single GeForce GT 540M, and I got good, or at least reasonable, results.

But when I moved those same examples to the server, they take an awfully long time to execute, even though the server has three Tesla C2075 cards, the examples are really basic, and a lot of them use a single GPU.

For example, one of the programs simply compares how much faster it is to copy to/from pinned memory, by transferring two equal-sized arrays, one pinned and the other not, and measuring the time with events.

On the laptop it takes approx. 0.160 s to run, while on the server it takes between 4 and 5 seconds. I strongly believe that what’s taking most of the time on the server is context creation/destruction.
I reached this conclusion after explicitly creating the context and putting one print right before the context creation function call and one right after.
The time elapsed between those two prints is highly noticeable on the server, while on the laptop it’s imperceptible.
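
A coarser cross-check than the two prints is to time the whole process from the shell, twice in a row, to see whether the delay is paid on every launch. The following is only a sketch: elapsed_ms is a hypothetical helper, ./pinned_test is a placeholder name for the compiled example, and date +%s%N assumes GNU date (as shipped with Ubuntu).

```shell
#!/bin/bash
# elapsed_ms (hypothetical helper): print the wall-clock time, in
# milliseconds, taken by the command given as arguments.
elapsed_ms() {
  local start end
  start=$(date +%s%N)      # nanoseconds since the epoch (GNU date)
  "$@" >/dev/null 2>&1     # run the command, discarding its output
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# ./pinned_test is a placeholder for the compiled example binary.
if [ -x ./pinned_test ]; then
  echo "first run:  $(elapsed_ms ./pinned_test) ms"
  echo "second run: $(elapsed_ms ./pinned_test) ms"
fi
```

If both runs take about the same time, the startup cost is being paid per process rather than only once after boot.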

The server runs Ubuntu 10.10 Server and the laptop Ubuntu 11.10. They both have CUDA 4.2.9.
I’ve done the following things on the server after some reading on the internet, but the problem is still there:
1- Disabled Nouveau by adding the following to /etc/modprobe.d/blacklist.conf:
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
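2- Added the following script to /etc/init.d:
#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  # Create one device node per controller: /dev/nvidia0 .. /dev/nvidiaN
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  # Create the control device node
  mknod -m 666 /dev/nvidiactl c 195 255

else
  exit 1
fi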

3- Run sudo nvidia-smi -pm 1 each time I boot the server.
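
To confirm that step 3 actually took effect on all three GPUs, the persistence setting can be read back from the nvidia-smi -q report. This is a minimal sketch, assuming the report contains one "Persistence Mode : ..." line per GPU; persistence_states is a hypothetical helper name, not part of nvidia-smi.

```shell
#!/bin/bash
# persistence_states (hypothetical helper): read an `nvidia-smi -q` report
# on stdin and print the value of each "Persistence Mode" line, one per GPU.
persistence_states() {
  awk -F: '/Persistence Mode/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

# Only query the driver if nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -q | persistence_states
fi
```

Any value other than Enabled for one of the boards would suggest the setting did not stick for that GPU.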

But, as I said, I keep getting the same poor performance. Any ideas?

njuffa · November 8, 2012, 7:52pm

This could be related to the fact that there are multiple GPUs with large memory in the system. I would suggest filing a bug with a self-contained repro app, and the precise specifications of the system on which you observe this, through the registered developer website. Thanks.

Gert-Jan · November 9, 2012, 8:17am

On a headless machine the NVIDIA drivers are (usually) not loaded by default, but only after the first call to a CUDA function in your program (the drivers are unloaded again when your program finishes). To “improve” your timing measurements, you can, for example, call cudaMalloc(&not_used, 0) first, or switch the GPU to persistence mode using the nvidia-smi tool (this keeps the driver loaded, if I remember correctly).
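
Whether the kernel module is resident between runs can be checked from the shell. The sketch below assumes a Linux system where loaded modules are listed in /proc/modules; nvidia_module_loaded is a hypothetical helper name. It only reports whether the module is loaded and says nothing about persistence mode.

```shell
#!/bin/bash
# nvidia_module_loaded (hypothetical helper): succeed if a module named
# exactly "nvidia" is listed in the given file (default: /proc/modules).
nvidia_module_loaded() {
  grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

if nvidia_module_loaded; then
  echo "nvidia kernel module is loaded"
else
  echo "nvidia kernel module is not loaded"
fi
```

Running this before and after a CUDA program shows whether the module stays loaded once the program has exited.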

leodemarco · November 9, 2012, 1:05pm

Gert-Jan, every time I start the server I run sudo nvidia-smi -pm 1, which enables persistence mode if I’m not wrong. That doesn’t fix it.

Gert-Jan · November 12, 2012, 8:15am

sudo nvidia-smi -pm 1 should do the trick indeed.

If it did not help, then most likely njuffa is right and you have found a bug in the CUDA ecosystem. He is (as I recall) an NVIDIA employee, so he knows best anyway ;)

leodemarco · November 12, 2012, 1:02pm

Thanks to both of you for your replies. I’ve already filled out the bug form as njuffa suggested.
