Running CUDA programs without starting X server

apophys · June 11, 2010, 10:50am

Hello, I have an Ubuntu 9.10 machine with a GTX 480 card. Since it is the only video card in the machine, when I run a kernel that takes longer than about 10 seconds, the watchdog seems to kill it and I get a launch timeout.

I have booted Ubuntu into text mode, so there is no X server and therefore no watchdog. The problem is that, even though the nvidia driver seems to be loaded (lsmod | grep nvidia), CUDA programs do not work: they cannot find any CUDA-capable device.

Do I need to load an additional driver or something?

Thanks!

Nico · June 11, 2010, 11:26am

This is explained in the CUDA Toolkit release notes.

N.
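
For readers without the release notes at hand: as far as I recall, the approach they describe is to load the nvidia kernel module and create the /dev/nvidia* device nodes by hand, since no X server is there to do it. The following is only a sketch of that idea, not the script from the release notes. The helper names nvidia_mknod_commands and count_nvidia_gpus are hypothetical, and the major number 195 and control-node minor 255 are the values the NVIDIA driver has conventionally used, so verify them on your system. It prints the mknod commands rather than running them.

```shell
#!/bin/sh
# Sketch only: print the mknod commands that would create the NVIDIA
# device nodes. Assumes character major 195 and minor 255 for the
# control node (conventional values; check your driver).

# Hypothetical helper: one mknod command per GPU, plus the control node.
nvidia_mknod_commands() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

# Hypothetical helper: count NVIDIA VGA/3D controllers in `lspci`
# output read from stdin (audio functions of the card are ignored).
count_nvidia_gpus() {
    grep -i nvidia | grep -c -E "VGA compatible controller|3D controller"
}
```

On a real machine one might run, as root, `/sbin/modprobe nvidia` and then `nvidia_mknod_commands "$(lspci | count_nvidia_gpus)" | sh`. Printing first makes it easy to inspect what would be created.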

apophys · June 15, 2010, 9:00am

Thank you, that was it. I should have RTFM-ed more :">

fadeyda · September 18, 2013, 2:07pm

That is fine, but every run of CUDA code takes about 5 seconds! Something is missing here! X loads something… but it is not a module!
I tried lsmod > modules_1.log while idle and lsmod > modules_2.log while my program was running, and diff modules_1.log modules_2.log gave me only:
diff modules_1.log modules_2.log
14c14
< nvidia 11201625 0
---
> nvidia 11201625 56
What could be missing? I suppose it is some initialization of the device. Maybe I need some permanently running “CUDA kick starter”, I mean some code that runs quite frequently and does nothing but keep the device active…
(I do not mean the performance level; it could be minimal.)

phillipd · September 18, 2013, 3:38pm

Under Debian I just press Ctrl-Alt-F1 to go to the shell. There, I launch my CUDA program, and it is not killed by the watchdog after a few seconds. When the program has finished, I go back to my X desktop by pressing Ctrl-Alt-F7.

pasoleatis · September 19, 2013, 1:06am

The watchdog kills an individual kernel when it takes more than 5 seconds. I run CUDA programs for days without having them killed. A cuFFT call, for example, consists of quite a few kernel launches, so it has little chance of being killed, even for very large matrices.

Regarding the original question: at my workplace we have 2 computers that are used for CUDA without a running X server.

fadeyda · September 19, 2013, 9:05am

No! I mean the hang before the kernel runs. It takes 5-7 seconds to start my program, nvprof, or nvidia-smi (any device-related program). After that (inside my program) kernels run normally: there is no hang before each kernel launch.
Moreover, while my program is running, nvidia-smi also runs smoothly. So some initialization happens before the kernel runs.

I would really appreciate any advice… I tried lsmod while device-related programs were running, but nothing changed except the nvidia module: it was used by 0 before the run and by 56 while my program was running.
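
One way to confirm that the delay is device initialization rather than the kernels themselves is to time a trivial device-related command. The sketch below is only an illustration: time_command is a hypothetical helper, and it has one-second resolution because it relies on date +%s.

```shell
#!/bin/sh
# Hypothetical helper: print the wall-clock seconds (whole seconds only)
# that a command takes, discarding the command's own output.
time_command() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

Comparing `time_command nvidia-smi` before and after enabling persistence mode (as root, `nvidia-smi -pm 1`) may show whether the driver is tearing down and re-initializing the GPUs between runs. Whether persistence mode removes the delay on a given driver version is something to check rather than assume.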

fadeyda · September 19, 2013, 10:04am

I’d like to add some new results:
If I use the script that sets up the CUDA device nodes (the same one provided above by Nico) and then start X, nvidia-smi no longer hangs, but cudaSetDevice() hangs for 6 seconds. I have 4 physical 690 cards → 8 logical GPUs in my system.

gavin.keith.ridley · December 8, 2020, 10:40pm

For anyone coming back to this (I just did), it seems that a convenience script is now included in CUDA (or the driver, probably) distributions to handle the creation of device files and checking that the module is loaded. Check “man nvidia-modprobe”.
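
As a rough illustration of what such a helper takes care of, the sketch below lists which of the expected device nodes are missing. missing_nvidia_nodes is a hypothetical function written for this example, not part of nvidia-modprobe, and the directory argument exists only so it can be tried against a scratch directory instead of /dev.

```shell
#!/bin/sh
# Hypothetical helper: print the expected NVIDIA device nodes that do
# not exist under the given directory (normally /dev).
#   $1 = directory to inspect, $2 = number of GPUs expected
missing_nvidia_nodes() {
    dev_dir="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        [ -e "$dev_dir/nvidia$i" ] || echo "$dev_dir/nvidia$i"
        i=$((i + 1))
    done
    [ -e "$dev_dir/nvidiactl" ] || echo "$dev_dir/nvidiactl"
}
```

For example, `missing_nvidia_nodes /dev 1` prints nothing when /dev/nvidia0 and /dev/nvidiactl both exist. If something is missing, the man page describes options for creating a device file by minor number (I believe `nvidia-modprobe -c=0` for /dev/nvidia0, but check the man page for your driver version).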
