I’m experiencing a problem running CUDA programs on a Linux system (x64, Tesla T10). Every program (even the SDK samples) takes about 2-4 seconds to execute its first CUDA command (initialization, sometimes a memory allocation, etc.).
I guessed that the CUDA runtime was compiling PTX code for the T10 architecture, but adding the -arch and -code options to my nvcc command line didn’t help (and googling for an answer didn’t help either).
The problem gets really annoying when I try to use 4 GPUs, because it then takes about 12 seconds to initialize all of them.
What’s more interesting: initializing one GPU slows down the others (memory allocation takes about 1.5 seconds on each GPU, and I would have assumed it could be done in parallel, or shouldn’t it?).
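For concreteness, the kind of build line I mean is something like the following (just a sketch; the values assume the T10 is compute capability 1.3, so nvcc embeds native sm_13 code and the runtime shouldn’t need to JIT-compile any PTX at startup):

    # Build native sm_13 code for the Tesla T10 so no PTX JIT should be needed at run time
    nvcc -arch=compute_13 -code=sm_13 -o my_app my_app.cu

Even built like that, the first CUDA call is still slow, so the delay doesn’t seem to come from PTX compilation.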
I run nvidia-smi in a background loop after every reboot to remove the annoying delay. I forget whether the output file in that command line needs to exist the first time; create something that can be overwritten if so. 59 is my choice for the number of seconds between re-runs of the smi utility (it’s part of the SDK or toolkit).
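Roughly, the kind of line I mean is the following (exact flags vary between driver versions; -l <seconds> is the loop interval in recent nvidia-smi builds, and the log path is just an example), placed in /etc/rc.local or a root crontab so it runs after every boot:

    # Re-run nvidia-smi every 59 seconds so the driver stays initialized;
    # output goes to a throwaway file that simply gets overwritten.
    nohup nvidia-smi -l 59 > /tmp/nvidia-smi.log 2>&1 &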
Thanks for the reminder about this trick! After updating the kernel on my Ubuntu 10.04 system last week, I also started seeing these very slow CUDA initialization times. Running deviceQuery required 4 seconds, but with nvidia-smi running in the background, it only takes 0.03 seconds.
At first I thought this worked, but I still have a delay of a few seconds at the beginning. Persistence mode is enabled, and the command shows that the cards are still in persistence mode.
It is independent of which commands are used; the first CUDA call in the source has this delay. Is there anything I can do, like deinitializing the cards at the end of my program or something like that? Any other ideas?
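For reference, what I check and set is roughly this (the -pm flag and the -q query output are as described in the nvidia-smi documentation; details may differ between driver versions):

    # Enable persistence mode on all GPUs (needs root) and verify it afterwards
    sudo nvidia-smi -pm 1
    nvidia-smi -q | grep -i "Persistence Mode"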
System consists of 2x Tesla C2050
EDIT: The problem appears even if I launch the program twice (the second run immediately after the first).