Performance drop when running two processes on different GPUs on one machine

awagner0815 · March 27, 2015, 10:05am

Hi,

I have two different workstations, set up as follows:
WS1
MSI X99 board (32 GB RAM)
i7-5820K
2x GTX 970
NVIDIA driver 346.28
Ubuntu 14.10 (GNU/Linux 3.16.0-31-generic x86_64)
CUDA 7.0

WS2
MSI X99 board (32 GB RAM)
i7-5930K
2x GTX 970
1x GTX 980
NVIDIA driver 346.47
Ubuntu 14.10 (GNU/Linux 3.16.0-31-generic x86_64)
CUDA 7.0

We don’t use the SLI connector (the small bridge that connects the cards).
I use CUDA through the Caffe library. I run two Caffe processes on one machine, each on a different card.
I checked with nvidia-smi that the two processes run on different GPUs.
When I run a single process, one iteration of my sample program takes nearly 5 minutes.
When I run both processes, the time per iteration rises to 10 to 15 minutes, and sometimes up to 50 minutes.
I profiled with nvprof, and the average, min and max times per function are basically the same in both cases.

A sample from the profile. I stopped the run after ~25 minutes because the sample program runs for about 200 minutes and I didn’t want to wait:

                          Time(%)  Time      Calls  Avg       Min       Max       Name
one process             : 17.41%   289.804s  13320  21.757ms  6.8031ms  41.368ms  void cudnn::detail::convolve_dgrad_engine
two different processes : 17.42%   95.8076s   4413  21.710ms  6.7617ms  40.956ms  void cudnn::detail::convolve_dgrad_engine
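
For anyone reading the two nvprof lines above, here is a rough back-of-the-envelope check in Python. It only reuses the figures quoted in those lines. It assumes the percentage column is the kernel's share of all profiled GPU time, and the variable and function names are made up for illustration. It is a sketch for reading the numbers, not a diagnosis.

```python
# Figures copied from the two nvprof lines quoted in the post.
runs = {
    "one process": {"share": 0.1741, "total_s": 289.804, "calls": 13320},
    "two processes": {"share": 0.1742, "total_s": 95.8076, "calls": 4413},
}


def summarize(run):
    """Return (average ms per call, estimated total profiled GPU time in s)."""
    avg_ms = run["total_s"] / run["calls"] * 1000.0
    # Assumption: "share" is this kernel's fraction of all profiled GPU time,
    # so dividing by it estimates how long the GPU was busy overall.
    gpu_busy_s = run["total_s"] / run["share"]
    return avg_ms, gpu_busy_s


for name, run in runs.items():
    avg_ms, gpu_busy_s = summarize(run)
    print(f"{name}: {avg_ms:.3f} ms per call, "
          f"~{gpu_busy_s / 60:.1f} min of profiled GPU time")

# How many more kernel launches the single-process run got through.
call_ratio = runs["one process"]["calls"] / runs["two processes"]["calls"]
print(f"call ratio: {call_ratio:.2f}x")
```

The per-call time is the same in both runs, but the single-process run issued about three times as many calls. If both profiles cover a similar wall-clock window (an assumption, the post only mentions ~25 minutes), the GPU in the two-process case would be idle for much of the time, which would point at something outside the kernels, for example the host side feeding the GPU.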

It happens on both machines, which use different driver versions. (I updated WS1 to the latest CUDA version and drivers, and it makes no difference.)

Any solutions for this problem?

Thank you!

little_jimmy · March 28, 2015, 6:51am

seems similar to:

https://devtalk.nvidia.com/default/topic/818054/cuda-programming-and-performance/running-two-instances-of-matlab-calling-mex-dll-files-which-use-different-gpus-on-the-same-pc/

i have not really tried running multiple processes on the same machine
however, in addition to the suggestions in the link, it seems the best way to run multiple instances is via MPI/IPC, or by making the primary process multi-threaded
this should give better control over the cuda context
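
One possible way to have a single parent process control which card each worker sees is sketched below in Python. It is not necessarily what the reply has in mind: it uses plain subprocesses rather than MPI or threads, and it relies on the `CUDA_VISIBLE_DEVICES` environment variable. The helper names `env_for_gpu` and `launch_per_gpu` and the placeholder command are hypothetical. Note that CUDA's device numbering may not match the order shown by nvidia-smi, which could matter on a machine with mixed cards such as WS2.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU.

    Hypothetical helper: CUDA_VISIBLE_DEVICES limits which devices the
    CUDA runtime enumerates in the child process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_per_gpu(command, gpu_indices):
    """Start one child process per GPU index and return the Popen handles."""
    return [subprocess.Popen(command, env=env_for_gpu(i)) for i in gpu_indices]


if __name__ == "__main__":
    # Placeholder command: prints the variable instead of running training.
    # Replace it with the real training command line.
    cmd = [sys.executable, "-c",
           "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    for proc in launch_per_gpu(cmd, [0, 1]):
        proc.wait()
```

Inside each child, the selected card then shows up as device 0, so the training program itself does not need a per-process device argument.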
