Why would multiple GPUs not run in parallel?

mhdmac2017 · August 8, 2018, 8:29am

I’ve been given a complex piece of CUDA code to investigate. It is designed to run in parallel on multiple GPUs, but apparently it does not: NVVP shows the GPUs executing serially. I have made a few amendments to the code to enable asynchronous data transfer with concurrent kernel execution. When I profile the code with NVVP, the parts I amended show the GPUs running in parallel, but in the rest of the code, which I have not amended, the GPUs still execute serially.

How can this be?

Robert_Crovella · August 8, 2018, 2:22pm

If your code is using cudaMemcpy or cudaDeviceSynchronize, those calls will block the host thread. Therefore a sequence like this:

...
cudaSetDevice(0);
cudaMemcpy(...);
kernel0<<<...>>>(...);
cudaMemcpy(...);
cudaSetDevice(1);
cudaMemcpy(...);
kernel1<<<...>>>(...);
cudaMemcpy(...);
...

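will not allow kernel0 and kernel1 to execute concurrently, even though they are launched onto 2 separate devices. The cudaMemcpy after kernel0 does not return until kernel0 has finished and the copy is complete, so the host thread has not yet issued any work to device 1 at that point.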

If refactoring code of that type to use cudaMemcpyAsync, for example, is what you mean by “amended” then it seems you are already aware of this concept, and your question is puzzling. If you don’t fix that sort of issue, the code will not run concurrently on separate devices.
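As a rough illustration of the kind of refactoring meant here, the sketch below queues the copies and the kernel for every device before waiting on any of them. It is not taken from the code in question: the `scale` kernel, the `run_on_all_devices` helper, the `CHECK` macro, and the buffer sizes are all made up for the example, and it assumes one stream per device and pinned host buffers (cudaMemcpyAsync generally needs pinned memory to overlap with other work).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: doubles each element.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper macro: report the error and bail out.
// For brevity, the error path does not free resources.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            return false;                                               \
        }                                                               \
    } while (0)

bool run_on_all_devices(int n) {
    int ndev = 0;
    CHECK(cudaGetDeviceCount(&ndev));
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<cudaStream_t> streams(ndev);
    std::vector<float *> h(ndev), d(ndev);

    // Phase 1: per-device setup (these allocation calls may block).
    for (int dev = 0; dev < ndev; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaStreamCreate(&streams[dev]));
        CHECK(cudaMallocHost((void **)&h[dev], bytes)); // pinned host buffer
        CHECK(cudaMalloc((void **)&d[dev], bytes));
        for (int i = 0; i < n; ++i) h[dev][i] = 1.0f;
    }

    // Phase 2: enqueue work on every device without waiting in between.
    for (int dev = 0; dev < ndev; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaMemcpyAsync(d[dev], h[dev], bytes,
                              cudaMemcpyHostToDevice, streams[dev]));
        scale<<<(n + 255) / 256, 256, 0, streams[dev]>>>(d[dev], n);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h[dev], d[dev], bytes,
                              cudaMemcpyDeviceToHost, streams[dev]));
    }

    // Phase 3: only now wait for each device to finish.
    for (int dev = 0; dev < ndev; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaStreamSynchronize(streams[dev]));
    }

    // Check results and release resources.
    bool ok = true;
    for (int dev = 0; dev < ndev; ++dev) {
        for (int i = 0; i < n; ++i) ok = ok && (h[dev][i] == 2.0f);
        CHECK(cudaSetDevice(dev));
        CHECK(cudaFree(d[dev]));
        CHECK(cudaFreeHost(h[dev]));
        CHECK(cudaStreamDestroy(streams[dev]));
    }
    return ok;
}

int main() {
    bool ok = run_on_all_devices(1 << 20);
    printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

The important part is that nothing in the second loop blocks the host thread, so work for device 1 is issued while device 0 is still busy. Any remaining cudaMemcpy or cudaDeviceSynchronize call between the per-device sections would bring back the serial behavior.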
