Hi,
I have a program that runs identical CUDA code (on different sets of data
of the same size) on 2 GPUs for multiple iterations.
In each iteration (a simplified code sketch follows this list):
- Copy data array 0 from CPU to GPU 0 and data array 1 to GPU 1
- Use cuFFT to compute convolution of the input data and a filter (the filter is previously computed and stored on each GPU)
- Copy convolution results back to CPU
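For reference, here is a minimal sketch of what one iteration does on each GPU. The names are illustrative, and plan creation, filter upload, and error checking are omitted, so this is a simplified outline rather than my exact code:

```c
#include <cuda_runtime.h>
#include <cufft.h>

// Frequency-domain point-wise multiply: data *= filter, with IFFT scaling folded in.
__global__ void pointwiseMul(cufftComplex *a, const cufftComplex *b, int n, float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cufftComplex v;
        v.x = (a[i].x * b[i].x - a[i].y * b[i].y) * scale;
        v.y = (a[i].x * b[i].y + a[i].y * b[i].x) * scale;
        a[i] = v;
    }
}

// One iteration on one GPU: H2D copy, forward 3D FFT, multiply by filter,
// inverse FFT, D2H copy. The cuFFT plan and d_filter are created once per device.
void runIteration(int dev, cufftHandle plan, cufftComplex *d_data,
                  const cufftComplex *d_filter, cufftComplex *h_data, size_t n)
{
    cudaSetDevice(dev);
    cudaMemcpy(d_data, h_data, n * sizeof(cufftComplex), cudaMemcpyHostToDevice);
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);              // 3D FFT
    pointwiseMul<<<(unsigned)((n + 255) / 256), 256>>>(d_data, d_filter, (int)n, 1.0f / n);
    cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE);              // 3D IFFT
    cudaMemcpy(h_data, d_data, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
}
```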
Problem:
When running on 2 Tesla M40s, performance is good at the
beginning, but after ~20 iterations GPU 1 starts slowing down and gets up to
4.5x worse, while GPU 0 continues to perform well consistently throughout all
the iterations.
Performance problem details:
After ~20 iterations, every operation involving GPU 1 becomes slow (a per-stage timing sketch follows this list), including:
- Data movement between host and device 1 (via cudaMemcpy())
- 3D FFT and IFFT using cuFFT
- Point-wise multiplication
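In case it helps, this is roughly how an individual stage can be timed with CUDA events to see the per-operation slowdown. It is only a sketch: plan, d_data, iter, and dev are assumed to come from the surrounding iteration loop, not a complete program:

```c
// Time one stage (here the forward FFT) on the current device with CUDA events.
cudaEvent_t t0, t1;
float ms;
cudaEventCreate(&t0);
cudaEventCreate(&t1);

cudaSetDevice(dev);                                  // dev = 0 or 1
cudaEventRecord(t0);
cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);   // or the memcpy / multiply / IFFT stage
cudaEventRecord(t1);
cudaEventSynchronize(t1);
cudaEventElapsedTime(&ms, t0, t1);
printf("iter %d, GPU %d: forward FFT %.3f ms\n", iter, dev, ms);

cudaEventDestroy(t0);
cudaEventDestroy(t1);
```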
I tried running the same program on 2 Tesla K80s. On that node, performance is
consistent across both GPUs and throughout all the iterations.
Has anyone had the same problem? Any insights or suggestions on how to investigate this further would be
highly appreciated.