CUDA concurrency problem - multi-GPU vector add

Hi,

I am trying to optimize a CUDA application (multi-stream, multi-GPU) and it would be great if somebody could explain the profile. What troubles me is that the work does not start on all GPUs at the same time, even though I expect it to. Here's a quick summary of what the code does:

cudaSetDevice(0);
cudaStreamCreate(&stream01);
cudaStreamCreate(&stream02);

cudaSetDevice(1);
cudaStreamCreate(&stream11);
cudaStreamCreate(&stream12);

cudaSetDevice(0);
cudaMemcpyAsync(... H2D, stream01); // assume transfer from host pinned mem to dev
kernel <<<grid, block, bytes, stream01>>>();
cudaMemcpyAsync(... D2H, stream01); // device-to-host copy back, same pinned host memory assumption

cudaMemcpyAsync(... H2D, stream02);
kernel <<<grid, block, bytes, stream02>>>();
cudaMemcpyAsync(... D2H, stream02);

cudaSetDevice(1);
cudaMemcpyAsync(... H2D, stream11);
kernel <<<grid, block, bytes, stream11>>>();
cudaMemcpyAsync(... D2H, stream11);

cudaMemcpyAsync(... H2D, stream12);
kernel <<<grid, block, bytes, stream12>>>();
cudaMemcpyAsync(... D2H, stream12);

Assume the kernel just adds 8 vectors into one result vector in the obvious way. I want all GPUs to start simultaneously, but they do not once there are "many streams per device". See the profiles below for 4, 16, and 64 streams respectively.
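For reference, here is a minimal sketch of what I mean by the kernel and the per-stream issue sequence (names like addEight and issueWork are placeholders I made up here, and error checking is omitted for brevity):

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: element-wise sum of 8 input vectors into out.
__global__ void addEight(const float *a, const float *b, const float *c,
                         const float *d, const float *e, const float *f,
                         const float *g, const float *h, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] + b[i] + c[i] + d[i] + e[i] + f[i] + g[i] + h[i];
}

// Per-stream issue sequence, as in the summary above: H2D copies,
// kernel launch, D2H copy, all asynchronous on the same stream.
// hIn/hOut must be pinned (cudaMallocHost / cudaHostAlloc) for the
// copies to actually be asynchronous with respect to the host.
void issueWork(cudaStream_t s, float **dIn, float *dOut,
               float **hIn, float *hOut, int n)
{
    for (int v = 0; v < 8; ++v)
        cudaMemcpyAsync(dIn[v], hIn[v], n * sizeof(float),
                        cudaMemcpyHostToDevice, s);
    addEight<<<(n + 255) / 256, 256, 0, s>>>(
        dIn[0], dIn[1], dIn[2], dIn[3],
        dIn[4], dIn[5], dIn[6], dIn[7], dOut, n);
    cudaMemcpyAsync(hOut, dOut, n * sizeof(float),
                    cudaMemcpyDeviceToHost, s);
}
```

The real code calls a function like this once per stream, with cudaSetDevice() switched to the stream's device first, before any synchronization.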



P.S. I'll gladly give more info if anybody wants it; I have summarized the code for convenience. All comments appreciated. Thanks in advance!