CUDA streams and multiple processes

My question is about streams when multiple processes use the same GPU for execution.

Let's assume the CUDA MPS service is turned on, so a context is shared between two processes, P1 and P2. To maximize the overlap of data transfers and compute, and to avoid calling cudaDeviceSynchronize() entirely, we start using streams in our code.

Q) How many streams are supported on the Maxwell, Pascal and Turing architectures? I am guessing this is linked to how many kernels can execute concurrently on the GPU. If so, what are those numbers for these architectures?

Q) I have seen code samples using these two ways of using streams:

  1. kernel<<< blocks, threads, bytes, 0 >>>(); // stream 0
  2. kernel<<< blocks, threads, bytes, stream1 >>>(); // where stream1 is a cudaStream_t object

Which one is the recommended way? Are they both the same? And if two processes both use stream number 0, does that mean the two processes will start queuing work onto the same stream?

Hi,

>> How many streams are supported on Maxwell, Pascal and Turing architecture?
[SKA] There is no fixed limit on the number of streams you can create; the practical limit is the GPU resources consumed by the work you launch into each stream. Streams give the programmer a way to break data transfers into chunks and overlap them with kernel execution (e.g., overlapping H2D/D2H copies with compute), which reduces the overall time.
Streams also allow multiple kernels to be launched concurrently, so GPU resource utilization can be maximized. How many kernels can actually run concurrently is a separate hardware limit ("maximum number of resident grids per device"), listed per compute capability in the CUDA C Programming Guide (on the order of 32 for Maxwell and up to 128 on Turing).
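A minimal sketch of that chunked-overlap pattern (the kernel `scale` and the sizes are illustrative, not from the original post): each chunk's H2D copy, kernel launch, and D2H copy are queued on one of a few streams, so copies in one stream can overlap compute in another. Note that cudaMemcpyAsync only overlaps when the host buffer is pinned.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *d, int n) {           // toy kernel: d[i] *= 2
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int N = 1 << 22, N_STREAMS = 4;
    const int CHUNK = N / N_STREAMS;

    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));         // pinned host memory is required
    cudaMalloc(&d, N * sizeof(float));             // for copies to truly overlap

    cudaStream_t s[N_STREAMS];
    for (int i = 0; i < N_STREAMS; ++i) cudaStreamCreate(&s[i]);

    for (int i = 0; i < N_STREAMS; ++i) {          // queue copy/kernel/copy per chunk
        int off = i * CHUNK;
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[i]);
        scale<<<(CHUNK + 255) / 256, 256, 0, s[i]>>>(d + off, CHUNK);
        cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();                       // wait once, at the very end

    for (int i = 0; i < N_STREAMS; ++i) cudaStreamDestroy(s[i]);
    cudaFreeHost(h); cudaFree(d);
    return 0;
}
```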

>> I have seen code samples using these two ways of using streams
kernel<<< blocks, threads, bytes, 0 >>>(); // stream 0
[SKA] Passing 0 (a null pointer) as the stream argument refers to the default (NULL) stream. Any other integer literal is rejected at compile time, since only cudaStream_t objects (CUstream_st*) are accepted:

error: argument of type “int” is incompatible with parameter of type “CUstream_st *”

kernel<<< blocks, threads, bytes, stream1 >>>(); // where stream1 is a cudaStream_t object
[SKA] Concurrent (non-default) streams must be created with cudaStreamCreate() and released with cudaStreamDestroy(). The cudaStream_t handle lets you manage each stream individually: you can attach a stream callback to be notified when the work queued before it completes, which gives you finer-grained synchronization than a device-wide barrier.
As for the two-process question: streams are per-process (per-context) objects. Even with MPS enabled, stream 0 in P1 and stream 0 in P2 are distinct streams, so the two processes do not queue work onto each other's streams; MPS only lets their work share the GPU's execution resources.
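For illustration, a sketch of the second form with explicit stream management (the kernel body and the callback name `done` are made up for this example). It uses cudaLaunchHostFunc, the CUDA 10+ replacement for cudaStreamAddCallback, and cudaStreamSynchronize to block on just this stream rather than the whole device:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(float *d, int n) {          // placeholder kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = (float)i;
}

void CUDART_CB done(void *userData) {              // host callback: runs after all
    std::printf("%s finished\n",                   // prior work in the stream
                (const char *)userData);           // has completed
}

int main() {
    cudaStream_t stream1;
    cudaStreamCreate(&stream1);                    // create a non-default stream

    float *d;
    cudaMalloc(&d, 1024 * sizeof(float));
    kernel<<<4, 256, 0, stream1>>>(d, 1024);       // enqueue work on stream1
    cudaLaunchHostFunc(stream1, done, (void *)"stream1");

    cudaStreamSynchronize(stream1);                // wait on this stream only,
                                                   // not cudaDeviceSynchronize()
    cudaStreamDestroy(stream1);
    cudaFree(d);
    return 0;
}
```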

Related Blog: https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/