Should legacy default stream behave serially under multiple host processes/contexts?

1055057679 · October 18, 2022, 8:36am

According to the documentation, there is one particular NULL stream per GPU device that serves as the legacy default stream. If I use the legacy default stream in multiple host processes, should my kernel executions be serialized, since they would all share the same default stream?
But in my experiments, kernels on these default streams can sometimes execute concurrently, as in the picture:

As far as I know, PyTorch uses the legacy default stream rather than the per-thread default stream, and I ran the experiment with PyTorch. This confuses me.
PS: I have turned on MPS.

Robert_Crovella · October 18, 2022, 2:03pm

Multiple processes ordinarily serialize kernel launches between processes. This is independent of which streams each process is using. Turning on MPS, however, allows at least the possibility that kernels from independent processes can overlap. Again, this possibility is independent of which streams each process is using.
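
One way to observe this is a small test program, sketched below, that launches a long-running kernel on the default stream and prints wall-clock timestamps. This is only an illustrative sketch, not the experiment from the original post: the file name `spin.cu`, the kernel `spin`, the helper `now_ms`, and the cycle count are all assumptions chosen for the example.

```cuda
// spin.cu (hypothetical name); build with: nvcc -o spin spin.cu
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <unistd.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Busy-wait on the GPU for roughly `cycles` clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Wall-clock time in milliseconds, comparable across processes.
static double now_ms() {
    using namespace std::chrono;
    return duration<double, std::milli>(
               system_clock::now().time_since_epoch()).count();
}

int main() {
    CHECK(cudaFree(0));  // force context creation before timing

    // Arbitrary value (assumption); a few hundred ms on most GPUs.
    const long long cycles = 500000000LL;

    for (int i = 0; i < 5; ++i) {
        double t0 = now_ms();
        // No stream argument: this uses the legacy default stream
        // unless compiled with --default-stream per-thread.
        spin<<<1, 1>>>(cycles);
        CHECK(cudaGetLastError());
        CHECK(cudaDeviceSynchronize());
        double t1 = now_ms();
        printf("pid %d kernel %d: start %.1f ms, end %.1f ms, took %.1f ms\n",
               (int)getpid(), i, t0, t1, t1 - t0);
    }
    return 0;
}
```

Running two copies at once (for example `./spin & ./spin & wait`), first without MPS and then with the MPS control daemon started via `nvidia-cuda-mps-control -d`, lets you compare the per-kernel times against a single-process run. If each kernel takes about as long as it does alone, the two processes' kernels were probably overlapping; noticeably longer times suggest the processes were taking turns on the GPU. Either way, each process has its own context, so the default stream in one process does not order work against the default stream in another.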

