cudaStreamSync and WDDM relation

nhp12345 · May 11, 2018, 7:06am

Hi everyone,

I’m learning about the effect of WDDM on kernel launches. However, having only read explanations here and there without concrete evidence, such as screenshots from Nsight Performance Analysis, I find it hard to really understand. I would therefore appreciate it if you could point the effect out directly in my attached screenshots below. Here are some pages that I found quite informative about the effect of WDDM:
https://stackoverflow.com/questions/12196044/time-between-kernel-launch-and-kernel-execution
https://devtalk.nvidia.com/default/topic/548639/is-wddm-causing-this-
https://devtalk.nvidia.com/default/topic/525137/?comment=3718330

My current problem is that there are idle periods between kernel launches in my program (see the Nonsync1 and Nonsync2 images). The non-sync version of the program runs a loop of various kernel launches without any cudaStreamSynchronize call. However, if I switch to a sync version that calls cudaStreamSynchronize once per loop iteration, those gaps disappear (see the Sync1 and Sync2 screenshots). Here are my questions on this topic:

In a program that contains only kernel/memcpy/memset calls, is it always WDDM's fault if repeated gaps occur in the execution timeline?
In my case, is this behavior also caused by WDDM? If yes, can you point out where it shows up in my screenshots? If it doesn't appear in them, could you please tell me where to capture its footprint?
What is your guess or hypothesis about why adding cudaStreamSynchronize eliminates those gaps in the second version of my program?
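
To make the two variants concrete, here is a minimal sketch of the loop structure described above. It is not the actual program: the kernels stepA and stepB, the helper runLoop, the buffer size, and the iteration count are all placeholders chosen for illustration. The only difference between the two runs is whether cudaStreamSynchronize is called once per iteration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels (hypothetical; they stand in for the real workload).
__global__ void stepA(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void stepB(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 0.5f;
}

// Launches both kernels in a loop on one stream. When syncEachIteration is
// true, the host waits for the stream once per iteration (the "sync" version);
// otherwise it only waits once after the loop (the "non-sync" version).
static void runLoop(cudaStream_t stream, float *d_data, int n,
                    int iterations, bool syncEachIteration) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {
        stepA<<<blocks, threads, 0, stream>>>(d_data, n);
        stepB<<<blocks, threads, 0, stream>>>(d_data, n);
        if (syncEachIteration) {
            cudaStreamSynchronize(stream);
        }
    }
    // Final wait so that all queued work has finished before returning.
    cudaStreamSynchronize(stream);
}

int main() {
    const int n = 1 << 20;      // assumed problem size
    const int iterations = 100; // assumed loop count

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    runLoop(stream, d_data, n, iterations, false); // non-sync version
    runLoop(stream, d_data, n, iterations, true);  // sync version

    cudaError_t err = cudaGetLastError();
    printf("status: %s\n", cudaGetErrorString(err));

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Profiling each runLoop call separately should make it easier to compare the two timelines side by side, assuming the gaps also show up with a workload this simple.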

Thanks so much for your help!
