Visual Profiler and Streams concurrency

Hello everyone,

I modified the MonteCarlo application from the sample codes by limiting its blocks and threads in order to get 3 concurrent kernel executions. I used the Visual Profiler to check whether I did it properly, and the figure below shows what I saw (please ignore the black and red marks I made).

What might prevent one of the kernels from running concurrently with the other two, while the fifth time all three seem to run concurrently?

Maybe the synchronous memcpy operations running in the default stream interfere by forcing synchronization?

You could allocate one page-locked buffer per stream and use async copies, if this is feasible given the design of the code.
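
To make the suggestion concrete, here is a minimal sketch of that pattern, not the actual MonteCarlo sample code. The kernel scaleKernel, the helper runThreeStreams, and the buffer sizes are hypothetical placeholders; only the CUDA runtime calls are real. Each stream gets its own page-locked host buffer (cudaMallocHost), and every copy goes through cudaMemcpyAsync on that stream, so nothing is issued to the default stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real per-stream kernel.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Report a failed runtime call and bail out (no cleanup: this is only a sketch).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return false;                                             \
        }                                                             \
    } while (0)

bool runThreeStreams(int n)
{
    const int kStreams = 3;
    cudaStream_t stream[kStreams];
    float *hBuf[kStreams];   // one page-locked host buffer per stream
    float *dBuf[kStreams];   // one device buffer per stream
    size_t bytes = n * sizeof(float);

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamCreate(&stream[s]));
        CHECK(cudaMallocHost((void **)&hBuf[s], bytes));  // pinned memory
        CHECK(cudaMalloc((void **)&dBuf[s], bytes));
        for (int i = 0; i < n; ++i) hBuf[s][i] = 1.0f;
    }

    // Small grid so that several kernels can be resident at once.
    int threads = 128;
    int blocks = (n + threads - 1) / threads;

    for (int s = 0; s < kStreams; ++s) {
        // All work for stream s is issued to stream s, never to the default stream.
        CHECK(cudaMemcpyAsync(dBuf[s], hBuf[s], bytes,
                              cudaMemcpyHostToDevice, stream[s]));
        scaleKernel<<<blocks, threads, 0, stream[s]>>>(dBuf[s], n, float(s + 2));
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(hBuf[s], dBuf[s], bytes,
                              cudaMemcpyDeviceToHost, stream[s]));
    }

    // Single synchronization point after all streams have been queued.
    CHECK(cudaDeviceSynchronize());

    // Each element started at 1.0f and was multiplied by (s + 2).
    bool ok = true;
    for (int s = 0; s < kStreams; ++s)
        for (int i = 0; i < n; ++i)
            if (hBuf[s][i] != float(s + 2)) ok = false;

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaFree(dBuf[s]));
        CHECK(cudaFreeHost(hBuf[s]));
        CHECK(cudaStreamDestroy(stream[s]));
    }
    return ok;
}

int main()
{
    bool ok = runThreeStreams(1024);
    printf("%s\n", ok ? "results match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Two things to keep in mind: cudaMemcpyAsync only behaves asynchronously with respect to the host when the host buffer is page-locked, and a plain cudaMemcpy in the default stream will synchronize with streams created by cudaStreamCreate. On older devices with a single hardware work queue, the order in which you issue the copies and kernels can also affect how much overlap the profiler shows.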


This is what I thought too. So I will check it! Thank you!