Hello everyone,
I modified the MonteCarlo application from the sample codes by limiting its blocks and threads so that three kernel executions can run concurrently. I used the Visual Profiler to check whether I did it properly, and the figure below shows what I saw (please ignore the black and red marks I made).
What might be the problem that keeps one kernel from running concurrently with the other two, while on the fifth run all three seem to run concurrently?
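
For reference, here is a minimal sketch of the kind of launch pattern described above. It is not the actual MonteCarlo sample code. The kernel `busyKernel`, the grid and block sizes, and the iteration count are all hypothetical placeholders. It assumes each of the three kernels is launched into its own non-default stream, because kernels issued to the same stream (including the default stream) run one after another.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: it only burns time so that
// overlapping executions are visible on a profiler timeline.
__global__ void busyKernel(float *out, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = 0.0f;
    for (int k = 0; k < iters; ++k)
        v += sinf(v + k);
    out[i] = v;
}

int main()
{
    const int kNumStreams = 3;   // three kernels that should overlap
    const int kBlocks     = 2;   // assumed small grid so the kernels fit side by side
    const int kThreads    = 64;  // assumed small block size
    const int kIters      = 1 << 16;

    cudaStream_t streams[kNumStreams];
    float *d_out[kNumStreams];

    // Do all allocations up front so that none falls between the launches.
    for (int s = 0; s < kNumStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&d_out[s], kBlocks * kThreads * sizeof(float));
    }

    // One launch per stream; launches in different non-default streams
    // are allowed (not guaranteed) to execute concurrently.
    for (int s = 0; s < kNumStreams; ++s)
        busyKernel<<<kBlocks, kThreads, 0, streams[s]>>>(d_out[s], kIters);

    // Wait for all streams and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < kNumStreams; ++s) {
        cudaFree(d_out[s]);
        cudaStreamDestroy(streams[s]);
    }
    return err == cudaSuccess ? 0 : 1;
}
```

With a pattern like this, one thing worth checking on the timeline is whether anything is issued to the default stream between the launches (for example a blocking `cudaMemcpy`), since that can serialize otherwise independent streams. Whether that is what happens in the modified sample is only a guess.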