Launching kernels concurrently using std::thread

Hi,
I'm trying to launch a set of kernels concurrently using C++ std::thread. It doesn't work as I expected: the kernels are launched on different streams, but they do not appear to run concurrently. And yes, I do compile with --default-stream per-thread. Has anyone else seen this?

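Kernels don't automatically run concurrently just because you launched them on separate streams (or threads); each kernel also has to leave enough free resources on the GPU for the others to be scheduled alongside it. Take a look at the CUDA concurrentKernels sample code.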
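For reference, here is a minimal sketch of the kind of setup being discussed. It is not the original poster's code: the kernel, the buffer size, the thread count and the build command are all assumptions made for illustration. Each host thread launches one deliberately tiny kernel into its own per-thread default stream, and the buffers are allocated before the threads start so that allocation calls do not get in the way of the launches.

```cuda
// Assumed build command: nvcc -std=c++17 --default-stream per-thread -o threads threads.cu
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

constexpr int kNumThreads = 8;   // hypothetical number of host threads
constexpr int kN = 1 << 20;      // hypothetical number of elements per buffer

// Hypothetical kernel that just burns time. It is launched with a single
// small block so that one kernel cannot fill the whole GPU by itself.
__global__ void busy_kernel(float *x, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    for (int i = tid; i < n; i += blockDim.x * gridDim.x) {
        x[i] = sqrtf(powf(3.14159f, static_cast<float>(i)));
    }
}

void launch_from_thread(float *data, int id)
{
    // cudaStreamPerThread names the calling host thread's own default stream,
    // so the launch does not depend on how the file was compiled.
    busy_kernel<<<1, 64, 0, cudaStreamPerThread>>>(data, kN);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "thread %d: launch failed: %s\n", id, cudaGetErrorString(err));
        return;
    }

    // Wait only for this thread's stream, not for the whole device.
    err = cudaStreamSynchronize(cudaStreamPerThread);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "thread %d: kernel failed: %s\n", id, cudaGetErrorString(err));
    }
}

int main()
{
    // Allocate every buffer up front, before any worker thread starts.
    std::vector<float *> buffers(kNumThreads, nullptr);
    for (int i = 0; i < kNumThreads; ++i) {
        if (cudaMalloc(&buffers[i], kN * sizeof(float)) != cudaSuccess) {
            std::fprintf(stderr, "cudaMalloc failed for buffer %d\n", i);
            return 1;
        }
    }

    std::vector<std::thread> workers;
    for (int i = 0; i < kNumThreads; ++i) {
        workers.emplace_back(launch_from_thread, buffers[i], i);
    }
    for (auto &t : workers) {
        t.join();
    }

    for (float *p : buffers) {
        cudaFree(p);
    }
    return 0;
}
```

Whether the kernels actually overlap is something to check on a profiler timeline rather than assume. If the launch configuration is changed so that a single kernel occupies most of the GPU, the kernels will tend to run one after another even though each one sits in its own stream.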

Hi, thank you for your answer. According to Mark Harris's post (https://developer.nvidia.com/blog/gpu-pro-tip-cuda-7-streams-simplify-concurrency/), I would expect the kernels to run concurrently when launched from separate streams. Maybe I misunderstood something in his example?

Are you running his exact code?