Streams not running concurrently

Is there any particular reason why you’re calling FunctionB() with the default stream? As far as I understand it, work issued to the default stream (stream 0) can’t run concurrently with work in other streams.

In other words, work in the default stream won’t start until previously issued work in the other streams has finished, and any work issued to other streams afterwards is stalled until the default-stream work has finished. So, the way I see it, you’ll have to launch FunctionB() on a non-default stream, too, if you want to achieve “reliable” concurrency.

In addition, kernels launched into the same stream (e.g. Kernel2 and Kernel3 in FunctionB()) always run sequentially, in the order they were issued. That should be obvious, though, since it holds for non-default streams as well as for the default stream.
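Here is a minimal sketch of what I mean. Since your actual code isn’t shown, the kernel bodies and the signatures of FunctionA() and FunctionB() (including the added stream parameter and the helper RunAndCheck()) are hypothetical. FunctionA() and FunctionB() each get their own explicitly created stream, so their kernels are allowed to overlap, while Kernel3 still waits for Kernel2 because both share FunctionB’s stream (hence the expected value 2 × 3 = 6).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernels standing in for the ones in the original question.
__global__ void Kernel1(float* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = 1.0f;
}

__global__ void Kernel2(float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = 2.0f;
}

__global__ void Kernel3(float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] *= 3.0f;  // relies on Kernel2 having finished first
}

// Hypothetical FunctionA: issues its work to the stream it is given.
void FunctionA(float* a, int n, cudaStream_t s) {
    Kernel1<<<(n + 255) / 256, 256, 0, s>>>(a, n);
}

// Hypothetical FunctionB: takes a stream instead of using the default stream 0.
// Kernel2 and Kernel3 share stream s, so Kernel3 only starts after Kernel2 ends.
void FunctionB(float* b, int n, cudaStream_t s) {
    Kernel2<<<(n + 255) / 256, 256, 0, s>>>(b, n);
    Kernel3<<<(n + 255) / 256, 256, 0, s>>>(b, n);
}

// Hypothetical helper: runs both functions on separate non-default streams
// and checks the results on the host.
bool RunAndCheck() {
    const int n = 1 << 20;
    float *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&b, n * sizeof(float)) != cudaSuccess) { cudaFree(a); return false; }

    cudaStream_t sA, sB;
    cudaStreamCreate(&sA);  // two explicitly created (non-default) streams
    cudaStreamCreate(&sB);

    FunctionA(a, n, sA);  // may overlap with FunctionB's kernels
    FunctionB(b, n, sB);

    bool ok = cudaDeviceSynchronize() == cudaSuccess;  // wait for both streams

    std::vector<float> ha(n), hb(n);
    ok = ok && cudaMemcpy(ha.data(), a, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(hb.data(), b, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        ok = (ha[i] == 1.0f) && (hb[i] == 6.0f);  // 6 = 2 * 3, Kernel2 then Kernel3

    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
    cudaFree(a);
    cudaFree(b);
    return ok;
}

int main() {
    bool ok = RunAndCheck();
    std::printf("%s\n", ok ? "ok" : "FAILED");
    return ok ? 0 : 1;
}
```

Keep in mind that separate streams only *allow* overlap; whether the kernels actually run side by side also depends on whether a single kernel already occupies the whole GPU. If FunctionB() really has to stay on the default stream, creating the other streams with cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking) or compiling with nvcc --default-stream per-thread should avoid the implicit synchronization with the legacy default stream.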