Using thrust::cuda::par with thrust::cuda::par.on

I have been tasked with making a very old project that relies heavily on Thrust as non-blocking as possible, so I am adding stream arguments left and right. At some point, however, I came across this call, which uses its own execution policy to restrict it to a particular memory region:

thrust::transform_inclusive_scan( thrust::cuda::par(Allocator), input.begin(), input.end(), output.begin(), scanStencil(), thrust::plus<int>());

Is there a way to combine thrust::cuda::par.on(myFooStream) with thrust::cuda::par(Allocator) in a simple manner without writing my own execution policy backend?
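As far as I know, the policy object returned by thrust::cuda::par(Allocator) exposes .on() itself, so the two can be chained as thrust::cuda::par(Allocator).on(myFooStream). Below is a minimal sketch of that, not a drop-in replacement: scratch_allocator, is_positive (standing in for scanStencil, whose definition is not shown) and scan_on_stream are hypothetical names made up for the illustration.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/transform_scan.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator for Thrust's temporary storage (assumption:
// the real project's Allocator has a similar allocate/deallocate shape).
struct scratch_allocator {
    typedef char value_type;

    char* allocate(std::ptrdiff_t num_bytes) {
        void* p = nullptr;
        if (cudaMalloc(&p, static_cast<std::size_t>(num_bytes)) != cudaSuccess) {
            throw std::bad_alloc();
        }
        return static_cast<char*>(p);
    }

    void deallocate(char* p, std::size_t) { cudaFree(p); }
};

// Hypothetical stand-in for scanStencil: maps each element to 0 or 1.
struct is_positive {
    __host__ __device__ int operator()(int x) const { return x > 0 ? 1 : 0; }
};

// Runs the scan on a user-created stream with a user-supplied allocator.
std::vector<int> scan_on_stream(const std::vector<int>& host_in) {
    thrust::device_vector<int> input(host_in.begin(), host_in.end());
    thrust::device_vector<int> output(input.size());

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    scratch_allocator alloc;

    // Allocator first, then the stream: both end up in one policy object.
    thrust::transform_inclusive_scan(
        thrust::cuda::par(alloc).on(stream),
        input.begin(), input.end(), output.begin(),
        is_positive(), thrust::plus<int>());

    // Make sure the work on the stream has finished before reading back.
    cudaStreamSynchronize(stream);

    std::vector<int> host_out(output.size());
    thrust::copy(output.begin(), output.end(), host_out.begin());
    cudaStreamDestroy(stream);
    return host_out;
}
```

With this sketch, scan_on_stream({3, -1, 4, 0, 5}) maps the input to 1, 0, 1, 0, 1 and should return the running sums 1, 1, 2, 2, 3.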

Note, however, that in the latest Thrust version, Thrust calls still block with respect to the host even when streams are used. To get Thrust calls that do not block the host, you need to use the new asynchronous API.

I’ve found the nvcc option --default-stream per-thread to be a very helpful feature from NVIDIA for exactly the problem you’re describing.
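The option is passed at compile time (for example nvcc --default-stream per-thread file.cu) and gives every host thread its own default stream. The same stream can also be named explicitly through the cudaStreamPerThread handle, which works whether or not the flag is set. A minimal sketch, where sum_on_per_thread_stream is a hypothetical helper name:

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: sums the input on the calling thread's own
// default stream instead of the shared legacy default stream.
int sum_on_per_thread_stream(const std::vector<int>& host_in) {
    thrust::device_vector<int> data(host_in.begin(), host_in.end());

    // cudaStreamPerThread is a built-in stream handle from the CUDA runtime.
    return thrust::reduce(thrust::cuda::par.on(cudaStreamPerThread),
                          data.begin(), data.end(), 0);
}
```

If several host threads each call a helper like this, their Thrust work goes to separate streams rather than being serialized on a single default stream.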