Using thrust::cuda::par with thrust::cuda::par.on

Greetings,
I have been tasked to make a very old project heavily using thrust as non-blocking as possible, so I am throwing stream definitions left and right, however, at some point saw this with its own execution policy restricting to use a memory region.

thrust::transform_inclusive_scan(thrust::cuda::par(Allocator), input.begin(), input.end(), output.begin(), scanStencil(), thrust::plus<int>());

Is there a way to combine thrust::cuda::par.on(myFooStream) with thrust::cuda::par(Allocator) in a simple manner without writing my own execution policy backend?

Best,
(This is a duplicate of https://devtalk.nvidia.com/default/topic/1061320/gpu-accelerated-libraries/using-thrust-cuda-par-with-thrust-cuda-par-on/)

Does this work?

thrust::cuda::par(allocator).on(stream)

Note, however, that even in the latest Thrust version, Thrust calls still block with respect to the host even when streams are used. To make Thrust calls non-blocking on the host, you need to use the new asynchronous API: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#thrust-release-notes
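For reference, here is a minimal sketch of how the combined policy might look in the original call. The identifiers `myFooStream` and `scanStencil` come from the question; the allocator type and the rest of the setup are assumptions for illustration:

```cpp
#include <thrust/device_vector.h>
#include <thrust/device_allocator.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/transform_scan.h>
#include <cuda_runtime.h>

// Stand-in for the scanStencil functor from the question.
struct scanStencil {
    __host__ __device__ int operator()(int x) const { return x; }
};

int main() {
    cudaStream_t myFooStream;
    cudaStreamCreate(&myFooStream);

    thrust::device_vector<int> input(1000, 1);
    thrust::device_vector<int> output(1000);

    // Assumed allocator for Thrust's temporary storage; replace with
    // the project's own Allocator instance.
    thrust::device_allocator<char> alloc;

    // par(alloc).on(stream): temporary storage is drawn from alloc,
    // and the kernels are launched on myFooStream.
    thrust::transform_inclusive_scan(
        thrust::cuda::par(alloc).on(myFooStream),
        input.begin(), input.end(), output.begin(),
        scanStencil(), thrust::plus<int>());

    cudaStreamSynchronize(myFooStream);
    cudaStreamDestroy(myFooStream);
    return 0;
}
```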

Greetings striker159,
It works perfectly, thank you very much. I also appreciate your warning regarding the API.
Best,

Hi,
I've found the --default-stream per-thread nvcc option to be a very helpful feature from NVIDIA, exactly for the problem you're describing.
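For reference, the flag is passed to nvcc at compile time; a minimal sketch (the file name app.cu is a placeholder):

```shell
# With per-thread default streams, each host thread gets its own
# default stream that does not synchronize with other streams,
# so legacy code without explicit streams can still overlap work.
nvcc --default-stream per-thread -o app app.cu
```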

thanks
Eyal