Can I use Cooperative Groups inside Streams?

I’m quite new to CUDA development and currently use CUDA Streams to parallelize some data preparation tasks. I then need to run several reduction algorithms on the prepared data, and I found Cooperative Groups really useful for this. For example, on my laptop the reductionMultiBlockCG CUDA Sample runs about 6 times faster than the analogous Thrust code:
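#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
// d_idata is a raw float* device pointer holding size elements
thrust::device_ptr<float> d_ptr1(d_idata);
thrust::device_ptr<float> d_ptr2(d_idata + size);
float sum = thrust::reduce(d_ptr1, d_ptr2);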

However, the code in this CUDA sample uses occupancy APIs such as cudaOccupancyMaxActiveBlocksPerMultiprocessor() and cudaOccupancyMaxPotentialBlockSize() to size the launch, and I’m not sure whether it is a good idea to call these occupancy queries when the kernel runs inside a Stream.

So, the general question is: can I use Cooperative Groups inside Streams? And if I can, may I rely on the cudaOccupancyMaxPotentialBlockSize() API, or is it better to use some “manual” occupancy values?

Update: I just realized that cudaLaunchCooperativeKernel() has a cudaStream_t parameter :). So it should work.
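
For reference, here is a minimal sketch of what I have in mind. It is not the code from the sample: sumKernel and sumOnStream are hypothetical names I made up for illustration, and the block size of 256 is an arbitrary assumption. As far as I can tell, the occupancy call is a plain host-side query that enqueues nothing on any stream, so only the copies, the memset, and the cooperative launch take the stream argument. The sketch assumes a device that reports support for cooperative launch.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical kernel (not the one from the sample): grid-wide sum with one grid.sync().
__global__ void sumKernel(const float* in, float* blockSums, float* out, int n)
{
    cg::grid_group grid = cg::this_grid();

    // Grid-stride loop: each thread accumulates a private partial sum.
    float local = 0.0f;
    for (int i = (int)grid.thread_rank(); i < n; i += (int)grid.size())
        local += in[i];

    // blockSums must be zeroed before the launch.
    atomicAdd(&blockSums[blockIdx.x], local);

    // Grid-wide barrier; only valid when the kernel is launched cooperatively.
    grid.sync();

    // A single thread combines the per-block sums.
    if (grid.thread_rank() == 0) {
        float total = 0.0f;
        for (unsigned int b = 0; b < gridDim.x; ++b)
            total += blockSums[b];
        *out = total;
    }
}

// Hypothetical helper: sums `host` on the GPU, issuing all work on `stream`.
// Returns false if cooperative launch is unsupported or a checked CUDA call fails.
bool sumOnStream(const std::vector<float>& host, cudaStream_t stream, float* result)
{
    const int threads = 256;  // assumed block size
    int n = (int)host.size();
    int device = 0, coop = 0, numSms = 0, blocksPerSm = 0;

    if (cudaGetDevice(&device) != cudaSuccess) return false;
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, device);
    cudaDeviceGetAttribute(&numSms, cudaDevAttrMultiProcessorCount, device);
    if (!coop) return false;

    // Host-side occupancy query: no stream argument, nothing is enqueued.
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSm, sumKernel, threads, 0) != cudaSuccess)
        return false;

    // Largest grid whose blocks can all be resident at once (needed for grid.sync()).
    int blocks = blocksPerSm * numSms;
    if (blocks == 0) return false;

    float *dIn = nullptr, *dBlockSums = nullptr, *dOut = nullptr;
    bool ok = cudaMalloc(&dIn, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dBlockSums, blocks * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dOut, sizeof(float)) == cudaSuccess;

    if (ok) {
        cudaMemcpyAsync(dIn, host.data(), n * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        cudaMemsetAsync(dBlockSums, 0, blocks * sizeof(float), stream);

        // Kernel arguments are passed as an array of pointers to each argument.
        void* args[] = { &dIn, &dBlockSums, &dOut, &n };
        ok = cudaLaunchCooperativeKernel((void*)sumKernel, dim3(blocks), dim3(threads),
                                         args, 0, stream) == cudaSuccess;
    }
    if (ok) {
        cudaMemcpyAsync(result, dOut, sizeof(float), cudaMemcpyDeviceToHost, stream);
        ok = cudaStreamSynchronize(stream) == cudaSuccess;
    }

    cudaFree(dIn);
    cudaFree(dBlockSums);
    cudaFree(dOut);
    return ok;
}
```

I would call sumOnStream() once per prepared buffer, passing the same stream that produced that buffer. One thing I’m still unsure about: the grid size here is computed as if the kernel had the whole GPU to itself, so I assume kernels running concurrently in other streams could delay the cooperative launch until enough SMs are free.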