How does "cudaStreamPerThread" variable behave without "--default-stream per-thread" compilation option?

Hi, I understand (almost) everything about CUDA streams and “–default-stream per-thread” compilation option and use it regularly in my application.

But sometimes I would like to have “legacy” behavior and build without this option. Since I’m using “cudaStreamPerThread” CUDA variable, I don’t get correct results.

I wasn’t able to find the documentation where this behavior is explained. Is it undefined behaviour? Should I not use “cudaStreamPerThread” in legacy build?

cudaStreamPerThread will always be a different stream from the legacy stream, no matter if –default-stream per-thread is set or not