Hi, I understand (almost) everything about CUDA streams and “–default-stream per-thread” compilation option and use it regularly in my application.
But sometimes I would like to have “legacy” behavior and build without this option. Since I’m using “cudaStreamPerThread” CUDA variable, I don’t get correct results.
I wasn’t able to find the documentation where this behavior is explained. Is it undefined behaviour? Should I not use “cudaStreamPerThread” in legacy build?