--default-stream per-thread question

Hi,
Does anyone have experience with the --default-stream per-thread flag in a production environment?
Would it hold up in production, under stress, with many threads (and therefore streams) opening and closing all the time, 24x7?

Also, what would happen with a cudaMemcpy when running under this configuration? Would a non-pinned memcpy running on a stream (one created because of the --default-stream per-thread flag) synchronize everything, or just that thread's stream?

thanks
Eyal

Any ideas?

thanks
Eyal

For non-pinned cudaMemcpy behavior, see here:

[url]https://devtalk.nvidia.com/default/topic/1038581/cuda-programming-and-performance/performances-of-multi-thread-vs-multi-process-with-mps/post/5276929/#5276929[/url]

It may affect other activity on the device, not just the activity on the stream it's on.
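
If the goal is to keep each thread's copies confined to its own stream, one common approach is to use pinned host memory with cudaMemcpyAsync on the per-thread default stream, and then synchronize only that stream. Below is a minimal sketch of that idea, not a tested production pattern. The kernel `scale`, the `worker` function, the buffer size, and the thread count are all made up for illustration. It uses the cudaStreamPerThread handle explicitly, so it should behave the same with or without --default-stream per-thread. Note that cudaMalloc and cudaFree can synchronize on their own, so a long-running service would probably allocate once per thread rather than per request.

```cuda
// Assumed build line: nvcc -std=c++14 --default-stream per-thread sketch.cu -o sketch
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                 \
                    cudaGetErrorString(err_), __FILE__, __LINE__);      \
            exit(1);                                                    \
        }                                                               \
    } while (0)

// Hypothetical kernel, only here to give the stream some work.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical per-thread worker: everything is issued on this
// host thread's own default stream.
void worker(int id, int n) {
    const size_t bytes = n * sizeof(float);
    float* h = nullptr;
    float* d = nullptr;

    CHECK(cudaMallocHost((void**)&h, bytes));  // pinned host memory
    CHECK(cudaMalloc((void**)&d, bytes));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s = cudaStreamPerThread;      // this thread's default stream

    // Async copies from pinned memory are ordered within stream s.
    CHECK(cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, s));
    scale<<<(n + 255) / 256, 256, 0, s>>>(d, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, s));

    // Wait for this thread's stream only, not the whole device.
    CHECK(cudaStreamSynchronize(s));
    printf("thread %d: h[0] = %f\n", id, h[0]);

    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
}

int main() {
    const int kThreads = 4;                    // arbitrary, for illustration
    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t) pool.emplace_back(worker, t, 1 << 20);
    for (auto& th : pool) th.join();
    return 0;
}
```

Replacing the cudaMemcpyAsync calls with plain cudaMemcpy on a pageable (malloc'd) buffer would be the case described in the linked post, where the copy may interfere with work on other streams.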