Hi,
Does anyone have experience with the --default-stream per-thread flag in a production environment?
Would it hold up in production under stress, with many threads (and therefore streams) opening and closing all the time, 24x7?
Also, what would happen with a cudaMemcpy when running under this configuration? Would a non-pinned memcpy running on a stream (created due to the --default-stream per-thread flag) synchronize everything, or only the stream created for the calling thread?
Thanks,
Eyal