Dear all, I have run into a problem while using multiple CUDA streams from different threads of the same process.
When I measure how long the host spends in cudaMemcpyAsync(), I find that the host is sometimes blocked by this call. Why does this happen, and how can I avoid it?
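Below is a minimal sketch of the measurement described above. It assumes a single GPU (device 0), host-to-device copies, and one stream per thread. The names `CHECK`, `Job` and `time_concurrent_copies` are hypothetical. The sketch compares page-locked buffers from `cudaMallocHost` with ordinary `malloc` buffers. The CUDA runtime documents that copies from pageable memory are staged through an internal pinned buffer, so `cudaMemcpyAsync` may not return until that staging has finished.

The sketch also rules out two other plausible causes of host blocking:

* implicit synchronization with the legacy default stream, avoided here with `cudaStreamNonBlocking`;
* allocation or free calls such as `cudaFree` running concurrently in other threads, avoided by doing all setup and teardown outside the timed region.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

// Abort on any CUDA error so failures are not silently ignored.
#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                   cudaGetErrorString(err_));                        \
      std::exit(1);                                                  \
    }                                                                \
  } while (0)

// Per-thread resources (hypothetical helper type for this sketch).
struct Job {
  void* host;
  void* dev;
  cudaStream_t stream;
  double host_us;  // time the host thread spent inside cudaMemcpyAsync
};

// Launches n_threads threads, each issuing one host-to-device copy on its
// own non-blocking stream, and returns the host-side time (microseconds)
// each thread spent inside cudaMemcpyAsync. All allocation and freeing
// happens on the calling thread, outside the timed region.
std::vector<double> time_concurrent_copies(int n_threads, bool pinned,
                                           size_t bytes) {
  assert(n_threads > 0);
  std::vector<Job> jobs(n_threads);
  for (auto& j : jobs) {
    if (pinned) {
      CHECK(cudaMallocHost(&j.host, bytes));  // page-locked host buffer
    } else {
      j.host = std::malloc(bytes);            // ordinary pageable buffer
    }
    CHECK(cudaMalloc(&j.dev, bytes));
    // Non-blocking stream: no implicit sync with the legacy default stream.
    CHECK(cudaStreamCreateWithFlags(&j.stream, cudaStreamNonBlocking));
  }

  std::vector<std::thread> workers;
  for (auto& j : jobs) {
    workers.emplace_back([&j, bytes] {
      auto t0 = std::chrono::steady_clock::now();
      CHECK(cudaMemcpyAsync(j.dev, j.host, bytes, cudaMemcpyHostToDevice,
                            j.stream));
      auto t1 = std::chrono::steady_clock::now();
      j.host_us = std::chrono::duration<double, std::micro>(t1 - t0).count();
      CHECK(cudaStreamSynchronize(j.stream));  // wait for the copy itself
    });
  }
  for (auto& w : workers) w.join();

  std::vector<double> out;
  for (auto& j : jobs) {
    out.push_back(j.host_us);
    CHECK(cudaStreamDestroy(j.stream));
    CHECK(cudaFree(j.dev));
    if (pinned) {
      CHECK(cudaFreeHost(j.host));
    } else {
      std::free(j.host);
    }
  }
  return out;
}
```

To compare the two cases, call `time_concurrent_copies(4, true, 32u << 20)` and `time_concurrent_copies(4, false, 32u << 20)` from a small `main` and compile with `nvcc -std=c++14`. On typical setups the pageable case tends to show much longer host-side times. The exact numbers depend on the driver, the GPU and the transfer size, so treat this as a diagnostic rather than a benchmark. If you keep using the default stream instead, compiling with `nvcc --default-stream per-thread` is another way to avoid implicit synchronization between threads.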
Thanks a lot.