cudaMemcpyPeer across OpenMP threads

I want to copy from one thread/gpu to another using cudaMemcpyP


Can the function “cudaMemcpyPeer” be called from within an OpenMP parallel region? Does it need to be placed within a #pragma omp single region?


Well, I got some code to compile, but the cudaMemcpyPeer that I have inside an OpenMP parallel region does not complete the data transfer. Any ideas?

I see over 1,000 views, but still no response. Clearly there is a lot of interest. Any ideas, anyone?

Bumping this to help you out. Don’t trust the view count, though; the forum’s counter is broken.
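
For what it's worth, here is a minimal sketch of one way the pattern asked about above could look. It is not the original poster's code, and everything in it is an assumption made for illustration: two GPUs, one OpenMP thread per GPU, and the hypothetical names `buf`, `fill`, and `CHECK`. As far as I know, cudaMemcpyPeer can be called from any host thread and does not have to sit inside a #pragma omp single region, but only one thread should issue a given copy, and both buffers need to be allocated and filled before it starts. A missing barrier before the copy is one possible cause of a transfer that appears not to complete, though I can't say that is what happened here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <omp.h>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: fill a device buffer with a constant value.
__global__ void fill(int *p, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = value;
}

int main()
{
    int ndev = 0;
    CHECK(cudaGetDeviceCount(&ndev));
    if (ndev < 2) {
        printf("This sketch assumes at least 2 GPUs.\n");
        return 0;
    }

    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);
    int *buf[2] = {nullptr, nullptr};   // shared: one device pointer per GPU

    // Assumes the OpenMP runtime really provides 2 threads here.
    #pragma omp parallel num_threads(2)
    {
        int tid = omp_get_thread_num();

        // Each thread binds to its own GPU, then allocates and fills a buffer.
        CHECK(cudaSetDevice(tid));
        CHECK(cudaMalloc((void **)&buf[tid], bytes));
        fill<<<(n + 255) / 256, 256>>>(buf[tid], n, tid + 1);
        CHECK(cudaGetLastError());
        CHECK(cudaDeviceSynchronize());   // source data must be ready

        // Both pointers must exist and both kernels must be done
        // before any thread starts the peer copy.
        #pragma omp barrier

        // Exactly one thread issues the copy: GPU 0 -> GPU 1.
        if (tid == 0) {
            CHECK(cudaMemcpyPeer(buf[1], 1, buf[0], 0, bytes));
        }

        // The other thread waits until the copy has been issued.
        #pragma omp barrier

        // Thread 1 reads its buffer back and checks it now holds GPU 0's data.
        if (tid == 1) {
            std::vector<int> host(n);
            CHECK(cudaMemcpy(host.data(), buf[1], bytes,
                             cudaMemcpyDeviceToHost));
            bool ok = (host[0] == 1) && (host[n - 1] == 1);
            printf("%s\n", ok ? "peer copy ok" : "peer copy mismatch");
        }

        CHECK(cudaFree(buf[tid]));
    }
    return 0;
}
```

This should build with something like `nvcc -Xcompiler -fopenmp` on a GCC-based toolchain; the exact flag depends on the host compiler. The `if (tid == 0)` guard plays the role that #pragma omp single would, with the advantage that the issuing thread is known. Checking every return code is worth keeping, since a silently failing call would otherwise look like a copy that never finished.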