cudaMemcpyPeer across OpenMP threads

I want to copy from one thread/gpu to another using cudaMemcpyP


Can the function cudaMemcpyPeer be called from within an OpenMP parallel region, or does it need to be placed inside a #pragma omp single region?


Well, I got some code to compile, but the cudaMemcpyPeer that I have inside an OpenMP parallel region does not complete the data transfer. Any ideas?

