From some documentation, I understand that I should use pinned memory with the “write combined” flag when transferring data from the CPU to the GPU. But which type of memory should I use to transfer data from the GPU to the CPU? Is it pinned memory without the “write combined” flag? And should I use “cuMemcpyAsync()” to copy the data?
There are use cases for both plain pinned memory and write-combined pinned memory, and it is difficult to say that one is always better than the other. What you really need to do is overlap your data transfers with other work by using streams. Please refer to Chapter 11 of CUDA by Example.
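To make the stream suggestion concrete, here is a minimal sketch, not a tuned implementation. It assumes the CUDA runtime API (cudaHostAlloc, cudaMemcpyAsync) rather than the driver API named in the question; the driver-API counterparts would be cuMemHostAlloc with CU_MEMHOSTALLOC_WRITECOMBINED and cuMemcpyHtoDAsync / cuMemcpyDtoHAsync. The kernel scale, the helper roundtrip_ok, the CHECK macro, and the choice of two streams are all made up for illustration. The upload buffer is write-combined because the CPU only writes to it; the download buffer is plain pinned memory because the CPU has to read it back, and CPU reads from write-combined memory are very slow.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message on any CUDA error.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical kernel: multiply every element by f.
__global__ void scale(float *d, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

// Uploads n floats, doubles them on the GPU, downloads them again.
// Returns 1 if every element came back doubled, 0 otherwise.
// Assumption: n is a multiple of kStreams.
int roundtrip_ok(int n) {
    const int kStreams = 2;
    const int chunk = n / kStreams;
    const size_t bytes = (size_t)n * sizeof(float);
    const size_t chunkBytes = (size_t)chunk * sizeof(float);

    float *h_in = NULL, *h_out = NULL, *d_buf = NULL;
    // CPU -> GPU staging buffer: pinned + write-combined (CPU only writes it).
    CHECK(cudaHostAlloc((void **)&h_in, bytes, cudaHostAllocWriteCombined));
    // GPU -> CPU staging buffer: plain pinned (CPU reads it afterwards).
    CHECK(cudaHostAlloc((void **)&h_out, bytes, cudaHostAllocDefault));
    CHECK(cudaMalloc((void **)&d_buf, bytes));

    // Sequential, write-only access is the pattern write-combining favors.
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    cudaStream_t streams[kStreams];
    for (int k = 0; k < kStreams; ++k) CHECK(cudaStreamCreate(&streams[k]));

    // Each stream handles one chunk: upload, kernel, download.
    // Work queued in different streams is allowed to overlap.
    for (int k = 0; k < kStreams; ++k) {
        const int off = k * chunk;
        CHECK(cudaMemcpyAsync(d_buf + off, h_in + off, chunkBytes,
                              cudaMemcpyHostToDevice, streams[k]));
        scale<<<(chunk + 255) / 256, 256, 0, streams[k]>>>(d_buf + off,
                                                           chunk, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_out + off, d_buf + off, chunkBytes,
                              cudaMemcpyDeviceToHost, streams[k]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for all streams to finish

    int ok = 1;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0f * (float)i) ok = 0;

    for (int k = 0; k < kStreams; ++k) CHECK(cudaStreamDestroy(streams[k]));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return ok;
}

int main(void) {
    printf("%s\n", roundtrip_ok(1 << 20) ? "round trip OK" : "MISMATCH");
    return 0;
}
```

Whether the write-combined flag actually helps the upload depends on the platform, so it is worth timing both variants. The asynchronous copies only overlap with other work when the host buffers are pinned, which is why both buffers come from cudaHostAlloc rather than malloc.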