I am using CUDA 6 to develop for the Tegra K1 platform. The GPU and CPUs of the K1 are on the same chip and access the same RAM. As I understand it, this should make copying data between host and device memory unnecessary, which raises the question:
What is the most efficient way to make some data accessible to both the GPU and the CPU on the K1?
- One option is mapped (zero-copy) memory. Does accessing mapped memory have higher latency than accessing normally allocated device memory on the K1?
- Another option is the new managed memory introduced in CUDA 6. In theory, the K1 would not need to copy the data at all. How does the runtime handle this?
The data fits into the memory (no paging required).
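For reference, here is a minimal sketch of how I would use the two alternatives (error checking omitted; the kernel and sizes are just placeholders):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel that increments each element in place.
__global__ void inc(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;

    // Option 1: mapped (zero-copy) host memory.
    // cudaHostAllocMapped pins host memory and maps it into the device
    // address space, so the kernel accesses it without an explicit
    // cudaMemcpy. cudaSetDeviceFlags must run before context creation.
    int *mapped = nullptr, *dMapped = nullptr;
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc(&mapped, n * sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&dMapped, mapped, 0);
    inc<<<(n + 255) / 256, 256>>>(dMapped, n);
    cudaDeviceSynchronize();

    // Option 2: managed memory (new in CUDA 6).
    // A single pointer is valid on both host and device; the runtime
    // keeps the two views coherent.
    int *managed = nullptr;
    cudaMallocManaged(&managed, n * sizeof(int));
    inc<<<(n + 255) / 256, 256>>>(managed, n);
    cudaDeviceSynchronize();  // required before the CPU touches it again
    printf("managed[0] = %d\n", managed[0]);

    cudaFreeHost(mapped);
    cudaFree(managed);
    return 0;
}
```

What I am unsure about is what the runtime actually does in each case on the K1's shared-DRAM design, and which of the two has lower access latency from the GPU side.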