1- With respect to CUDA 4.0, is there any significant benefit to using pinned, mapped (zero-copy) memory on the latest GPUs?
2- How can zero copy be so much faster than a regular cudaMemcpy, when in both cases the data has to travel over the same PCIe bus and incur the same latency? Is it because of the reduced overall overhead of the function call (cudaMemcpy)?
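For reference, here is a minimal sketch of the two approaches being compared. It assumes a CUDA 4.0+ toolkit and a device that supports mapped host memory. The kernel scale and the helpers memcpy_path and zero_copy_path are hypothetical names for illustration only. In the first path the data crosses PCIe in two bulk cudaMemcpy transfers around the kernel. In the second path there is no separate copy step: the kernel reads and writes pinned host memory directly over PCIe as part of its own memory accesses.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message on any CUDA runtime error.
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); \
    exit(1); } } while (0)

// Hypothetical kernel: doubles each element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Path 1 (hypothetical helper): pinned host buffer + explicit cudaMemcpy
// to and from a device buffer. Returns element 0 after the kernel ran.
float memcpy_path(int n) {
    size_t bytes = n * sizeof(float);
    float *h, *d;
    CHECK(cudaHostAlloc((void **)&h, bytes, cudaHostAllocDefault));
    CHECK(cudaMalloc((void **)&d, bytes));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));   // bulk copy in
    scale<<<(n + 255) / 256, 256>>>(d, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));   // bulk copy out
    float r = h[0];
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return r;
}

// Path 2 (hypothetical helper): zero copy. The kernel dereferences a device
// alias of pinned, mapped host memory, so no cudaMemcpy is issued.
float zero_copy_path(int n) {
    size_t bytes = n * sizeof(float);
    float *h, *d_alias;
    CHECK(cudaHostAlloc((void **)&h, bytes, cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&d_alias, h, 0));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    scale<<<(n + 255) / 256, 256>>>(d_alias, n);
    CHECK(cudaGetLastError());
    // Launches are asynchronous: wait before the host reads the results.
    CHECK(cudaDeviceSynchronize());
    float r = h[0];
    CHECK(cudaFreeHost(h));
    return r;
}

int main() {
    // Must be set before the CUDA context is created to allow mapped memory.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
    const int n = 1 << 20;
    printf("memcpy path: %.1f\n", memcpy_path(n));
    printf("zero-copy path: %.1f\n", zero_copy_path(n));
    return 0;
}
```

Both paths compute the same result; wrapping each call in cudaEventRecord/cudaEventElapsedTime on the target GPU would show which one is faster for a given data size and access pattern.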