Zero-copy pinned memory and CUDA 4.0

1- With respect to CUDA 4.0, is there any significant benefit to using pinned, mapped (zero-copy) memory on the latest GPUs?

2- How can zero-copy be so much faster than a regular cudaMemcpy, when in both cases the data has to travel over the same PCIe bus and experiences the same latency? Is it because of the reduced overall overhead of the function call (cudaMemcpy)?
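For reference, the two approaches being compared can be sketched in CUDA as below. This is a minimal sketch, not a benchmark: the kernel `scale` and the helpers `scaleWithMemcpy` and `scaleZeroCopy` are hypothetical names, and it assumes a device that can map host memory, with `cudaSetDeviceFlags(cudaDeviceMapHost)` called before the CUDA context is created (not required when unified virtual addressing is in effect). The thing to notice is where the PCIe traffic happens: in the first path as separate bulk copies before and after the kernel, in the second as the kernel's own loads and stores while it runs.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used by both paths: scales each element in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Conventional path: the whole buffer crosses PCIe before the kernel runs and
// again after it finishes, so transfer and compute happen one after another.
cudaError_t scaleWithMemcpy(float *host, int n, float factor) {
    size_t bytes = n * sizeof(float);
    float *dev = nullptr;
    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale<<<(n + 255) / 256, 256>>>(dev, n, factor);
        err = cudaGetLastError();
    }
    // Blocking copy back; it also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err;
}

// Zero-copy path: `mapped` must come from cudaHostAlloc(..., cudaHostAllocMapped).
// No cudaMemcpy is issued; the kernel reads and writes host RAM over PCIe directly.
cudaError_t scaleZeroCopy(float *mapped, int n, float factor) {
    float *alias = nullptr;  // device-side address of the same pinned pages
    cudaError_t err = cudaHostGetDevicePointer((void **)&alias, mapped, 0);
    if (err != cudaSuccess) return err;
    scale<<<(n + 255) / 256, 256>>>(alias, n, factor);
    err = cudaGetLastError();
    // Kernel launches are asynchronous; wait before the host reads `mapped`.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    return err;
}
```

A caller would allocate the zero-copy buffer with `cudaHostAlloc((void **)&p, bytes, cudaHostAllocMapped)` and release it with `cudaFreeHost(p)`. Here each element is read and written exactly once; a kernel that rereads the same data many times could pay the PCIe cost on every access.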

Please help me understand.

To put it directly: how does zero-copy work?
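At the API level, the zero-copy mechanism can be walked through step by step. This is a minimal sketch assuming device 0; `ZeroCopyInfo`, `queryZeroCopy`, `writeSquares` and `squaresIntoHostMemory` are hypothetical names. It checks the device properties that govern zero-copy (`canMapHostMemory`, plus `unifiedAddressing`, which came with unified virtual addressing in CUDA 4.0), allocates page-locked host memory that is also mapped into the GPU's address space, obtains the device-side alias with `cudaHostGetDevicePointer`, and lets a kernel store its results straight into system RAM. Under unified virtual addressing, the alias for memory from `cudaHostAlloc` is expected to equal the host pointer.

```cuda
#include <cuda_runtime.h>

// Device capabilities relevant to zero-copy.
struct ZeroCopyInfo {
    bool canMap;      // cudaDeviceProp::canMapHostMemory
    bool uva;         // cudaDeviceProp::unifiedAddressing
    bool integrated;  // GPU shares physical memory with the CPU
};

ZeroCopyInfo queryZeroCopy(int dev) {
    ZeroCopyInfo info = {false, false, false};
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess) {
        info.canMap = prop.canMapHostMemory != 0;
        info.uva = prop.unifiedAddressing != 0;
        info.integrated = prop.integrated != 0;
    }
    return info;
}

// Hypothetical kernel: every store is sent over PCIe into pinned host RAM.
__global__ void writeSquares(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * i;
}

// Allocates mapped pinned memory, fills it from the GPU and returns the host
// pointer in *out (free it with cudaFreeHost). *sameAddress reports whether
// the device alias equals the host pointer.
cudaError_t squaresIntoHostMemory(int n, int **out, bool *sameAddress) {
    int *h = nullptr;
    int *d = nullptr;
    // Page-locked so the GPU can reach it by DMA; the Mapped flag also maps it
    // into the device's address space.
    cudaError_t err = cudaHostAlloc((void **)&h, n * sizeof(int), cudaHostAllocMapped);
    if (err != cudaSuccess) return err;
    // Device-side address of the same physical pages.
    err = cudaHostGetDevicePointer((void **)&d, h, 0);
    if (err == cudaSuccess) {
        writeSquares<<<(n + 255) / 256, 256>>>(d, n);
        err = cudaGetLastError();
    }
    // The launch is asynchronous; synchronize before the CPU reads the results.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        cudaFreeHost(h);
        return err;
    }
    *sameAddress = (d == h);
    *out = h;
    return cudaSuccess;
}
```

As in any mapped-memory setup without unified addressing, `cudaSetDeviceFlags(cudaDeviceMapHost)` has to run before anything creates the CUDA context. On an integrated GPU (`integrated` set) there is no separate device memory, so the same code avoids a copy entirely rather than moving it onto the bus.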