Pinned Memory Usage

Pinned memory was suggested to me in response to my question here. However, having briefly gone through the Dr. Dobb's CUDA article on the subject and the CUDA documentation, I see warnings that allocating too much pinned memory can degrade performance. From the article, it appears that pinned memory can be used in two ways: to extend GPU memory by mapping host memory into the GPU address space, and to speed up host-to-device transfers. What would be your general advice on using pinned memory, as opposed to simply allocating device memory with cudaMalloc() and transferring data between CPU and GPU only when needed (keeping transfers to a minimum)?
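To make the comparison concrete, here is a minimal sketch of the two approaches as I understand them: a pinned staging buffer combined with ordinary cudaMalloc() device memory and explicit copies, and mapped ("zero-copy") pinned memory that the kernel accesses directly. The kernel `scale`, the helper names `run_pinned_copy` and `run_mapped`, and the buffer size are hypothetical and only for illustration; this is not taken from the article.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: multiply every element by a constant factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Approach A: pinned host buffer used only as a fast staging area;
// the kernel works on device memory allocated with cudaMalloc().
float run_pinned_copy(int n) {
    size_t bytes = n * sizeof(float);
    float *h_pinned = NULL, *d_buf = NULL;
    CHECK(cudaMallocHost((void **)&h_pinned, bytes));  // page-locked host memory
    CHECK(cudaMalloc((void **)&d_buf, bytes));         // ordinary device memory
    for (int i = 0; i < n; ++i) h_pinned[i] = 1.0f;

    CHECK(cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice));
    scale<<<(n + 255) / 256, 256>>>(d_buf, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost));

    float first = h_pinned[0];
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_pinned));
    return first;
}

// Approach B: mapped pinned memory; the kernel reads and writes host
// memory directly through a device-side alias, with no explicit copy.
float run_mapped(int n) {
    size_t bytes = n * sizeof(float);
    float *h_mapped = NULL, *d_alias = NULL;
    CHECK(cudaHostAlloc((void **)&h_mapped, bytes, cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&d_alias, h_mapped, 0));
    for (int i = 0; i < n; ++i) h_mapped[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(d_alias, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // wait before reading results on the host

    float first = h_mapped[0];
    CHECK(cudaFreeHost(h_mapped));
    return first;
}

int main() {
    // Request mapped host memory support before any other CUDA call
    // (needed on older setups; assumed harmless on newer ones).
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    const int n = 1 << 20;
    printf("pinned + copy: first element = %f\n", run_pinned_copy(n));
    printf("mapped:        first element = %f\n", run_mapped(n));
    return 0;
}
```

In approach A the pinned allocation only needs to be as large as the staging buffer, whereas in approach B every kernel access to the buffer goes to host memory, so my question is essentially when each of these is worth it.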

Thanks in advance for the comments.