Today, I ran the bandwidthTest sample on my machine, but I'm a little confused by the result.
My motherboard has a PCIe gen2 x16 slot and my graphics card is a GTX 260.
Thus, the expected bandwidth between host and device is over 5 GB/s.
However, the measured result was only around 2 GB/s.
Then I found the --memory=pinned option, and with it I got 5.5 GB/s.
I searched for what pinned memory is, and I found a reply in this forum, quoted below.
If the above is true, why don't we always allocate host memory through cudaMallocHost?
It should always be faster than a pageable memory transfer.
[b]Also, how much pinned memory can I allocate?
Does it depend on the graphics device or on the host (OS)?[/b]
Would you tell me?
Pinned (non-pageable) memory cannot be paged out by the OS. If you allocate too much pinned memory, your system may become unstable. That is why it is left to the programmer to choose which allocations to pin.
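A minimal sketch of the difference, assuming CUDA's runtime API: pinned memory comes from cudaMallocHost and must be released with cudaFreeHost (the buffer size here is just an illustrative choice, not from the benchmark above):

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;  // 64 MB test buffer (illustrative size)

    // Pinned (page-locked) host allocation: the OS cannot page it out,
    // so the GPU's DMA engine can read it directly at full PCIe speed.
    float *h_pinned = 0;
    cudaMallocHost((void**)&h_pinned, bytes);

    float *d_buf = 0;
    cudaMalloc((void**)&d_buf, bytes);

    // Host-to-device copy from the pinned buffer; with a plain malloc()
    // buffer the driver would first stage the data through an internal
    // pinned buffer, which is what costs the extra bandwidth.
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);

    // Pinned memory must be freed with cudaFreeHost, not free().
    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
```

Because every pinned byte is locked physical RAM that the OS can no longer reclaim, it makes sense to pin only the buffers actually involved in host-device transfers.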
On Core i7, host memory bandwidth is much higher, so the difference between pinned and pageable transfers is much smaller.
There is an additional issue related to pageable vs. non-pageable memory, on which I would be happy to have some insight from NVIDIA people. According to the CUDA 2.0 documentation, asynchronous memory transfers between host and device are supported only for non-pageable (pinned) memory allocated with cudaMallocHost. Is there any chance that future CUDA releases will also support such async memory transfers for standard pageable memory?
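To illustrate the restriction mentioned above: cudaMemcpyAsync only overlaps with host work when the host buffer is pinned. A sketch, assuming the runtime API and an illustrative buffer size:

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 16 << 20;  // illustrative buffer size

    float *h_pinned = 0, *d_buf = 0;
    cudaMallocHost((void**)&h_pinned, bytes);  // must be pinned for true async
    cudaMalloc((void**)&d_buf, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Returns to the host immediately; the DMA copy proceeds in the
    // background. With a pageable malloc() buffer the transfer cannot
    // be overlapped, because the OS could move the pages mid-copy.
    cudaMemcpyAsync(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice, stream);

    // ... the CPU is free to do other work here while the copy runs ...

    cudaStreamSynchronize(stream);  // wait for the copy to finish

    cudaStreamDestroy(stream);
    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
```

The pinning requirement exists because an async DMA transfer needs a stable physical address for the whole duration of the copy, which pageable memory cannot guarantee.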