Pinned memory & D2H bandwidth on 1.3 HW

Romain_DOLBEAU · September 5, 2008, 7:22am

Hi all,

I’ve a problem with D2H bandwidth on 1.3 HW, using pinned memory.

On all other hardware (1.0, 1.1), pinned memory improves bandwidth drastically both ways (H2D & D2H). On my 1.3 HW, it only improves H2D; D2H bandwidth doesn’t change, at least with synchronous copies. This can be seen with the bandwidthTest binary.

If I replace cudaMemcpy with cudaMemcpyAsync on a non-zero stream (copying to the pinned memory in both cases), without using any concurrency, I get a 2x boost in D2H bandwidth. The same change on 1.1 or 1.0 HW doesn’t improve performance.

Apparently, on 1.3 HW, cudaMemcpy and cudaMemcpyAsync on stream 0 don’t take advantage of the pinned memory. Or am I missing something?
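
For anyone who wants to reproduce the comparison, here is a minimal sketch of the three D2H cases described above: cudaMemcpy, cudaMemcpyAsync on stream 0, and cudaMemcpyAsync on a user-created stream, all into the same pinned buffer. It is not the bandwidthTest source. The helper d2hMBps, the CopyMode enum, the 32 MiB transfer size and the repetition count are my own assumptions for illustration, and it uses cudaDeviceSynchronize from current toolkits, so it would need adapting for a 2008-era runtime.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical names, introduced only for this sketch.
enum CopyMode { SYNC_COPY, ASYNC_STREAM0, ASYNC_USER_STREAM };

// Time `reps` device-to-host copies with a host clock and return MB/s.
// The device is synchronized before and after, so no copy overlaps
// with anything else (no concurrency is being exploited).
static double d2hMBps(void *host, const void *dev, size_t bytes,
                      int reps, CopyMode mode, cudaStream_t stream)
{
    CHECK(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        switch (mode) {
        case SYNC_COPY:
            CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
            break;
        case ASYNC_STREAM0:
            CHECK(cudaMemcpyAsync(host, dev, bytes,
                                  cudaMemcpyDeviceToHost, 0));
            break;
        case ASYNC_USER_STREAM:
            CHECK(cudaMemcpyAsync(host, dev, bytes,
                                  cudaMemcpyDeviceToHost, stream));
            break;
        }
    }
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(t1 - t0).count();
    return (double)bytes * reps / seconds / 1e6;
}

int main()
{
    const size_t bytes = 32u << 20;  // 32 MiB per copy (arbitrary choice)
    const int reps = 10;             // arbitrary repetition count

    void *host = nullptr;
    void *dev = nullptr;
    cudaStream_t stream;

    CHECK(cudaMallocHost(&host, bytes));  // pinned (page-locked) host buffer
    CHECK(cudaMalloc(&dev, bytes));
    CHECK(cudaMemset(dev, 0, bytes));
    CHECK(cudaStreamCreate(&stream));     // the "non-zero" stream

    std::printf("cudaMemcpy                   : %.1f MB/s\n",
                d2hMBps(host, dev, bytes, reps, SYNC_COPY, stream));
    std::printf("cudaMemcpyAsync, stream 0    : %.1f MB/s\n",
                d2hMBps(host, dev, bytes, reps, ASYNC_STREAM0, stream));
    std::printf("cudaMemcpyAsync, user stream : %.1f MB/s\n",
                d2hMBps(host, dev, bytes, reps, ASYNC_USER_STREAM, stream));

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dev));
    CHECK(cudaFreeHost(host));
    return 0;
}
```

Built with nvcc and run on each card, the behaviour I describe would show up as the third figure being roughly twice the first two on 1.3 HW, and all three being about equal on 1.0 and 1.1 HW.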
