CUDA 8: Uniform-memory overlapped host-device copies for Maxwell?

bbridgwater · September 3, 2016, 2:25am

I’m in the process of writing a deep learning library with cuDNN support, and would like to know the most performant way to overlap host-device transfers and kernel execution in CUDA 8 on a Maxwell-based card…

I gather that for Pascal-based cards there’ll be cudaMemPrefetchAsync, but what about Maxwell? Do those cards still have to use page-locked host memory and cudaMemcpyAsync, or will there be any support for overlapped copies using uniform memory instead?

Robert_Crovella · September 3, 2016, 2:51am

What is uniform memory? Do you mean unified memory?

bbridgwater · September 3, 2016, 3:49am

Oops! Yes.

Robert_Crovella · September 3, 2016, 3:09pm

Today’s implementation of UM (Unified Memory) transfers managed data at one of two points: at kernel launch, and at the cudaDeviceSynchronize() call after a kernel launch. Since the runtime handles the transfers, it’s harder for you as a programmer to precisely control the overlap of copy and compute.
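The pattern described above can be sketched as follows. This is a minimal illustration, assuming a single pre-Pascal GPU; the kernel `scale` and the function `run_managed_example` are hypothetical names made up for the example. The programmer issues no explicit copies, so there is no handle for scheduling a transfer against a kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by `factor`.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns 0 on success, 1 on a CUDA error or a wrong result.
int run_managed_example() {
    const int n = 1 << 20;
    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return 1;

    // The host initializes the managed buffer before any kernel is launched.
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // Point 1: the runtime makes the managed data available to the GPU
    // around the kernel launch; no explicit copy is issued here.
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaError_t err = cudaGetLastError();

    // Point 2: the host waits here before touching the data again.
    // On pre-Pascal devices the host must not access managed memory
    // while a kernel is still running.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int bad = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            if (data[i] != 2.0f) ++bad;
        }
    }
    cudaFree(data);
    return (err == cudaSuccess && bad == 0) ? 0 : 1;
}

int main() {
    int rc = run_managed_example();
    printf("managed example %s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```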

Doing things manually using cudaMemcpyAsync and traditional methods still gives you the most control.
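For comparison, here is a minimal sketch of the manual approach, assuming page-locked host memory from cudaMallocHost and one stream per chunk. The chunk count, chunk size, the kernel `scale`, and the function `run_overlapped_pipeline` are all illustrative choices, not something taken from the thread. Whether copies and kernels actually overlap depends on the device's copy engines and on the work being large enough to matter, so treat it as a starting point to profile rather than a guaranteed result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by `factor`.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns 0 on success, 1 on a CUDA error or a wrong result.
int run_overlapped_pipeline() {
    const int kChunks = 4;               // illustrative values
    const int kChunkElems = 1 << 20;
    const size_t kChunkBytes = kChunkElems * sizeof(float);
    const int kTotal = kChunks * kChunkElems;

    float *h_buf = nullptr;              // page-locked host buffer
    float *d_buf = nullptr;              // device buffer
    cudaStream_t streams[kChunks];

    if (cudaMallocHost(&h_buf, kChunks * kChunkBytes) != cudaSuccess) return 1;
    if (cudaMalloc(&d_buf, kChunks * kChunkBytes) != cudaSuccess) {
        cudaFreeHost(h_buf);
        return 1;
    }
    for (int c = 0; c < kChunks; ++c) cudaStreamCreate(&streams[c]);
    for (int i = 0; i < kTotal; ++i) h_buf[i] = 1.0f;

    // Each chunk gets its own stream: copy in, compute, copy out.
    // Work in different streams is free to overlap on the device.
    for (int c = 0; c < kChunks; ++c) {
        float *h_chunk = h_buf + c * kChunkElems;
        float *d_chunk = d_buf + c * kChunkElems;
        cudaMemcpyAsync(d_chunk, h_chunk, kChunkBytes,
                        cudaMemcpyHostToDevice, streams[c]);
        scale<<<(kChunkElems + 255) / 256, 256, 0, streams[c]>>>(
            d_chunk, kChunkElems, 2.0f);
        cudaMemcpyAsync(h_chunk, d_chunk, kChunkBytes,
                        cudaMemcpyDeviceToHost, streams[c]);
    }

    // Wait for all streams; this also surfaces errors from the async work.
    cudaError_t err = cudaDeviceSynchronize();

    int bad = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < kTotal; ++i) {
            if (h_buf[i] != 2.0f) ++bad;
        }
    }

    for (int c = 0; c < kChunks; ++c) cudaStreamDestroy(streams[c]);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return (err == cudaSuccess && bad == 0) ? 0 : 1;
}

int main() {
    int rc = run_overlapped_pipeline();
    printf("overlapped pipeline %s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

The host buffer has to be page-locked for cudaMemcpyAsync to be truly asynchronous with respect to the host; with pageable memory the copies may be staged and lose the overlap.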

bbridgwater · September 6, 2016, 1:21pm

OK, thanks!

On a separate note, I’ve been experimenting with the copy-related parts of the CUDA API, and have found that the cudaStreamAttachMemAsync(stream, …cudaMemAttachSingle) call works when stream is cudaStreamLegacy, but silently fails (memory not attached) when passing cudaStreamDefault or cudaStreamPerThread. I’m not sure whether this is a bug or as intended… I couldn’t find any mention of the intended behavior in the documentation. I would have expected it to work, with the effective stream being the appropriate default stream (i.e., the current thread’s default stream when passing cudaStreamPerThread).
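A minimal probe of the calls described above might look like this. It is only a sketch: the helper `try_attach` is a hypothetical name, and the probe reports the status codes returned by the attach and the following stream synchronization, nothing more. It does not prove that the memory ended up attached, which is the failure described here. A fuller check would, on a device without concurrent managed access, touch the buffer from the host while a kernel runs in a different stream. Note also that, as far as I can tell from the CUDA headers, cudaStreamDefault is a flag value (0x00) intended for cudaStreamCreateWithFlags, so passing it as a stream amounts to passing the null stream handle.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocates a small managed buffer, attaches it to
// `stream` with cudaMemAttachSingle, waits for the attach to complete,
// and returns the first error seen (cudaSuccess if none).
cudaError_t try_attach(cudaStream_t stream) {
    float *buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, 1024 * sizeof(float));
    if (err != cudaSuccess) return err;

    // length = 0 attaches the whole allocation.
    err = cudaStreamAttachMemAsync(stream, buf, 0, cudaMemAttachSingle);
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    cudaFree(buf);
    return err;
}

int main() {
    // Assumption: (cudaStream_t)0 stands in for cudaStreamDefault,
    // which the headers define as the flag value 0x00.
    struct { const char *name; cudaStream_t stream; } cases[] = {
        {"cudaStreamLegacy",      cudaStreamLegacy},
        {"cudaStreamPerThread",   cudaStreamPerThread},
        {"cudaStreamDefault (0)", (cudaStream_t)0},
    };
    for (const auto &c : cases) {
        cudaError_t err = try_attach(c.stream);
        printf("%-22s -> %s\n", c.name, cudaGetErrorString(err));
    }
    return 0;
}
```

Compiling the same file with and without nvcc's `--default-stream per-thread` option is one way to see whether the null-stream case changes behavior.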

Robert_Crovella · September 6, 2016, 6:30pm

It could be a CUDA bug, or maybe you’ve made a mistake. Off the top of my head, I don’t know why cudaStreamAttachMemAsync semantics would vary based on the default stream behavior (after all, you are specifying a stream…) but I haven’t investigated it and there’s any number of things that might impact it that don’t immediately occur to me. I don’t generally spend any time on reported issues of this nature unless OP provides a suitable short, complete reproducer code. Even then, no guarantees (see below).

If you’re convinced that something is a defect in CUDA, and can generate a short, complete demonstration of it, the usual advice is to file a bug at developer.nvidia.com.

You’re welcome to discuss it here, of course, but in a community situation there are no guarantees that:

anyone will read it
anyone will think about it
anyone will try to do something about it
anyone will file a bug on your behalf
