Pinned memory does not play nice with ctx management

MichaelChampigny · November 6, 2008, 11:47pm

I’d like to allocate and initialize pinned memory on the host side in one thread (using one ctx) and then use that memory to perform transfers from host to device in another thread (of course, using a different context).

In pseudo code:

Thread #0

cuCtxCreate
cuMemAllocHost (buffer)
(initialized buffer with some data)
cuCtxDestroy // What happens to buffer now? Its context is now gone.

Thread #1

cuCtxCreate
cuMemcpyHtoDAsync (buffer) // Oops. We allocated and initialized this memory in Thread #0 with a different context!
cuCtxDestroy

This made cuMemcpyHtoDAsync fail in various ways (invalid context errors).

I tried to use the context management API to pop the context from Thread #0.
In Thread #1, I pushed the (now floating) context onto the context stack but CUDA didn’t seem to like that either.

This second attempt looked something like this just to be clear:

Thread #0

cuCtxCreate
cuMemAllocHost (buffer)
(initialized buffer with some data)
cuCtxPopCurrent (thread_0_context)
(save thread_0_context for later use in Thread #1 when we perform the asynchronous copy)
cuCtxDestroy // Oops. Now I just destroyed the floating context. Hmm…maybe I need to attach to thread_0_context to raise its reference count to 2?

Thread #1

cuCtxCreate
cuCtxPushCurrent (thread_0_context) // Push context from Thread #0
cuMemcpyHtoDAsync (buffer)
cuCtxPopCurrent // Pop thread_0_context from stack to get back to the previous context.
cuCtxDestroy

Any ideas whether this approach should work? What I’m trying to do is decouple the initialization of pinned memory in one thread from its use (i.e., DMA transfer) in another thread.

It doesn’t appear that the CUDA context management API is quite up to the task. It would be nice to be able to allow pinned memory to work outside of a specific CUDA context.

Any ideas?

Thanks.

tmurray · November 7, 2008, 8:49am

Pinned memory from multiple contexts is a feature we’re working on for a future version of CUDA (that is to say, not 2.1).

Also, you’ve hit the “cuCtxPopCurrent documentation is completely incomprehensible and has no relation to what the function actually does” bug. thread_0_context is going to be NULL in your example code, I bet. If you just pass the context handle as returned from cuCtxCreate in thread #0 to cuCtxPushCurrent in thread #1 after ensuring that you’ve called pop from #0, it should work.

MichaelChampigny · November 7, 2008, 9:36pm

Thanks! The handle is indeed NULL, and yes, the context management interface in general is not very well documented. Since this strategy should work, I assume the future work you are alluding to involves simplifying the context management interface?

What I need to be able to do is somehow decouple the DMA transfer from the initialization of the memory allocated in Thread #0. Right now, it’s really the CUDA context that “owns” the memory. That makes deferring pinned memory transfers painful.

Thanks for the help!

tmurray · November 7, 2008, 10:47pm

The interface isn’t that complicated; it just doesn’t do what it’s supposed to at the moment. I’m not sure if we’re changing the documentation or changing the behavior to do what the documentation says (since I am fairly sure absolutely no one uses it as it behaves now); if anyone has strong opinions either way, jump in.
