Controlling CUDA runtime context destruction

For our project, we built a shared library that uses CUDA and is loaded by Node.js.

Everything works fine while the app is running; the tricky part is when it closes. We want to properly destroy some objects that own memory allocated on the GPUs, but the app crashes because the contexts have already been destroyed.

We tried calling cuDevicePrimaryCtxRetain, which should increment the primary context’s reference count, and then calling cuDevicePrimaryCtxRelease at the end. But even that doesn’t really work.

We know our deallocation system works “fine”, because our tests all run in a standalone executable, and there we can see the destruction happen in the right order when the app closes.

Is there anything we can do to control this properly?


In your shared library, do you have objects created at global (or file) scope whose constructors or destructors use CUDA runtime API calls? If so, that is just a bad idea, and I don’t think there is a solution for it other than moving those objects out of global scope.

If you are having trouble with objects created at function scope, that is a different issue, and probably one that can be handled by making sure the objects go out of scope correctly.
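One way to follow this advice in a shared library is to keep GPU-owning objects behind explicit init and shutdown entry points that the host calls before it exits, instead of relying on static destructors. This is a minimal sketch, assuming the host can be made to call the shutdown function; mylib_init, mylib_shutdown, LibraryState and GpuResource are hypothetical names, and the CUDA allocation calls are replaced by a counter so the lifetimes are visible.

```cpp
#include <cassert>
#include <memory>

// Stand-in for an object that owns GPU memory (hypothetical). Real code would
// call cudaMalloc in the constructor and cudaFree in the destructor; here a
// counter tracks how many are alive so the lifetime can be checked.
int g_live_resources = 0;

struct GpuResource {
    GpuResource() { ++g_live_resources; }
    ~GpuResource() { --g_live_resources; }
    GpuResource(const GpuResource&) = delete;
    GpuResource& operator=(const GpuResource&) = delete;
};

// Everything the library owns, grouped so it can be torn down in one place.
struct LibraryState {
    GpuResource weights;
    GpuResource scratch;
};

namespace {
// Only the pointer has static storage duration; the objects it owns are
// created and destroyed by explicit calls. If mylib_shutdown() is never
// called, this unique_ptr's destructor still runs during static teardown,
// which is exactly the hazard described above.
std::unique_ptr<LibraryState> g_state;
}

// Hypothetical entry points the host would call explicitly.
void mylib_init() {
    if (!g_state) g_state = std::make_unique<LibraryState>();
}

void mylib_shutdown() {
    g_state.reset();  // GPU-owning destructors run here, before process exit
}
```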

We actually created a base class for declaring dependencies between “global” objects, which are wrapped in shared pointers.

That shared pointer can be global, but if it represents an object needed by others, those other objects hold a reference to it. So its reference count stays non-zero until all of the objects that depend on it have been deallocated.

The “retain” call lives in one such object, which is one of the first things created and which many other objects depend on.

We use the same system in the tests, where it works perfectly and guarantees that destruction happens in the right order.
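The dependency scheme described above can be sketched with std::shared_ptr: each dependent object holds a shared_ptr to the object that retains the context, so the retain is released only after the last dependent is destroyed. ContextRoot and DeviceBuffer are hypothetical names, and the retain/release and free operations are injected as callbacks (in real code they would wrap cuDevicePrimaryCtxRetain/cuDevicePrimaryCtxRelease and cudaFree) so the ordering can be checked without a GPU.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Root of the dependency graph: acquires in its constructor, releases in its
// destructor. Real code would retain/release the primary context here.
class ContextRoot {
public:
    ContextRoot(std::function<void()> acquire, std::function<void()> release)
        : release_(std::move(release)) { acquire(); }
    ~ContextRoot() { release_(); }
    ContextRoot(const ContextRoot&) = delete;
    ContextRoot& operator=(const ContextRoot&) = delete;

private:
    std::function<void()> release_;
};

// A dependent object keeps the root alive by holding a shared_ptr to it.
// Its destructor body runs first, then its members are destroyed, so the
// root's reference is dropped only after the buffer has been freed.
class DeviceBuffer {
public:
    DeviceBuffer(std::shared_ptr<ContextRoot> root, std::function<void()> onFree)
        : root_(std::move(root)), onFree_(std::move(onFree)) {}
    ~DeviceBuffer() { onFree_(); }  // real code: cudaFree(ptr_)

private:
    std::shared_ptr<ContextRoot> root_;
    std::function<void()> onFree_;
};
```

Dropping the “global” handle to the root first, as a static destructor might, does not release the context early, because the buffer still holds a reference. This only orders destructors relative to each other; it does not control when the process itself tears down the CUDA context, which is the point raised in the next reply.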

That said, I’m not sure it’s entirely a CUDA problem: based on other things I’ve read, Node.js could be the one shutting down too fast without waiting for anything else. Still, being able to prevent context destruction until we actually want it would also be useful.

I’m unaware of any methods to control the CUDA runtime context destruction. If you want to fiddle around with driver API methods, have at it, but it’s not obvious to me that they give any different level of control. Once the host process begins to disintegrate at the point of process termination, the CUDA runtime (or driver) context will disappear in some unspecified fashion. I know of no method to control it, and I’m not sure what sort of control you expect to exert via programmer activity as a host process is disappearing.

It’s not really clear to me what you mean by “crash” or what is going on in your case.

The only real caveats I am aware of are the ones I have already mentioned. The principal hazard at destructor time is just the CUDA runtime error status: when you make CUDA runtime API calls while a process/context is disintegrating, you get (IMO) a relatively benign “sorry” message from the CUDA runtime. In my view this can safely be ignored, but I suppose that also depends on the specifics of your error handler. Again, I don’t know what you mean by “crash”, but in my experience the CUDA runtime does not cause segfaults or other such undesirable behavior in situations like this.

Thanks for your quick responses!