Recommended way of managing contexts in the driver API

Every context you create on a device consumes memory on that device.
If it were me, I would never want more than one context per device.

If you have a library that expects to interact with an application that is using the driver API, then I would either require the application to hand you a context explicitly, and use only that one, or else define what state you expect the context stack to be in when your library is called.

If you have a library that expects to interact with an application that is using the runtime API, then I would say, simply, do nothing. Don’t create your own contexts. Expect any device state handed to your library (e.g. pointers, allocations, resources) to be valid in the primary context, and you have nothing to manage explicitly.

Multi-threading should be orthogonal to this. CUDA contexts have been shareable amongst threads for a long time now.

You should never destroy the primary context on a device if you are using or might be using the runtime API, unless you simply intend to exit right then and there or are fine with weird errors (or otherwise have a solid plan). Just because the runtime API might create a new context doesn’t mean that any previously established state will be automatically reestablished in the “new” primary context. This can lead to all sorts of hilarity for an unsuspecting library user.

I don’t really know the objectives of your wrapper (other than, I guess, to make CUDA available to Rust applications), but it’s not evident to me why you would start down that road with the driver API, unless there was a pre-existing notion that the thing you wanted to support was going to be creating driver API contexts and whatnot. I don’t see how you get there if Rust is your starting point. But I know very little about Rust.