A process using CUDA gets stuck, then all others get stuck as well - what do I do?

I have seen other questions similar to this one; here is an example, and there are others. I don't have any suggestions to add beyond what I have already shared. I doubt there is a precise, deterministic, guaranteed way to fix this in every imaginable case, other than a reboot. As you can see from that other thread, other processes may need to be killed before the GPU will recover. Until the GPU has recovered, by reboot or some other method, I don't know of any way to guarantee that other processes using that GPU will behave normally.
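
If it helps with working out which other processes might need to be killed, here is a rough sketch that asks `nvidia-smi` for the compute processes it currently reports. This is not a fix, only a way to get a list of candidates. The helper names (`parse_compute_apps`, `list_gpu_processes`) and the 10-second timeout are my own choices for illustration. I am also assuming that `nvidia-smi` still responds; when the GPU is in a bad state it may hang or fail, which is why the call has a timeout.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name' CSV lines into a list of (pid, name) tuples."""
    apps = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first comma only, in case the process name contains commas.
        pid_field, _, name = line.partition(",")
        pid_field = pid_field.strip()
        if pid_field.isdigit():
            apps.append((int(pid_field), name.strip()))
    return apps


def list_gpu_processes(timeout_s=10):
    """Return the compute processes nvidia-smi reports (hypothetical helper)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name",
         "--format=csv,noheader"],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # assumption: give up if nvidia-smi itself is stuck
        check=True,
    )
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    try:
        for pid, name in list_gpu_processes():
            print(pid, name)
    except (OSError, subprocess.SubprocessError) as exc:
        # Covers a missing nvidia-smi binary, a non-zero exit, or a timeout.
        print("could not query nvidia-smi:", exc)
```

Whether killing the listed processes actually brings the GPU back is not something this can tell you. If the query itself times out, that is consistent with the GPU still being wedged, and a reboot may be the only option left.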