After deleting all MIG instances and disabling MIG mode, I get the following error when I try to run model training:
“RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable”
After rebooting, everything works again.
Does anyone know what’s going on and how I can resolve this without resetting the GPU or restarting the system?