I need to disable a faulty GPU on RHEL 7. Which method would you recommend?
I tried nvidia-smi -i 2 -c 2, but the setting does not persist.
After running echo 1 > /sys/bus/pci/devices/0000:07:00.0/remove or echo 0 > /sys/bus/pci/devices/0000:07:00.0/enable, nvidia-smi responds slowly or even hangs.
See nvidia-smi drain -h,
or alternatively use the CUDA_VISIBLE_DEVICES environment variable.
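A minimal sketch of the drain approach, assuming the faulty GPU sits at PCI address 0000:07:00.0 (the address from the question) and that the commands are run as root. The helper names run and drain_gpu and the DRY_RUN variable are hypothetical, and by default the script only prints the commands. Turning persistence mode off first is an assumption about what the driver requires; check nvidia-smi drain -h on your driver version for the exact options.

```shell
#!/bin/sh
# Hypothetical helper: put one GPU into drain state so it accepts no new work.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

drain_gpu() {
    pci_id="$1"   # full PCI address, e.g. 0000:07:00.0
    # Assumption: persistence mode must be off before the GPU can be drained.
    run nvidia-smi -i "$pci_id" -pm 0
    # -p selects the device by PCI ID, -m 1 enables drain state.
    run nvidia-smi drain -p "$pci_id" -m 1
}

drain_gpu "0000:07:00.0"
```

Running the script with DRY_RUN=0 would execute the commands instead of printing them. Draining can be reverted with -m 0 for the same PCI ID.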
Thank you for the answer!
I will try nvidia-smi drain -h.
I can't use CUDA_VISIBLE_DEVICES because we are running an HPC cluster, and unfortunately it has a bug in detecting the problematic GPU. There is also no way to disable a GPU in the cluster manually.
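For readers who can set the variable per job (unlike the cluster setup described here), a rough sketch of the CUDA_VISIBLE_DEVICES alternative follows. It assumes a node with 4 GPUs where index 2 is the faulty one; the GPU count and the helper name visible_without are hypothetical. Note that CUDA's device order may differ from the order nvidia-smi shows unless CUDA_DEVICE_ORDER=PCI_BUS_ID is also set.

```shell
#!/bin/sh
# Hypothetical helper: list GPU indices 0..total-1, skipping the faulty one.
visible_without() {
    total="$1"
    bad="$2"
    list=""
    i=0
    while [ "$i" -lt "$total" ]; do
        if [ "$i" -ne "$bad" ]; then
            # Append with a comma only when the list is non-empty.
            list="${list:+$list,}$i"
        fi
        i=$((i + 1))
    done
    echo "$list"
}

# Assumed layout: 4 GPUs, index 2 is faulty (index taken from the question).
export CUDA_DEVICE_ORDER=PCI_BUS_ID
CUDA_VISIBLE_DEVICES="$(visible_without 4 2)"
export CUDA_VISIBLE_DEVICES
echo "$CUDA_VISIBLE_DEVICES"
```

CUDA applications started from this shell would then see only the remaining devices. The GPU itself stays enabled, so this hides it rather than disabling it.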
All works! Thanks again!
P.S. The commands can be found in "How to turn off specific GPU?"