I found that one of my Tesla GPUs has gone bad, and I would like to disable it:
nvidia-smi
Wed Mar 25 10:21:34 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.65     Driver Version: 340.65         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2050         Off  | 0000:04:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2050         Off  | 0000:05:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2050         Off  | 0000:08:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2050         Off  | 0000:09:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
I got the following errors in the system log:
kernel: NVRM: Xid (PCI:0000:04:00): 58, Edc 00000004
kernel: NVRM: Xid (PCI:0000:04:00): 48, An uncorrectable double bit error (DBE) has been detected on GPU (00 04 00).
kernel: NVRM: Xid (PCI:0000:04:00): 45, Ch 00000001, engmsk 00000100
Is there a way to disable GPU 0 (Bus-Id: 0000:04:00.0) from the OS (RHEL 5)?
Thank you!