We have the GPU configuration below on our Linux server, but it is giving errors:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   45C    P0    55W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   34C    P0    71W / 149W |      0MiB / 11439MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Error:
***WARNING: FOUND MULTIPLE ACCLERATOR PLATFORM DRIVERS:
***WARNING: PLATFORM_CUDA
***WARNING: PLATFORM_OPENCL
***WARNING: USE ENVIRONMENT VARIABLE ABA_ACCELERATOR_TYPE TO SELECT THE
DESIRED PLATFORM TYPE
GPU SOLVER ACCELERATION UNAVAILABLE. SEE JOB LOG FILE FOR MORE DETAILS
I think the message is quite straightforward: add
export ABA_ACCELERATOR_TYPE=PLATFORM_CUDA
to your ~/.profile, then open a new shell or log out and back in.
Also make sure libcuda is installed, e.g. in /usr/lib64/ (the exact location depends on the distro).
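The two steps above could be scripted roughly as follows. This is only a sketch: the function names add_accelerator_export and check_libcuda are made up for illustration, and the libcuda check assumes ldconfig is available and that the library is registered in the linker cache, which may not hold on every system.

```shell
#!/bin/sh
# Append the export line to a profile file only if it is not already there.
# The file path is a parameter so the function can be tried on a scratch
# file before touching the real ~/.profile.
add_accelerator_export() {
    profile_file="$1"
    line='export ABA_ACCELERATOR_TYPE=PLATFORM_CUDA'
    touch "$profile_file"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$profile_file" || printf '%s\n' "$line" >> "$profile_file"
}

# Report whether the dynamic linker cache lists libcuda.
check_libcuda() {
    if ldconfig -p 2>/dev/null | grep -q 'libcuda\.so'; then
        echo "libcuda found"
    else
        echo "libcuda not found in linker cache"
    fi
}

# Usage (not run automatically):
#   add_accelerator_export "$HOME/.profile"
#   check_libcuda
```

Running add_accelerator_export twice leaves a single export line in the file, so it is safe to repeat.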
cdcvillx141:/home/mphpcadmin # nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
cdcvillx279:/home/mphpcadmin # nvidia-smi
Mon Mar 15 15:10:55 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   46C    P0    58W / 149W |      0MiB / 11439MiB |     85%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   34C    P0    77W / 149W |      0MiB / 11439MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Looking at the libcuda versions, it seems the driver was previously installed along with CUDA.
How has this been installed?
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
That looks largely unmaintained. A year ago, driver 390.26 was installed using the runfile installer but uninstalled afterwards. No idea how the 384 driver was installed. Please post the output of
sudo zypper search "nvidia*"
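Alongside the package search, a quick look at the kernel side may help explain why nvidia-smi works on one host and fails on the other. The sketch below is an assumption about what is worth checking, not an official procedure; nvidia_module_loaded and driver_summary are hypothetical helper names. Note that a driver installed with the runfile installer will not show up in the RPM query.

```shell
#!/bin/sh
# Return success if the nvidia kernel module appears in a /proc/modules-style
# file. The argument defaults to /proc/modules; it is a parameter only so the
# check can be exercised against a sample file.
nvidia_module_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nvidia ' "$modules_file" 2>/dev/null
}

# Print a short summary of the driver state on this host.
driver_summary() {
    if nvidia_module_loaded; then
        echo "nvidia kernel module: loaded"
        # This file is provided by the NVIDIA kernel module while it is loaded.
        cat /proc/driver/nvidia/version 2>/dev/null
    else
        echo "nvidia kernel module: not loaded"
    fi
    # List installed RPM packages whose name mentions nvidia (RPM-based distros).
    rpm -qa 2>/dev/null | grep -i nvidia || echo "no nvidia RPM packages found"
}

# Usage (not run automatically):
#   driver_summary
```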