Linux server unable to recognize GPU

We have the GPU configuration below on our Linux server, but it is giving errors:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   45C    P0    55W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   34C    P0    71W / 149W |      0MiB / 11439MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Error:

***WARNING: FOUND MULTIPLE ACCLERATOR PLATFORM DRIVERS:
***WARNING: PLATFORM_CUDA
***WARNING: PLATFORM_OPENCL
***WARNING: USE ENVIRONMENT VARIABLE ABA_ACCELERATOR_TYPE TO SELECT THE
DESIRED PLATFORM TYPE

GPU SOLVER ACCELERATION UNAVAILABLE. SEE JOB LOG FILE FOR MORE DETAILS

I think the message is quite straightforward: add
export ABA_ACCELERATOR_TYPE=PLATFORM_CUDA
to your ~/.profile,
then open a new shell or log out and back in.
Also make sure libcuda is installed, e.g. in /usr/lib64/, depending on the distribution.
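Appending the line by hand is easy to repeat by accident. A minimal sketch that adds it only once, assuming a POSIX shell; `add_line_once` is a hypothetical helper name, not part of Abaqus or the NVIDIA tools. Note that bash login shells read ~/.profile only when neither ~/.bash_profile nor ~/.bash_login exists, so check for those files as well.

```shell
#!/bin/sh
# add_line_once FILE LINE (hypothetical helper): append LINE to FILE
# unless FILE already contains exactly that line, so reruns are harmless.
add_line_once() {
  file=$1
  line=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage for the current user; takes effect in new login shells:
#   add_line_once "$HOME/.profile" 'export ABA_ACCELERATOR_TYPE=PLATFORM_CUDA'
```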

Hi, thanks for the response! But which profile: the user's or root's?

Either in the profile of each user who runs Abaqus, or system-wide in /etc/profile.
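For the system-wide variant, a drop-in file under /etc/profile.d is a common alternative to editing /etc/profile directly, since login shells on SLES source the *.sh files there. A minimal sketch, assuming root access and a Bourne-compatible login shell; the helper name and the file name abaqus-gpu.sh are hypothetical choices.

```shell
#!/bin/sh
# install_aba_profile_snippet [DIR] (hypothetical helper): write a small
# drop-in that login shells source from /etc/profile.d. Run as root for
# the real directory; the file name abaqus-gpu.sh is an arbitrary choice.
install_aba_profile_snippet() {
  dir=${1:-/etc/profile.d}
  target="$dir/abaqus-gpu.sh"
  printf '%s\n' 'export ABA_ACCELERATOR_TYPE=PLATFORM_CUDA' > "$target" || return 1
  chmod 644 "$target"
}

# Usage (as root):
#   install_aba_profile_snippet
# Then log out and back in, or open a new login shell, to pick it up.
```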

Hey, we tried that, but we are still getting the same error. Is there a specific driver that is missing?

cdcvillx141:/home/mphpcadmin # nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
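When nvidia-smi cannot talk to the driver, a common first check is whether the nvidia kernel module is loaded at all (for example with `lsmod | grep nvidia`). When it is loaded, /proc/driver/nvidia/version reports its version, which should match the user-space libraries such as libcuda. A minimal sketch that extracts that version; the helper name is hypothetical, and the line format it parses is an assumption based on typical driver output.

```shell
#!/bin/sh
# kernel_driver_version [FILE] (hypothetical helper): print the version of
# the loaded NVIDIA kernel module. Assumes the usual first-line format
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version>  <build date>
kernel_driver_version() {
  f=${1:-/proc/driver/nvidia/version}
  if [ ! -r "$f" ]; then
    echo "no $f: the nvidia kernel module does not appear to be loaded" >&2
    return 1
  fi
  # Print the field that follows the word "Module" on the NVRM line.
  awk '/^NVRM version:/ { for (i = 1; i < NF; i++) if ($i == "Module") { print $(i + 1); exit } }' "$f"
}

# Usage:
#   kernel_driver_version
```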

cdcvillx279:/home/mphpcadmin # nvidia-smi
Mon Mar 15 15:10:55 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   46C    P0    58W / 149W |      0MiB / 11439MiB |     85%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   34C    P0    77W / 149W |      0MiB / 11439MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Please post the output of
export | grep ABA
to check that you set the environment variable properly.
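`export | grep ABA` prints nothing when the variable is not exported in the current shell. A slightly more direct check, sketched below with a hypothetical helper name, uses `printenv`, which exits non-zero when the named variable is not in the environment.

```shell
#!/bin/sh
# check_aba_env (hypothetical helper): report whether ABA_ACCELERATOR_TYPE
# is exported into this shell's environment.
check_aba_env() {
  if value=$(printenv ABA_ACCELERATOR_TYPE); then
    echo "ABA_ACCELERATOR_TYPE=$value"
  else
    echo "ABA_ACCELERATOR_TYPE is not set in this shell" >&2
    return 1
  fi
}

# To test what a fresh login shell sees (it reads the profile files):
#   bash -lc 'printenv ABA_ACCELERATOR_TYPE'
```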

Please also post the output of
ls -l /usr/lib64/libcuda*
Which distribution are you using?

Nothing comes up:
cdcvillx035:~ # export|grep ABA
cdcvillx035:~ #

cdcvillx279:/home/mphpcadmin # ls -l /usr/lib64/libcuda*
lrwxrwxrwx 1 root root 12 May 8 2020 /usr/lib64/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 17 May 8 2020 /usr/lib64/libcuda.so.1 -> libcuda.so.384.81
-rwxr-xr-x 1 root root 13038712 Sep 2 2017 /usr/lib64/libcuda.so.384.81

cdcvillx141:/home/mphpcadmin # ls -l /usr/lib64/libcuda*
lrwxrwxrwx 1 root root 12 Oct 30 2019 /usr/lib64/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 20 Oct 30 2019 /usr/lib64/libcuda.so.1 -> libcuda.so.418.87.01
-rwxr-xr-x 1 root root 16149040 Oct 30 2019 /usr/lib64/libcuda.so.418.87.01
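The two listings show different user-space driver versions: 384.81 on cdcvillx279 and 418.87.01 on cdcvillx141, the host where nvidia-smi failed. A minimal sketch that prints the version behind libcuda.so.1 so it can be compared with the loaded kernel module; it assumes GNU `readlink -f` and the libcuda.so.<version> naming shown above, and `libcuda_version` is a hypothetical helper name.

```shell
#!/bin/sh
# libcuda_version [LIBDIR] (hypothetical helper): follow libcuda.so.1 to
# the real file and print its version suffix, e.g. 384.81.
libcuda_version() {
  dir=${1:-/usr/lib64}
  real=$(readlink -f "$dir/libcuda.so.1") || return 1
  # readlink -f can succeed for a missing final file, so check it exists.
  [ -e "$real" ] || return 1
  # Strip everything up to and including "libcuda.so."
  echo "${real##*/libcuda.so.}"
}

# Usage:
#   libcuda_version
```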

Can you clarify what you mean by distribution? Do you mean the OS?

You have incorrectly set the export.

OK, so what should I do to set the export correctly? Could you send some instructions?

Our GPUs are not working on the two Linux servers below. Can you let us know which NVIDIA package needs to be installed on them?

cdcvillx279:/apps/software/nvidia # dmidecode|grep -i product
Product Name: PowerEdge R730
Product Name: 072T6D
cdcvillx279:/apps/software/nvidia # cat /etc/*release
NAME="SLES"
VERSION="12-SP2"
VERSION_ID="12.2"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP2"
cdcvillx141:~ # dmidecode|grep -i product
Product Name: PowerEdge R720
Product Name: 0020HJ
cdcvillx141:~ # cat /etc/*release
NAME="SLES"
VERSION="12-SP2"
VERSION_ID="12.2"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP2"
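`cat /etc/*release` prints every release file at once. A narrower sketch reads only PRETTY_NAME from /etc/os-release, the file whose NAME/VERSION/PRETTY_NAME fields appear above; `distro_name` is a hypothetical helper name.

```shell
#!/bin/sh
# distro_name [FILE] (hypothetical helper): print PRETTY_NAME from an
# os-release file. The file uses shell syntax, so it is sourced in a
# subshell to keep its variables out of the caller's environment.
# Pass an absolute path: "." searches PATH for names without a slash.
distro_name() {
  f=${1:-/etc/os-release}
  [ -r "$f" ] || return 1
  ( . "$f" && printf '%s\n' "$PRETTY_NAME" )
}

# Usage:
#   distro_name
```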

Looking at the libcuda versions, it seems the driver was previously installed together with CUDA.
How was it installed?
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
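To help answer "how was it installed?", rpm can report which package, if any, owns the library file; a file placed by NVIDIA's .run installer is not owned by any RPM package, and `rpm -qf` then exits non-zero. A minimal sketch under that assumption; `libcuda_owner` is a hypothetical helper name.

```shell
#!/bin/sh
# libcuda_owner [LIBDIR] (hypothetical helper): show which RPM package owns
# the real libcuda file behind libcuda.so.1. For files installed outside
# RPM (e.g. by a .run installer), rpm reports "not owned by any package".
libcuda_owner() {
  dir=${1:-/usr/lib64}
  real=$(readlink -f "$dir/libcuda.so.1") || return 1
  [ -e "$real" ] || { echo "no libcuda.so.1 in $dir" >&2; return 1; }
  command -v rpm >/dev/null 2>&1 || { echo "rpm not found" >&2; return 2; }
  rpm -qf "$real"
}

# Usage:
#   libcuda_owner
```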

Can you tell us where we would find this file?

It should have been installed in the standard path when you initially installed the driver.
Doesn’t
sudo nvidia-bug-report.sh
work?

OK, it worked. I have attached the file nvidia-bug-report.log.gz (400.4 KB).

That looks largely unmaintained. A year ago, driver 390.26 was installed using the runfile installer but uninstalled afterwards. No idea how the 384 driver was installed. Please post the output of
sudo zypper search "nvidia*"