Linux server unable to recognize GPU

yogendra.chaudhary · March 11, 2021, 9:07pm

We have the GPU configuration below on our Linux server, but it is giving errors:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Error:

***WARNING: FOUND MULTIPLE ACCLERATOR PLATFORM DRIVERS:

***WARNING: PLATFORM_CUDA

***WARNING: PLATFORM_OPENCL

***WARNING: USE ENVIRONMENT VARIABLE ABA_ACCELERATOR_TYPE TO SELECT THE
DESIRED PLATFORM TYPE

 GPU SOLVER ACCELERATION UNAVAILABLE. SEE JOB LOG FILE FOR MORE DETAILS

generix · March 11, 2021, 11:21pm

I think the message is quite straightforward. Add
export ABA_ACCELERATOR_TYPE=PLATFORM_CUDA
to your ~/.profile, then open a new shell or log out and back in.
Also make sure libcuda is installed, e.g. in /usr/lib64/, depending on the distro.
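
A minimal sketch of the step above, assuming a Bourne-style login shell that reads the profile file. The helper name `add_aba_export` is hypothetical and not part of Abaqus or the NVIDIA driver. It appends the export line only if the file does not already contain one, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper: append the export line to a profile file unless the
# file already exports ABA_ACCELERATOR_TYPE. The file path is passed as $1.
add_aba_export() {
    profile=$1
    line='export ABA_ACCELERATOR_TYPE=PLATFORM_CUDA'
    # -q: no output, exit status only; -s: stay quiet if the file is missing
    if grep -qs '^export ABA_ACCELERATOR_TYPE=' "$profile"; then
        echo "already set in $profile"
    else
        printf '%s\n' "$line" >> "$profile"
        echo "added to $profile"
    fi
}

# Example (per-user):   add_aba_export "$HOME/.profile"
# Then open a new login shell, or load the file into the current one:
#   . "$HOME/.profile"
```

The change only takes effect in shells started after the edit, or in a shell that sources the file explicitly.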

yogendra.chaudhary · March 12, 2021, 9:10pm

Hi, thanks for the response! Do you know which profile, the user's or root's?

generix · March 12, 2021, 10:20pm

Either in the profile of each user who runs Abaqus, or system-wide in /etc/profile.

yogendra.chaudhary · March 15, 2021, 6:44pm

Hey, we tried that but we are still getting the same error. Is a specific driver missing?

yogendra.chaudhary · March 15, 2021, 7:20pm

cdcvillx141:/home/mphpcadmin # nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
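
This error indicates that nvidia-smi could not reach the kernel-side driver, which is a separate problem from the Abaqus environment variable. A rough sketch of a first check, assuming the standard `lsmod` tool is available; the helper name `nvidia_module_loaded` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and report whether the
# nvidia kernel module is listed.
nvidia_module_loaded() {
    # The first column of lsmod output is the module name; match it exactly
    if awk '$1 == "nvidia" { found = 1 } END { exit !found }'; then
        echo "nvidia module loaded"
    else
        echo "nvidia module not loaded"
    fi
}

# Example:   lsmod | nvidia_module_loaded
# Other things worth checking on the affected host:
#   lspci | grep -i nvidia             # is the GPU visible on the PCI bus?
#   ls -l /dev/nvidia*                 # do the device nodes exist?
#   dmesg | grep -i -e nvidia -e NVRM  # kernel messages from the driver
```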

yogendra.chaudhary · March 15, 2021, 7:20pm

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

generix · March 15, 2021, 7:40pm

Please post the output of
export |grep ABA
to check you properly set the env variable.
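
As a sketch, the same check can be made a little more explicit. `export | grep ABA` prints nothing both when the variable is unset and when it is set but not exported, and only exported variables reach child processes such as the solver. The helper name `check_aba_env` is hypothetical, and `printenv` is assumed to be available (it ships with GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: report whether ABA_ACCELERATOR_TYPE is exported,
# set only as a plain shell variable, or not set at all.
check_aba_env() {
    # printenv is an external program, so it only sees exported variables;
    # it returns non-zero when the variable is not in the environment
    if value=$(printenv ABA_ACCELERATOR_TYPE); then
        echo "exported: $value"
    elif [ -n "${ABA_ACCELERATOR_TYPE+set}" ]; then
        echo "set but not exported"
    else
        echo "not set"
    fi
}

# Example:   check_aba_env
```

Run it as the same user, and on the same host, that launches the Abaqus job.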

generix · March 15, 2021, 7:42pm

Please also post the output of
ls -l /usr/lib64/libcuda*
Which distribution are you using?

yogendra.chaudhary · March 15, 2021, 7:50pm

Nothing comes up:
cdcvillx035:~ # export|grep ABA
cdcvillx035:~ #

yogendra.chaudhary · March 15, 2021, 7:53pm

cdcvillx279:/home/mphpcadmin # ls -l /usr/lib64/libcuda*
lrwxrwxrwx 1 root root 12 May 8 2020 /usr/lib64/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 17 May 8 2020 /usr/lib64/libcuda.so.1 -> libcuda.so.384.81
-rwxr-xr-x 1 root root 13038712 Sep 2 2017 /usr/lib64/libcuda.so.384.81

cdcvillx141:/home/mphpcadmin # ls -l /usr/lib64/libcuda*
lrwxrwxrwx 1 root root 12 Oct 30 2019 /usr/lib64/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 20 Oct 30 2019 /usr/lib64/libcuda.so.1 -> libcuda.so.418.87.01
-rwxr-xr-x 1 root root 16149040 Oct 30 2019 /usr/lib64/libcuda.so.418.87.01
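
The two listings show different user-space driver versions: 384.81 on cdcvillx279 and 418.87.01 on cdcvillx141. A small sketch for pulling the version out of the library file name, assuming the usual `libcuda.so.<version>` naming; the helper name `libcuda_version` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: extract the driver version from a libcuda file name,
# e.g. /usr/lib64/libcuda.so.418.87.01 -> 418.87.01
libcuda_version() {
    name=${1##*/}               # strip any leading directory
    echo "${name#libcuda.so.}"  # strip the fixed prefix
}

# Example on a live system (readlink -f follows the whole symlink chain):
#   libcuda_version "$(readlink -f /usr/lib64/libcuda.so)"
# When the kernel module is loaded, its version is reported in
# /proc/driver/nvidia/version and should match the library version.
```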

yogendra.chaudhary · March 15, 2021, 7:56pm

Can you clarify what you mean by distribution? Is it OS?

generix · March 15, 2021, 8:11pm

The empty output means ABA_ACCELERATOR_TYPE is not set in that shell, so the export has not been applied correctly.

yogendra.chaudhary · March 15, 2021, 8:15pm

OK, so what should I do to set the export? Can you send some instructions?

yogendra.chaudhary · March 15, 2021, 8:59pm

Our GPUs are not working on the two Linux servers below. Can you let us know which NVIDIA package needs to be installed on them?

cdcvillx279:/apps/software/nvidia # dmidecode|grep -i product
Product Name: PowerEdge R730
Product Name: 072T6D
cdcvillx279:/apps/software/nvidia # cat /etc/*release
NAME="SLES"
VERSION="12-SP2"
VERSION_ID="12.2"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP2"
cdcvillx141:~ # dmidecode|grep -i product
Product Name: PowerEdge R720
Product Name: 0020HJ
cdcvillx141:~ # cat /etc/*release
NAME="SLES"
VERSION="12-SP2"
VERSION_ID="12.2"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP2"
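
"Distribution" here means the Linux flavor and release, which is what the release files above report. As a sketch, the name can be read directly, because os-release files use shell-compatible KEY="value" lines. The helper name `distro_name` is hypothetical, and the default path /etc/os-release is an assumption about the host:

```shell
#!/bin/sh
# Hypothetical helper: print the distribution name from an os-release file.
# The path defaults to /etc/os-release and can be overridden with $1.
distro_name() {
    file=${1:-/etc/os-release}
    # Source the file in a subshell so its variables do not leak into the caller
    ( . "$file" && echo "$PRETTY_NAME" )
}

# Example:   distro_name
```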

generix · March 15, 2021, 10:10pm

Looking at the libcuda versions, it seems the driver has previously been installed with cuda.
How has this been installed?
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

yogendra.chaudhary · March 15, 2021, 10:43pm

Can you tell us where we would find this file?

generix · March 15, 2021, 10:45pm

It should have been installed in the standard path when you initially installed the driver.
Doesn’t
sudo nvidia-bug-report.sh
work?

yogendra.chaudhary · March 15, 2021, 10:53pm

ok it worked now. I have attached the file nvidia-bug-report.log.gz (400.4 KB)

generix · March 16, 2021, 9:16am

That looks largely unmaintained. A year ago, driver 390.26 was installed using the runfile installer but uninstalled afterwards. No idea how the 384 driver was installed. Please post the output of
sudo zypper search "nvidia*"
