GPUs not being utilized by ABAQUS jobs submitted via Spectrum LSF

We have a Spectrum cluster of SLES 12SP5 compute nodes, some of which have NVIDIA GPUs, mostly Tesla.
When we submit an ABAQUS solve through Spectrum LSF, it reports an error about not finding the devices, but
if I run the script manually it works fine.
This all worked fine until we upgraded to ABAQUS 2020 from 2017.

***WARNING: FOUND MULTIPLE ACCLERATOR PLATFORM DRIVERS:

***WARNING: PLATFORM_CUDA

***WARNING: USE ENVIRONMENT VARIABLE ABA_ACCELERATOR_TYPE TO SELECT THE
DESIRED PLATFORM TYPE

 GPU SOLVER ACCELERATION UNAVAILABLE. SEE JOB LOG FILE FOR MORE DETAILS.

We don’t have any other ‘accelerators’ on these platforms.

Wed Nov 8 19:59:54 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE…    On   | 00000000:3B:00.0 Off |                    0 |
| N/A   32C    P0    25W / 250W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Did you set the env variable in the Abaqus 2020 env file?

Yes.
mckenns@fralx04:/opt> find ./abaqus -type f -name "*.env" -exec grep CUDA {} \;
os.environ["ABA_ACCELERATOR_TYPE"]="PLATFORM_CUDA" # Nvidia GPU enablement

If I run the job from a shell it works OK, but when it is submitted from Spectrum LSF it doesn’t.

This is what the log file reports:
Run standard

USING ACCELERATOR PLATFORM_CUDA

Error initializing the CUDA Driver NO_DEVICE

WARNING: GPUAcceleration disabled

So how do I get it to identify the DEVICE?
Thanks

Please check this:
https://www.ibm.com/support/pages/unable-use-gpu-acceleration-lsf

I tried that and no joy. Same as before.
I have a malfunctioning ABAQUS job running right now that is not using the GPU.
I ran a bug report. I’ll attach it. It’s binary. I don’t know who can read it, but it might help.
Meanwhile, IBM support wants me to try running the script in a bash shell submitted via LSF to collect some data, but I don’t have any free systems right now.

nvidia-bug-report.log.gz (1021.9 KB)
Here’s the bug report. Hope it helps

Log looks fine. Did you double-check Abaqus is really using the gpu when run from shell?

Yes, running /usr/bin/nvidia-smi -l I could see the PID for the ‘standard’ ABAQUS process
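For a repeatable version of this check, here is a minimal Python sketch that asks nvidia-smi for the compute processes and filters them by name. It assumes the `--query-compute-apps` output with `--format=csv,noheader` looks like `pid, process_name, used_memory` on each line; the function names are made up for illustration.

```python
import subprocess

# nvidia-smi query for processes currently holding a compute context.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(text):
    """Parse 'pid, name, memory' lines into a list of dicts (assumed format)."""
    apps = []
    for line in text.splitlines():
        if line.count(",") < 2:
            continue  # blank line or unexpected output
        pid, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        if not pid.strip().isdigit():
            continue
        apps.append({"pid": int(pid), "name": name.strip(), "memory": memory.strip()})
    return apps


def gpu_processes_matching(needle, text=None):
    """Return GPU compute processes whose name contains `needle`.

    If `text` is None, nvidia-smi is run on the local machine.
    """
    if text is None:
        text = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [app for app in parse_compute_apps(text) if needle in app["name"]]
```

Calling `gpu_processes_matching("standard")` on the compute node while the solve is running should return an empty list when ABAQUS is not on the GPU, which is what I would expect to see for the LSF-submitted job.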

Please post the output of
ls -l /dev/nvid*
Is it possible to create an LSF job just dumping user and environment? (whoami, export)
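If it helps, the dump could be a small script along these lines. This is only a sketch: the list of variables to watch is my guess at what might differ between a login shell and an LSF job, and the script name used below is hypothetical.

```python
#!/usr/bin/env python3
"""Dump user, selected environment variables, and NVIDIA device-node access."""
import getpass
import glob
import os
import stat

# Variables worth comparing between the two runs (an assumed, non-exhaustive list).
WATCHED_VARS = ["ABA_ACCELERATOR_TYPE", "CUDA_VISIBLE_DEVICES", "LD_LIBRARY_PATH", "PATH"]


def current_user():
    """Best-effort equivalent of whoami."""
    try:
        return getpass.getuser()
    except Exception:
        return "<unknown>"


def collect_report(env=None, device_glob="/dev/nvidia*"):
    """Return 'key=value' lines describing the context this process runs in."""
    env = os.environ if env is None else env
    lines = [
        f"user={current_user()}",
        f"uid={os.getuid()}",
        f"gid={os.getgid()}",
        "groups=" + ",".join(str(g) for g in sorted(os.getgroups())),
    ]
    for name in WATCHED_VARS:
        lines.append(f"{name}={env.get(name, '<unset>')}")
    # Check whether this process can actually open the device nodes read/write.
    for path in sorted(glob.glob(device_glob)):
        mode = stat.filemode(os.stat(path).st_mode)
        usable = os.access(path, os.R_OK | os.W_OK)
        lines.append(f"{path} mode={mode} read_write={usable}")
    return lines


if __name__ == "__main__":
    print("\n".join(collect_report()))
```

Run it once directly on the GPU node and once submitted through LSF the same way the ABAQUS job is (saved as, say, gpu_env_dump.py), then diff the two outputs. A different CUDA_VISIBLE_DEVICES value, a different group list, or a device node that is not read/write from inside the job would be worth a closer look.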

I’m trying to respond but I keep getting an error about more than 4 links.
I replied to your email and got the same issue.
How can I respond to this? We’re still having the same problem with GPUs

Here you go. I already went through the steps recommended by our 3rd party support ‘Inceptra’
I attached their doc.

mckenns@fralx03:~> ls -l /dev/nvidia^C
mckenns@fralx03:~> ssh fralx05 ls -l /dev/nvidia*
crw-rw-rw- 1 root video 195, 0 Nov 8 13:28 /dev/nvidia0
crw-rw-rw- 1 root video 195, 255 Nov 8 13:28 /dev/nvidiactl
crw-rw-rw- 1 root root 506, 0 Nov 8 13:32 /dev/nvidia-uvm
crw-rw-rw- 1 root root 506, 1 Nov 8 13:32 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
cr-------- 1 root root 234, 1 Nov 8 13:28 nvidia-cap1
cr--r--r-- 1 root root 234, 2 Nov 8 13:28 nvidia-cap2

I keep getting an error about ‘new user cannot add more than 4 links’ when I try to reply with the suggested output from ‘bsub etc’

Doesn’t work if you just zip it and attach?

I cannot find out how to attach anything to my replies

Like you attached nvidia-bug-report.log.gz?

I did that by sending an email from the Linux system.
When I try that email address, it rejects me as a duplicate email

There is no ‘attach’ that I can see in the blog.

It’s the icon with a horizontal line and an upward arrow in the formatting options when creating a post.