We have a Spectrum cluster of SLES 12SP5 compute nodes, some of which have NVIDIA GPUs, mostly Tesla.
When we submit an ABAQUS solve using Spectrum LSF, it gives an error about not finding the devices, but
if I manually run the script it works fine.
This all worked fine until we upgraded to ABAQUS 2020 from 2017.
***WARNING: FOUND MULTIPLE ACCLERATOR PLATFORM DRIVERS:
***WARNING: PLATFORM_CUDA
***WARNING: USE ENVIRONMENT VARIABLE ABA_ACCELERATOR_TYPE TO SELECT THE
DESIRED PLATFORM TYPE
GPU SOLVER ACCELERATION UNAVAILABLE. SEE JOB LOG FILE FOR MORE DETAILS.
We don’t have any other ‘accelerators’ on these platforms.
Wed Nov 8 19:59:54 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   32C    P0    25W / 250W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I tried that and no joy. Same as before.
I have a malfunctioning ABAQUS job running right now that is not using the GPU.
I ran a bug report. I’ll attach it. It’s binary. I don’t know who can read it, but it might help.
Meanwhile, IBM support wants me to try running the script in a bash shell submitted via LSF to collect some data, but I don’t have any free systems right now.
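For that kind of data collection, something like the following sketch might be a starting point. It prints the environment variables that seem most likely to differ between a manual run and an LSF-submitted run, plus the GPUs that `nvidia-smi -L` reports. The prefix list and the function names are my own assumptions, not anything IBM or ABAQUS specified; `ABA_ACCELERATOR_TYPE` is the variable named in the warning above, and `CUDA_VISIBLE_DEVICES` is the standard CUDA variable for restricting visible GPUs. It assumes a Python 3 interpreter is available on the compute node.

```python
import os
import shutil
import subprocess

# Assumed list of interesting variable-name prefixes; adjust to taste.
PREFIXES = ("CUDA", "NVIDIA", "ABA", "LSB_", "LSF_", "LD_LIBRARY_PATH", "PATH")


def filter_env(env, prefixes=PREFIXES):
    """Return the variables whose names start with one of the prefixes."""
    return {k: env[k] for k in sorted(env) if k.startswith(prefixes)}


def gpu_listing():
    """Return the output of `nvidia-smi -L`, or a note if it cannot be run."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return "nvidia-smi not found on PATH"
    try:
        out = subprocess.check_output(
            [exe, "-L"],
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        return "nvidia-smi failed: {}".format(exc)
    return out.strip()


def report(env=None):
    """Build a plain-text report of the filtered environment and visible GPUs."""
    env = os.environ if env is None else env
    lines = ["{}={}".format(k, v) for k, v in filter_env(env).items()]
    lines.append("--- nvidia-smi -L ---")
    lines.append(gpu_listing())
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

The idea would be to run it once from an interactive shell on a GPU node and once submitted through LSF the same way the ABAQUS job is, save both outputs, and diff them. A difference in `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, or `ABA_ACCELERATOR_TYPE` between the two would at least narrow down where to look.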
I’m trying to respond but I keep getting an error about more than 4 links.
I replied to your email and got the same issue.
How can I respond to this? We’re still having the same problem with GPUs.