CUDA error: an illegal memory access was encountered

We are seeing an issue on only one of the two Volta GPUs in a system. The error reported is "CUDA error: an illegal memory access was encountered". The code runs fine on one GPU [1], but fails on the other GPU [0].
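For anyone trying to reproduce this kind of per-GPU failure, here is a minimal sketch of how a run can be pinned to one GPU at a time. The function name `run_on_gpu` is made up for illustration and is not part of our setup. It relies only on the standard CUDA environment variables `CUDA_DEVICE_ORDER`, `CUDA_VISIBLE_DEVICES`, and `CUDA_LAUNCH_BLOCKING`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with exactly one GPU visible to CUDA.
# Usage: run_on_gpu <gpu-index> <command> [args...]
run_on_gpu() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_on_gpu <gpu-index> <command> [args...]" >&2
        return 2
    fi
    local gpu="$1"
    shift
    # CUDA_DEVICE_ORDER=PCI_BUS_ID makes CUDA enumerate devices in PCI bus
    # order, so the index matches the one printed by nvidia-smi.
    # CUDA_VISIBLE_DEVICES hides every other GPU; the selected GPU then
    # appears to the program as device 0.
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an
    # error is reported at the launch that triggered it.
    CUDA_DEVICE_ORDER=PCI_BUS_ID \
    CUDA_VISIBLE_DEVICES="$gpu" \
    CUDA_LAUNCH_BLOCKING=1 \
        "$@"
}
```

With `./my_app` standing in as a placeholder for the failing program, `run_on_gpu 0 ./my_app` and `run_on_gpu 1 ./my_app` would exercise each GPU in isolation. Assuming the `cuda-memcheck` tool from the CUDA toolkit is installed, `run_on_gpu 0 cuda-memcheck ./my_app` may also help narrow down the faulting access.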

isg@dg19c:~$ uname -a
Linux dg19c 4.15.0-118-generic #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

isg@dg19c:~$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2…    On   | 00000000:62:00.0 Off |                  Off |
| N/A   41C    P0    43W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2…    On   | 00000000:89:00.0 Off |                  Off |
| N/A   41C    P0    43W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

nvidia-bug-report.log (509 Bytes)