Hi all,
Our partner is setting up NVIDIA Tesla T4 cards in an HPE DL380 Gen10 and is having a problem. Could you give us some pointers, please?
The server has seven Tesla T4 cards installed. Only 3 of them are recognized by the NVIDIA driver; the other 4 are not.
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:12:00.0 Off |                    0 |
| N/A   65C    P0    29W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:13:00.0 Off |                    0 |
| N/A   73C    P0    33W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:37:00.0 Off |                    0 |
| N/A   70C    P0    31W /  70W |      0MiB / 15109MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
All 7 cards, including the 4 that nvidia-smi does not show, are recognized on the OS side:
$ lspci | grep -i nvidia
12:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
13:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
37:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
86:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
af:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
b0:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
d8:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
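To make it explicit which devices the driver is missing, the two listings above can be compared by PCI bus ID. The following is a minimal POSIX sh sketch, not an NVIDIA tool: the function name `missing_gpus` is hypothetical, and it assumes `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader` prints one domain-prefixed ID per line (like `00000000:12:00.0` in the table above) while `lspci` prints the short form (`12:00.0`). Both sides are lower-cased, so the hex case does not matter.

```shell
#!/bin/sh
# Hypothetical helper: print the NVIDIA PCI devices that lspci lists but
# nvidia-smi does not expose.
#   $1: output of `lspci | grep -i nvidia`   (e.g. "af:00.0 3D controller: ...")
#   $2: output of `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader`
#       (e.g. "00000000:AF:00.0")
missing_gpus() {
  {
    # Tag driver-side lines with "D" and feed them first,
    # then tag OS-side lines with "O".
    printf '%s\n' "$2" | sed 's/^/D /'
    printf '%s\n' "$1" | sed 's/^/O /'
  } | awk '
    # Driver side: drop the PCI domain, keep "bus:device.function".
    $1 == "D" && NF > 1 {
      n = split(tolower($2), p, ":")
      seen[p[n-1] ":" p[n]] = 1
      next
    }
    # OS side: the bus ID is the first field of each lspci line.
    $1 == "O" && NF > 1 {
      id = tolower($2)
      if (!(id in seen) && !(id in done)) { print id; done[id] = 1 }
    }
  '
}

# Usage on the server itself:
#   missing_gpus "$(lspci | grep -i nvidia)" \
#                "$(nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader)"
```

With the two listings in this post, it should print 86:00.0, af:00.0, b0:00.0 and d8:00.0, one per line, in lspci order.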
Do we have to install anything else? We downloaded and installed the following NVIDIA driver:
DATA CENTER DRIVER FOR LINUX X64
Version: 460.32.03
Release Date: 2021.1.19
Operating System: Linux 64-bit
CUDA Toolkit: 11.2
Language: Japanese
File Size: 169.84 MB
OS: Ubuntu 20.04.2 LTS
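A related data point that may be useful here is whether a kernel driver is bound to each of the seven devices: `lspci -k` prints a "Kernel driver in use:" line under each device that has one. The sketch below condenses that into one line per device; the helper name `driver_binding` is hypothetical, and it assumes the usual `lspci -k` layout where device lines start with the bus ID and detail lines are indented.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k` output on stdin and print, for each
# device, its bus ID and the bound kernel driver, or "(none)" if no
# "Kernel driver in use:" line appears under it.
driver_binding() {
  awk '
    # Device lines start with the bus ID (e.g. "86:00.0 3D controller: ...");
    # detail lines such as "Kernel driver in use:" are indented.
    /^[0-9a-fA-F]/ {
      if (dev != "") print dev, drv
      dev = $1
      drv = "(none)"
      next
    }
    /Kernel driver in use:/ { drv = $NF }
    END { if (dev != "") print dev, drv }
  '
}

# Usage on the server (10de is NVIDIA's PCI vendor ID):
#   lspci -k -d 10de: | driver_binding
```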
Thank you for your support.