I'm trying to use a Tesla T4 on an X99 board that also has a GTX 1080 installed. I started with the 410.48 CUDA toolkit but found that I needed either 410.72 or 410.79 for T4 support. I installed each of those drivers in turn, but with neither of them does nvidia-smi show any output for the T4.
I updated the pci.ids file, and lspci now shows the card's information:
02:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev ff)
I checked dmesg, and this is in the last section:
[ 202.589325] pcieport 0000:00:02.0: AER: Uncorrected (Fatal) error received: id=0010
[ 202.589340] pcieport 0000:00:02.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0010(Receiver ID)
[ 202.589442] pcieport 0000:00:02.0: device [8086:6f04] error status/mask=00000020/00000000
[ 202.589502] pcieport 0000:00:02.0: [ 5] Surprise Down Error (First)
[ 202.589554] pcieport 0000:00:02.0: broadcast error_detected message
[ 202.589559] pci 0000:02:00.0: device has no driver
[ 203.593466] pcieport 0000:00:02.0: Root Port link has been reset
[ 203.593475] pcieport 0000:00:02.0: AER: Device recovery failed
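For reference, the `error status/mask=00000020/00000000` value in the log above can be decoded by bit position. Below is a minimal Python sketch, assuming the usual bit layout of the PCIe AER Uncorrectable Error Status register. The table is partial, the function name `decode_aer_status` is made up for illustration, and applying the mask as `status & ~mask` is my understanding of how the kernel reports these errors, not something confirmed by the log.

```python
# Partial map of bit positions in the PCIe AER Uncorrectable Error Status
# register. Assumption: names follow the standard register layout; this
# table is not exhaustive.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_aer_status(status_hex, mask_hex="0"):
    """Return the names of the unmasked error bits set in a status value.

    Both arguments are hex strings as printed in dmesg, e.g. "00000020".
    """
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    active = status & ~mask  # drop errors that the mask register suppresses
    return [
        AER_UNCORRECTABLE_BITS.get(bit, f"Unknown bit {bit}")
        for bit in range(32)
        if active & (1 << bit)
    ]


# Values taken from the dmesg line for pcieport 0000:00:02.0
print(decode_aer_status("00000020", "00000000"))
```

With the values from my log, only bit 5 is set, which matches the `[ 5] Surprise Down Error` line the kernel printed.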
This is a CentOS 7.5 system on an X99 board with a 1080 and a T4 installed. I have tried the CUDA toolkit 410.48 and cuda-drivers 410.72 and 410.79. nvidia-smi lists only the 1080:
Thu Jan 10 15:20:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   42C    P0    37W / 180W |      0MiB /  8116MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
What do I have to do to get the card to be recognized correctly?