I have three 1080 Tis, CUDA 8.0, and NVIDIA driver 381.09 with the OpenGL libs switched off.
nvidia-smi is really slow; it takes around 2 s to update.
After a few CUDA-dependent runs (PyTorch, TensorFlow), nvidia-smi fails to load and tensor processing becomes extremely slow. I have to reboot every time to recover nvidia-smi and the GPU processing speed.
I tried updating to 381.22 and nvidia-smi is still slow.
When is the next driver release for the 1080 Ti? What’s the source of this bug? How do I rectify it?
UPDATE: GPU processing hangs even with 381.22.
This worked for me under Ubuntu 16.04.2! I ran the command under sudo:
sudo nvidia-persistenced --persistence-mode
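To check whether persistence mode actually changes the roughly 2 s latency reported in the original post, it can help to time nvidia-smi before and after enabling it. Below is a minimal sketch, assuming nvidia-smi is on your PATH; `time_command` is a hypothetical helper name, not part of any NVIDIA tooling.

```python
import shutil
import subprocess
import time


def time_command(cmd, runs=3):
    """Return the wall-clock duration in seconds of each run of cmd."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Assumption: nvidia-smi is installed and reachable via PATH.
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        for seconds in time_command(["nvidia-smi"]):
            print(f"nvidia-smi took {seconds:.2f} s")
```

Run it once after a fresh reboot and again after the persistence command above. If the numbers do not drop, the slowdown probably has another cause.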
12 months after OP I’m having a very similar issue: TensorFlow 1.8, CUDA 9.0, a single 1080 Ti, Ubuntu 16.04. I’ve tried a few kernel and driver combinations. It seems OK after a reboot but grinds to a halt after running on the GPU for a few minutes. nvidia-smi -q reports “unknown” or “error” for many parameters and has lost track of the process using the GPU.
Any suggestions or updates on this issue?
nvidia-smi -q reports “Unknown Error” for many parameters, especially power parameters. Is power management borked for this card?
nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Wed Jun 20 22:31:02 2018
Driver Version : 396.26
Attached GPUs : 1
GPU 00000000:65:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-d0ad567f-55da-3af4-f46c-f7a7c112ca94
Minor Number : 0
VBIOS Version : 86.02.39.00.5A
MultiGPU Board : No
Board ID : 0x6500
GPU Part Number : N/A
Inforom Version
Image Version : Unknown Error
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x65
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:65:00.0
Sub System Id : 0x120F10DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : Unknown Error
Performance State : P2
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
FB Memory Usage
Total : 11177 MiB
Used : 1 MiB
Free : 11176 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 98 %
Memory : 54 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 48 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : Unknown Error
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : Unknown Error
SM : Unknown Error
Memory : 5005 MHz
Video : Unknown Error
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5505 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
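For anyone triaging a dump like the one above, picking the failing fields out by eye is tedious. Here is a rough sketch that lists every field whose value contains "Unknown Error". It assumes the raw `nvidia-smi -q` layout: "key : value" pairs, with section headers on lines that have no " : " separator, nested by leading spaces. `find_error_fields` is a hypothetical helper name, and the parsing rule is my reading of the output format, not something NVIDIA documents.

```python
import re

# Assumed layout: "key : value" pairs; header lines have no " : " separator
# and nesting is expressed by leading spaces, as in raw nvidia-smi -q output.
_PAIR = re.compile(r"^(?P<key>.*?\S)\s+:\s+(?P<value>.*)$")


def find_error_fields(report, marker="Unknown Error"):
    """Return "Section / Sub-section / Field" paths whose value contains marker."""
    stack = []  # (indent, header) for the sections enclosing the current line
    hits = []
    for raw in report.splitlines():
        if not raw.strip() or raw.lstrip().startswith("="):
            continue  # skip blank lines and the NVSMI LOG banner
        indent = len(raw) - len(raw.lstrip())
        # Leave any section that is not a parent of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        text = raw.strip()
        match = _PAIR.match(text)
        if match is None:
            stack.append((indent, text))  # a header line opens a new section
        elif marker in match.group("value"):
            path = [header for _, header in stack] + [match.group("key")]
            hits.append(" / ".join(path))
    return hits
```

Feed it the text captured from `nvidia-smi -q` on the affected machine. On a paste that has lost its indentation, like the one in this post, only the nearest preceding header is reported as the section, and that can be misleading (Fan Speed would show up under Bridge Chip), so run it against the raw output where possible.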