We are facing a GPU thermal problem and would appreciate hardware support or any advice, explanation, or suggestion.
Our Tesla V100 overheats and triggers a thermal shutdown even when it is idle. (We have two Tesla V100 cards and tested both.)
When we power on our workstation (server), the Tesla V100 temperature immediately rises above 90 °C for no apparent reason.
We also tried powering the Tesla V100 with one power cable plus a splitter and with two separate power cables; the problem is the same either way.
With a GeForce RTX 2070, however, the problem does not occur.
We need to solve this problem urgently. We use this workstation for deep learning.
Can you suggest a way to solve it?
Our workstation specs:
- Mainboard: ASUS C621E SAGE
- Power supply: Seasonic PRIME 1300 Platinum - SSR-1300PD Full Platinum
- CPU: 2x Intel Xeon Scalable Gold 6230 Processor
- CPU cooler: 2x Noctua NH-U12S DX-3647
- RAM: 8x Samsung DDR4 32GB PC4-21300
- SSD: Samsung 970 EVO Plus series 2TB M.2
- Case: 3RSYS T1000
- OS: Ubuntu 18.04
- Driver Version: 455.45.01
- Others: 24-pin + 8-pin power cables, USB mouse, keyboard, monitor
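
In case it helps with diagnosis, below is a minimal sketch of how the temperature rise after power-on could be logged. It assumes `nvidia-smi` from the installed driver is on the PATH and relies only on its `--query-gpu` and `--format` options; the function names (`parse_gpu_temps`, `read_gpu_temps`, `log_gpu_temps`) are hypothetical and chosen for illustration, not part of any NVIDIA tool.

```python
import csv
import io
import subprocess
import time

# Fields requested from nvidia-smi; temperature.gpu is reported in degrees Celsius.
QUERY = "index,name,temperature.gpu"


def parse_gpu_temps(csv_text):
    """Parse 'index, name, temperature' CSV rows into (index, name, temp_c) tuples.

    temp_c is None when the driver reports a non-numeric value such as [N/A].
    """
    rows = []
    for rec in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(rec) != 3:
            continue  # skip blank or malformed lines
        index, name, temp = (field.strip() for field in rec)
        try:
            temp_c = int(temp)
        except ValueError:
            temp_c = None
        rows.append((int(index), name, temp_c))
    return rows


def read_gpu_temps():
    """Run nvidia-smi once and return the parsed readings (assumes it is on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_gpu_temps(result.stdout)


def log_gpu_temps(samples, interval_s=1.0):
    """Print one timestamped reading per interval, e.g. right after boot."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_gpu_temps())
        time.sleep(interval_s)
```

Calling `log_gpu_temps(60)` right after boot would print one reading per second for a minute, which should show how quickly each card heats up while idle.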