I’m having some stability issues with a server. I’m logging nvidia-smi output and seeing pviol hit 100% on one of the GPU boards (an A4000). What does that mean?
# gpu   pwr gtemp mtemp    sm   mem   enc   dec pviol tviol
# Idx     W     C     C     %     %     %     %     %  bool
    0     8    33     -     0     0     0     0     0     0
    1     6    35     -     0     0     0     0     0     0
    2     7    34     -     0     0     0     0     0     0
    0     8    33     -     0     0     0     0   100     0
    1     7    35     -     0     0     0     0     0     0
    2     7    34     -     0     0     0     0     0     0
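
To line these events up with the stability problems, it may help to pull out only the samples where pviol is non-zero. Below is a minimal sketch in Python, assuming the log is whitespace-separated text in the same layout as the excerpt above (a "#" header row naming the columns, a "#" units row, then one row per GPU per sample). The function name find_pviol_samples is hypothetical and not part of any NVIDIA tool; it only parses text that has already been captured.

```python
def find_pviol_samples(log_text):
    """Return (row_number, gpu_index, pviol) for each data row with pviol > 0.

    row_number counts data rows only, starting at 0. Header and units rows
    (those starting with '#') are not counted.
    """
    columns = None
    hits = []
    row_number = 0
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            # Only the row that names the columns contains "pviol";
            # the units row ("Idx W C ...") is skipped.
            if "pviol" in fields:
                columns = fields
            continue
        values = line.split()
        if columns is None or len(values) != len(columns):
            continue  # not a data row we know how to read
        row = dict(zip(columns, values))
        try:
            pviol = int(row["pviol"])
            gpu = int(row["gpu"])
        except ValueError:
            # Fields such as "-" mean the value was not reported.
            row_number += 1
            continue
        if pviol > 0:
            hits.append((row_number, gpu, pviol))
        row_number += 1
    return hits


if __name__ == "__main__":
    sample_log = """\
# gpu pwr gtemp mtemp sm mem enc dec pviol tviol
# Idx W C C % % % % % bool
0 8 33 - 0 0 0 0 0 0
1 6 35 - 0 0 0 0 0 0
2 7 34 - 0 0 0 0 0 0
0 8 33 - 0 0 0 0 100 0
1 7 35 - 0 0 0 0 0 0
2 7 34 - 0 0 0 0 0 0
"""
    print(find_pviol_samples(sample_log))  # [(3, 0, 100)]
```

On the excerpt above this reports a single hit on GPU 0. Because the column positions are read from the header row rather than hard-coded, the same sketch should still work if the log was captured with a different set of columns, as long as pviol and gpu are among them.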