nvidia-smi dmon: SM and memory utilization both drop suddenly

SM and memory utilization both drop suddenly and then recover within the next second, but each drop causes a long processing latency of about 500 milliseconds.
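To catch these drops in a long log rather than by watching the monitor, one option is to record `nvidia-smi dmon -s u -d 1 > dmon.log` and scan the text afterwards. The sketch below is only an illustration, assuming the usual `dmon` text layout (a `#` header line naming the `gpu`, `sm` and `mem` columns, followed by whitespace-separated rows). The function names and the 30-point drop threshold are hypothetical choices, not part of nvidia-smi. Since `dmon` samples at most once per second, a stall of about 500 ms would likely appear as a single low sample rather than a drop to zero.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, int]  # (gpu index, sm %, mem %)


def parse_dmon(text: str) -> List[Row]:
    """Parse `nvidia-smi dmon -s u` text output into (gpu, sm, mem) rows.

    Column positions come from the '#' header line, because the set of
    columns (and optional date/time columns) varies between versions.
    """
    gpu_col = sm_col = mem_col = None
    rows: List[Row] = []
    for line in text.splitlines():
        if line.startswith("#"):
            header = line.lstrip("#").split()
            # Only the name line has "sm" and "mem"; the units line is skipped.
            if "gpu" in header and "sm" in header and "mem" in header:
                gpu_col = header.index("gpu")
                sm_col = header.index("sm")
                mem_col = header.index("mem")
            continue
        fields = line.split()
        if not fields or sm_col is None:
            continue
        if "-" in (fields[sm_col], fields[mem_col]):
            continue  # metric not reported for this sample
        rows.append((int(fields[gpu_col]), int(fields[sm_col]), int(fields[mem_col])))
    return rows


def find_dips(rows: List[Row], drop: int = 30) -> List[Tuple[int, int, int, int]]:
    """Return (row index, gpu, sm, mem) for samples where BOTH SM and MEM
    utilization fell by at least `drop` points versus that GPU's previous sample.
    `drop` is a hypothetical threshold; tune it to your workload."""
    last: Dict[int, Tuple[int, int]] = {}
    dips = []
    for i, (gpu, sm, mem) in enumerate(rows):
        if gpu in last:
            prev_sm, prev_mem = last[gpu]
            if prev_sm - sm >= drop and prev_mem - mem >= drop:
                dips.append((i, gpu, sm, mem))
        last[gpu] = (sm, mem)
    return dips


# Hypothetical two-GPU sample; GPU 0 drops in the third data row.
SAMPLE = """\
# gpu    sm   mem   enc   dec
# Idx     %     %     %     %
    0    85    40     0     0
    1    80    38     0     0
    0    10     3     0     0
    1    82    39     0     0
    0    84    41     0     0
"""

if __name__ == "__main__":
    print(find_dips(parse_dmon(SAMPLE)))
```

Matching the reported dip timestamps (e.g. with `-o T` added to the `dmon` command) against the application's own per-frame timing would show whether every latency spike lines up with a utilization drop.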


Thanks for your suggestions. We use NVIDIA Tesla T4 GPUs on PCIe x8, and everything works well when we stress-test them with gpuburn. But when we run our application on them, we occasionally see this latency, and the utilization monitor then looks like the screenshot. It looks like a dropped processing frame. We don't know whether this is related to the user application or something else, and we hope to get some help troubleshooting it here.
In our application, each server hosts two Tesla T4 GPUs, each on a PCIe Gen3 x8 link, but the bandwidth looks sufficient: 1,200 MB/s on average and 1,500 MB/s at peak.
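A quick way to put those numbers in context is to compare them with the raw Gen3 x8 link rate: 8 GT/s per lane with 128b/130b encoding. The sketch below assumes the quoted 1,200/1,500 MB/s figures are per GPU and per direction (the post does not say), and it ignores PCIe protocol overhead (TLP headers, flow control), so real achievable throughput is somewhat below the computed figure. It also cannot reveal short bursts hidden inside an average.

```python
def pcie_gen3_bytes_per_s(lanes: int) -> float:
    """Raw PCIe Gen3 data rate in bytes/s after 128b/130b line encoding.

    Protocol overhead is ignored, so this is an upper bound.
    """
    transfers_per_s = 8e9   # Gen3 signalling rate: 8 GT/s per lane
    encoding = 128 / 130    # 128b/130b encoding efficiency
    return transfers_per_s * encoding / 8 * lanes  # bits -> bytes, times lanes


link_mb_s = pcie_gen3_bytes_per_s(8) / 1e6  # about 7877 MB/s for x8

# Observed figures from the post (assumed per GPU, per direction).
for label, observed in (("average", 1200), ("peak", 1500)):
    print(f"{label}: {observed} MB/s = {observed / link_mb_s:.1%} of {link_mb_s:.0f} MB/s")
```

This prints roughly 15.2% for the average and 19.0% for the peak, which is consistent with the link not being saturated on average.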

