nvidia-smi dmon: SM and memory utilization both show a sudden decrease


SM and memory utilization both drop suddenly and recover within the next second, but the drop causes a long processing latency of about 500 milliseconds.

Hello,

Welcome to the NVIDIA Developer forums! You posted in the feedback category. Unfortunately, no one from support monitors this section.

Please provide more details about your issue and how it relates to NVIDIA products, platforms, etc., and I will move this to the correct forum.

Thanks,
Tom

Thanks for your suggestions. We use NVIDIA Tesla T4 GPUs on PCIe x8. Everything works fine when we run gpuburn as a stress test. But when we run our own application, we occasionally see the latency spike, and the utilization monitor shows the drop seen in the screenshot. It looks like a single dropped processing frame. We don't know whether it is related to the user application or to something else, and we hope to get some help with troubleshooting here.
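
To pin down when the drops happen without watching the monitor, the logged `nvidia-smi dmon` output could be scanned for samples where SM and memory utilization both fall sharply. Below is a minimal sketch, not a definitive tool. The function names, the "falls to half or less" rule, and the sample text are made up for illustration; the column layout of dmon output can differ between driver versions, so the parser reads the column names from the `# gpu ...` header line instead of assuming fixed positions.

```python
# Hypothetical sample in the usual dmon layout; the numbers are invented.
SAMPLE = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    62    55     -    92    41     0     0  5000  1590
    0    63    55     -    91    40     0     0  5000  1590
    0    45    54     -    12     3     0     0  5000  1590
    0    62    55     -    90    42     0     0  5000  1590
"""


def parse_dmon(text):
    """Parse dmon text into a list of {column name: int or None} dicts."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            # Only the header that starts with "gpu" names the columns;
            # the other "#" line holds units and is skipped.
            if fields and fields[0] == "gpu":
                columns = fields
            continue
        values = line.split()
        if columns is None or len(values) != len(columns):
            continue
        row = {}
        for name, raw in zip(columns, values):
            try:
                row[name] = int(raw)
            except ValueError:
                row[name] = None  # "-" means the metric is not reported
        rows.append(row)
    return rows


def find_dips(rows, ratio=0.5):
    """Return (row index, gpu, previous (sm, mem), current (sm, mem)) for
    samples where both values fall to `ratio` of the previous sample or less."""
    last = {}
    dips = []
    for i, row in enumerate(rows):
        gpu, sm, mem = row.get("gpu"), row.get("sm"), row.get("mem")
        if gpu is None or sm is None or mem is None:
            continue
        prev = last.get(gpu)
        if prev is not None and prev[0] > 0 and prev[1] > 0:
            if sm <= prev[0] * ratio and mem <= prev[1] * ratio:
                dips.append((i, gpu, prev, (sm, mem)))
        last[gpu] = (sm, mem)
    return dips


if __name__ == "__main__":
    for index, gpu, before, after in find_dips(parse_dmon(SAMPLE)):
        print(f"row {index}: GPU {gpu} sm/mem {before} -> {after}")
```

Matching the flagged rows against the application's own frame timestamps may help tell whether the GPU went idle because no work was submitted or because something stalled it. Since dmon reports on the order of one sample per second, a stall of about 500 ms would likely appear only as a partial dip.
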
In our setup, each server has two Tesla T4s, each on a PCIe Gen3 x8 link. The bandwidth looks sufficient: average usage is 1,200 MB/s and the peak is 1,500 MB/s.
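
As a rough check on the bandwidth figures, the theoretical ceiling of a PCIe Gen3 x8 link can be worked out from the 8 GT/s per-lane rate and 128b/130b encoding. This is only a sketch: the function name is made up, the result is a per-direction upper bound that ignores packet and protocol overhead, and averaged MB/s readings can hide short bursts that saturate the link.

```python
def pcie_gen3_bandwidth_mb_s(lanes):
    """Theoretical per-direction PCIe Gen3 bandwidth in MB/s (10^6 bytes/s)."""
    transfers_per_s = 8.0e9   # Gen3 signalling rate: 8 GT/s per lane
    encoding = 128 / 130      # 128b/130b line encoding
    bytes_per_s_per_lane = transfers_per_s * encoding / 8
    return bytes_per_s_per_lane * lanes / 1e6


if __name__ == "__main__":
    ceiling = pcie_gen3_bandwidth_mb_s(8)
    # Average and peak figures reported in this thread.
    for label, used in (("average", 1200), ("peak", 1500)):
        share = 100 * used / ceiling
        print(f"{label}: {used} MB/s of {ceiling:.0f} MB/s ({share:.0f}%)")
```

Under these assumptions the ceiling is roughly 7,877 MB/s, so the reported peak is about 19% of it. That suggests sustained PCIe bandwidth is probably not the bottleneck, though it does not rule out short transfer stalls.
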

Thanks. I am moving this to the Tesla Boards category for better visibility.