GPU slows down unexpectedly

We have an application that takes data from a camera and processes it in real time on the GPU to render a scene. The GPU is an Nvidia RTX 3000 in a Lenovo T15 laptop.

When we start the application, the first rendering session goes well: FPS holds at 30, GPU power is at 70W, and CPU usage is around 50%. One application session lasts a few minutes, and results are fully rendered in real time.

Then we start a second application session, and the FPS plummets and rendering lags, which is unacceptable for our application (in the health space). Power drops to 30W, and neither the GPU nor the CPU appears to be loaded at all.

We tried moving part of the initialization code to the point where we detect the beginning of a session. That works better and lets us handle a few sessions in a row, but the behavior eventually reappears.

Temperature does not seem to be involved, so we do not think the GPU is being thermally throttled.

Any suggestions on where to look and how to get more deterministic behavior? Is there anything specific we could reset between sessions?

Thanks a lot for any help.

One way to be sure that throttling is not involved would be to run this concurrently with your application:

nvidia-smi dmon -d 2 -s pucv

which will give you power and temperature readings (p), utilization (u), clocks (c), and any power or thermal violations (v), at two-second intervals.
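
If the log gets long, a small filter can pick out the samples where power has dropped. The following is only a sketch: flag_low_power is a hypothetical helper name, the 40W default threshold is an arbitrary value between the 70W and 30W you reported, and it assumes the power reading is the second column of the dmon output, which you should check against the header lines printed by your driver version.

```shell
# Hypothetical helper: reads `nvidia-smi dmon` output on stdin and prints
# only the samples whose power draw is below a threshold (in watts).
# Assumption: power ("pwr") is the second column; verify this against the
# header that dmon prints on your system.
flag_low_power() {
    threshold="${1:-40}"   # arbitrary default, adjust to your workload
    awk -v t="$threshold" '
        /^#/ { next }      # skip the dmon header lines
        # only compare numeric readings; unsupported fields show up as "-"
        $2 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 + 0 < t + 0 {
            print "LOW POWER:", $0
        }
    '
}
```

You would then pipe the monitor into it, for example: nvidia-smi dmon -d 2 -s pucv | flag_low_power 40. If the flagged samples show low utilization with no violations reported, that would suggest the GPU is waiting for work rather than being throttled.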