We are experiencing a performance drop in NN model inference after a period of GPU inactivity.
After the model is loaded, we run inferences in a loop using TensorRT (via ONNX Runtime) and reach a steady latency of about 20 ms per inference. But after, for example, 1 second of inactivity, the next inference takes about 163 ms. The inference after that is faster, and after a few iterations we are back to ~20 ms.
Something similar happens with the CUDA execution provider.
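For reference, a minimal sketch of how we measure this (the model path and the dummy input handling below are placeholders, not our actual pipeline):

```python
# Latency repro sketch, assuming a single float32 input; adjust to the real model.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)

inp = session.get_inputs()[0]
# Replace any dynamic/symbolic dimensions with 1 for the dummy input.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

def timed_run():
    t0 = time.perf_counter()
    session.run(None, {inp.name: dummy})
    return (time.perf_counter() - t0) * 1000.0  # ms

# Back-to-back inferences settle around ~20 ms.
for _ in range(50):
    print(f"busy loop  {timed_run():6.1f} ms")

# After ~1 s of inactivity the next inference takes ~163 ms,
# then latency gradually returns to ~20 ms.
time.sleep(1.0)
for _ in range(10):
    print(f"after idle {timed_run():6.1f} ms")
```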
We are currently using the MODE_15W_6CORE power mode on a Jetson Xavier NX:
mlipovsky@srv-ipjetson-3n:~$ /usr/sbin/nvpmodel -q
NV Fan Mode:quiet
NV Power Mode: MODE_15W_6CORE
Which power mode should we use, or what other options are there to eliminate the performance drop after GPU inactivity?
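For context, the only stopgap we can think of is keeping the GPU busy with dummy inferences during idle gaps, roughly like the sketch below (the 200 ms interval is arbitrary and the class is hypothetical); we would much prefer a proper power/clock configuration instead:

```python
# Hypothetical keep-alive stopgap: run a dummy inference whenever the session
# has been idle for longer than `interval_s`, so the GPU never sits idle.
import threading
import time

class KeepAlive:
    def __init__(self, session, input_name, dummy_input, interval_s=0.2):
        self._session = session
        self._input_name = input_name
        self._dummy = dummy_input
        self._interval = interval_s
        self._last_run = time.monotonic()
        self._lock = threading.Lock()          # serializes real and dummy runs
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def run(self, feed):
        # Real inference request from the application.
        with self._lock:
            out = self._session.run(None, feed)
            self._last_run = time.monotonic()
        return out

    def _loop(self):
        # Poll twice per interval; fire a dummy inference if we were idle too long.
        while not self._stop.wait(self._interval / 2):
            if time.monotonic() - self._last_run > self._interval:
                with self._lock:
                    self._session.run(None, {self._input_name: self._dummy})
                    self._last_run = time.monotonic()

    def stop(self):
        self._stop.set()
        self._thread.join()
```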