How to free memory after each inference in Python?


I have an application that runs 4 YOLOv5 models in series, one per class. The idea was to avoid the noise that a multiclass model picks up, and when I tested this theory I found that one model per class gives better results than a single multiclass model.
But this came at a cost. My NVIDIA Jetson Xavier NX (8 GB RAM) can’t handle 4 models. It seems that when I run inference with one model, memory is reserved for that inference and is not released when the next model runs its inference, so usage keeps accumulating. 8 GB is not enough for 4 models.
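To check whether memory really accumulates between models, one option is to log available memory before and after each inference. On Jetson boards the GPU shares physical memory with the CPU, so the system-wide MemAvailable value in /proc/meminfo also reflects GPU-side allocations. This is a minimal sketch; the helper names are hypothetical, not part of any library.

```python
def mem_available_kib(meminfo_text: str) -> int:
    """Return MemAvailable (in KiB) from text in /proc/meminfo format."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    1234567 kB"
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def read_mem_available_kib(path: str = "/proc/meminfo") -> int:
    """Read the current MemAvailable value from the running system (Linux only)."""
    with open(path) as f:
        return mem_available_kib(f.read())
```

Calling read_mem_available_kib() before and after each model's forward pass and logging the difference shows whether each model leaves memory behind or whether usage levels off after the first image.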
Is there any way to empty the memory between inferences for different models?
I’m using 4K images that are resized into a 1280x1280 blob, all done with OpenCV. The issue usually happens on the first image.
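For scale, here is a minimal sketch of how large one 1280x1280 input blob is. It assumes the NCHW float32 layout that cv2.dnn.blobFromImage produces by default (output depth CV_32F), with a batch of one 3-channel image. The function name is hypothetical.

```python
def blob_nbytes(height: int, width: int, channels: int = 3,
                batch: int = 1, bytes_per_element: int = 4) -> int:
    """Bytes in an NCHW input blob (float32 is 4 bytes per element)."""
    return batch * channels * height * width * bytes_per_element


MIB = 2 ** 20
one_blob = blob_nbytes(1280, 1280)
print(one_blob, one_blob / MIB)   # 19660800 18.75
print(4 * one_blob / MIB)         # 75.0
```

Keeping one input copy per model therefore costs roughly 75 MiB for four models. That is small compared with 8 GB, which suggests (an assumption worth verifying) that most of the usage comes from each model's weights and intermediate buffers rather than from the input blob.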

Hi @p.carvalho,
You can try pruning the models, or quantizing them using this reference; it can help a bit. Also, since the 4 models share the same input shape, try to reuse a single input buffer where possible instead of keeping multiple copies of the input. Another option, at the cost of execution time, is to load one model, run inference, unload it, and repeat for each of the remaining models. Finally, you can try migrating to TensorRT and check how memory usage behaves with a different framework, like over here.
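The load-infer-unload approach, combined with reusing one input buffer, can be sketched as below. This is a framework-agnostic sketch, not a specific library's API: run_one_model_at_a_time, load_model and infer are hypothetical names, and the same blob is passed to every model because they share an input shape.

```python
import gc
from typing import Any, Callable, List, Sequence


def run_one_model_at_a_time(
    blob: Any,
    model_paths: Sequence[str],
    load_model: Callable[[str], Any],
    infer: Callable[[Any, Any], Any],
) -> List[Any]:
    """Load, run, and release each model in turn so only one is resident.

    The same input blob is passed to every model, so no per-model copy
    of the input is created.
    """
    results = []
    for path in model_paths:
        model = load_model(path)
        try:
            results.append(infer(model, blob))
        finally:
            # Drop our reference before the next load. Without this, `model`
            # would still hold the previous network while the next one is
            # being loaded, so two models would briefly coexist in memory.
            del model
            # Collect any reference cycles that plain reference counting misses.
            gc.collect()
    return results
```

With OpenCV's DNN module, load_model could be cv2.dnn.readNetFromONNX and infer a small function that calls net.setInput(blob) and returns net.forward(); the model paths would be your own exported files. Whether the released memory is actually returned to the system depends on the backend (for example, a CUDA context or allocator cache may stay resident), so it is worth measuring before and after.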

Embedded SW Engineer at RidgeRun
