Deep Learning memory optimization

In my application I have to run inference with two deep learning models. Both models need to stay loaded in memory so that they can provide continuous inference without being reloaded every time. Right now I am able to load only one model on the GPU, and the program crashes with an OOM error when trying to load the second model (sometimes I get OOM for the first model itself). How can I load both models on the GPU?

I tried creating swap memory, but it didn't help.

Device: Jetson Nano
Deep Learning framework: Tensorflow


Since the Nano only has 4 GB of memory, this limits how complex a model you can use.
Swap memory only increases CPU-accessible memory, but inference generally requires GPU-accessible memory. On the Nano, the CPU and GPU share the same 4 GB of physical RAM, so swap does not add anything the GPU can use.
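One thing worth trying before switching frameworks: by default, TensorFlow tries to grab nearly all available GPU memory for the first process/model, which can cause OOM even when the models themselves would fit. A sketch of how to relax that, assuming TensorFlow 2.x (the model file names are placeholders for your own models):

```python
import tensorflow as tf

# By default TensorFlow allocates almost all GPU memory up front,
# which can trigger OOM before the second model even loads.
# Memory growth makes TF allocate only what it actually needs.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Alternatively, cap the allocation explicitly (value in MB):
# tf.config.set_logical_device_configuration(
#     gpus[0],
#     [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])

# Placeholder paths -- substitute your own saved models.
model_a = tf.keras.models.load_model('model_a.h5')
model_b = tf.keras.models.load_model('model_b.h5')
```

Note that this must run before any other TensorFlow operation touches the GPU; once memory is allocated, the setting cannot be changed for that process.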

A possible alternative is to use a framework with lower memory usage, e.g. TensorRT.
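A minimal sketch of converting an existing TensorFlow SavedModel through the TF-TRT integration, assuming TensorFlow 2.x with TensorRT support (as shipped in JetPack); the directory names are placeholders:

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# FP16 roughly halves weight memory versus FP32 and is well suited
# to the Nano's GPU.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='saved_model_dir',   # placeholder input path
    conversion_params=params)
converter.convert()
converter.save('trt_saved_model_dir')          # placeholder output path
```

The converted model is loaded and served like any other SavedModel, but the TensorRT engine typically uses considerably less memory at inference time than the full TensorFlow graph.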