Multi-model inference - swapping GPU memory

Description

Hi, I’ve converted a few CNN models via TF-TRT (TRT 7.2.2.3, TF 2.4.1, CUDA 11.1, Python API) to run them in a pipeline. I’m limited by GPU memory (2080 Ti, 11 GB) rather than by throughput. TF 2.4 preallocates memory per model and therefore doesn’t leave room to load and release models at runtime. Is there an OS-like way to swap/copy a model’s memory from the GPU out to host DDR (accepting the higher-latency copy) to free GPU memory for other models, and of course copy the model back when it is needed? I’ve tried NVIDIA MPS in both EXCLUSIVE and DEFAULT mode, but it didn’t perform well and crashed when running 4x MobileNetV2 models (~1.5 GB of GPU memory each) plus a segmentation model (~4.5 GB).
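
To illustrate what I mean by swapping: one workaround I could imagine, if I moved from TF-TRT to standalone TensorRT engines, is to keep only the serialized plans in host DDR and deserialize an engine onto the GPU right before it is needed, releasing it afterwards. Below is a rough, untested sketch of that idea; the helper names (engine_cache, live_engines, register, load_engine, unload_engine) are just placeholders, so please treat it only as an illustration of the question, not a working solution.

    # Minimal sketch (untested): keep serialized engines in host RAM,
    # deserialize onto the GPU on demand, and free device memory when done.
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    engine_cache = {}   # model name -> serialized engine bytes, kept in host DDR
    live_engines = {}   # model name -> (engine, execution context) on the GPU

    def register(name, plan_path):
        # Read a serialized engine (.plan) into host memory only; no GPU memory used yet.
        with open(plan_path, "rb") as f:
            engine_cache[name] = f.read()

    def load_engine(name):
        # Deserialize the cached plan onto the GPU when the model is needed.
        runtime = trt.Runtime(TRT_LOGGER)
        engine = runtime.deserialize_cuda_engine(engine_cache[name])
        context = engine.create_execution_context()
        live_engines[name] = (engine, context)
        return context

    def unload_engine(name):
        # Drop the GPU-side objects; their device memory is released once
        # the Python objects are garbage collected.
        engine, context = live_engines.pop(name)
        del context
        del engine

I realize this trades GPU memory for the latency of deserializing an engine each time, which is exactly the swap cost I’m asking about.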
Thanks,
Hanoch

Environment

python/TF 2.4
TensorRT Version: 7.2.2.3
GPU Type: GeForce RTX 2080 Ti (11 GB)
Nvidia Driver Version:
CUDA Version: 11.1
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.x
TensorFlow Version (if applicable): 2.4.x
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi, could you please share your model and script so that we can help you better?

Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks!

Hi @hanoch.kremer,

It looks like this query has already been posted. We request that you follow up in the thread below.

Thank you.

Hi, I tried asking this on that thread, but I was told that only TF-TRT questions are being addressed there. What do you recommend I do?