I have split an ONNX model into multiple sub ONNX model files (using the onnx.utils.extract_model API). For every sub ONNX model I use trtexec to generate a TRT engine, so in the end I have all the TRT engines.
Then I load and deserialize all the engines in my project and use the createExecutionContext API to create multiple contexts (a simplified sketch of this setup follows the list below). The problem is that every engine's createExecutionContext call uses 16 MB of GPU memory, so after all contexts are created, N * 16 MB of memory is used.
I am confused about:
1. What does TensorRT do when an engine's createExecutionContext is called?
2. Why does every context hold 16 MB of memory?
3. Is there a way to share a context (or its memory) between all engines?
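Roughly, the loading code looks like the sketch below (a simplified example, not the exact project code; the engine file names and the logger are placeholders). It also prints getDeviceMemorySize() for each engine, which is the activation memory a plain createExecutionContext() would allocate:

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kINFO) std::cout << msg << std::endl;
    }
};

static std::vector<char> readFile(const std::string& path) {
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    const size_t size = static_cast<size_t>(f.tellg());
    std::vector<char> buf(size);
    f.seekg(0);
    f.read(buf.data(), size);
    return buf;
}

int main() {
    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);

    // Placeholder names for the per-operator engines built with trtexec.
    std::vector<std::string> enginePaths = {"conv_4.trt", "relu_5.trt"};

    std::vector<nvinfer1::ICudaEngine*> engines;
    std::vector<nvinfer1::IExecutionContext*> contexts;

    for (const std::string& path : enginePaths) {
        std::vector<char> blob = readFile(path);
        nvinfer1::ICudaEngine* engine =
            runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
        engines.push_back(engine);

        // Activation (scratch) memory this engine's context will need.
        std::cout << path << ": device memory = "
                  << engine->getDeviceMemorySize() << " bytes" << std::endl;

        // Each of these calls allocates its own persistent + activation memory.
        contexts.push_back(engine->createExecutionContext());
    }

    // ... enqueue inference on the contexts, then clean up ...
    for (nvinfer1::IExecutionContext* c : contexts) c->destroy();
    for (nvinfer1::ICudaEngine* e : engines) e->destroy();
    runtime->destroy();
    return 0;
}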
Environment
TensorRT Version: 7.2.2.3
GPU Type: V100, 2080 Ti
Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
In the meantime, you can try a few things:
1) Validate your model with the snippet below:
check_model.py
import onnx

filename = "yourONNXmodel.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with the trtexec command.
In case you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!
Thank you for your reply. I have used TensorRT for many years and am familiar with it. Now I want to integrate TensorRT into our deep learning framework (similar to ONNX Runtime). Our plan is to split the ONNX model into separate operators: operators that TensorRT supports will run on TensorRT, and operators it does not support will run on our native kernels.
In this plan, every operator that TensorRT supports gets its own INetworkDefinition, ICudaEngine and IExecutionContext.
So when the model runs there will be many ICudaEngines and IExecutionContexts, which use more GPU memory.
I wonder why creating an IExecutionContext uses 16 MB of memory, and whether it is possible for all engines to share a context (or at least its memory) to reduce memory use.
Each execution context may hold some persistent and activation (scratch) memory. You can find more information here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
During engine building, the log will also tell you how much activation memory the context will use, for example:
[06/20/2022-03:15:10] [I] [TRT] Total Activation Memory: 8388608
It depends on whether it's activation memory or persistent memory. First, please go through the user guide section on memory and diagnose which of these is the case. If it's activation memory, you can share it between contexts using createExecutionContextWithoutDeviceMemory(). If it's persistent memory, the problem is most likely edge mask tactics, and you can turn those off using setTacticSources().
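As a rough sketch of the activation-memory case (not tested against your setup): size one scratch buffer for the largest getDeviceMemorySize() among the engines, create each context with createExecutionContextWithoutDeviceMemory(), and hand it the shared buffer via setDeviceMemory(). The contexts must then be executed sequentially, never concurrently, because they share the same scratch space.

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <algorithm>
#include <vector>

// Create one context per engine, all borrowing the same activation-memory pool.
std::vector<nvinfer1::IExecutionContext*>
createContextsWithSharedScratch(const std::vector<nvinfer1::ICudaEngine*>& engines)
{
    // One buffer sized for the largest activation requirement among all engines.
    size_t maxScratch = 0;
    for (const nvinfer1::ICudaEngine* e : engines)
        maxScratch = std::max(maxScratch, e->getDeviceMemorySize());

    void* scratch = nullptr;
    cudaMalloc(&scratch, maxScratch);  // must stay alive as long as the contexts do

    std::vector<nvinfer1::IExecutionContext*> contexts;
    for (nvinfer1::ICudaEngine* e : engines) {
        // No activation memory is allocated by this call ...
        nvinfer1::IExecutionContext* ctx = e->createExecutionContextWithoutDeviceMemory();
        // ... the context uses the shared pool instead.
        ctx->setDeviceMemory(scratch);
        contexts.push_back(ctx);
    }
    return contexts;
}

Note that this only removes per-context activation memory; any persistent memory an engine needs is still allocated per context.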
I have printed the TRT 7 logs when creating the context, stepping through with gdb while watching nvidia-smi.
335 mEngine = mRuntime->deserializeCudaEngine(trt_model_stream, size, nullptr);
(gdb) n
2022-06-27 20:03:48 | log | INFO: Deserialize required 102589 microseconds.
336 mContext = mEngine->createExecutionContext();
(gdb) n
2022-06-27 20:03:58 | log | INFO: Allocated persistent device memory of size 901120
2022-06-27 20:03:58 | log | INFO: Allocated activation device memory of size 0
2022-06-27 20:03:58 | log | INFO: Assigning persistent memory blocks for various profiles
338 int engine_size = mEngine->getDeviceMemorySize();
(gdb) c
2022-06-27 20:13:04 | [INFO ] | name: Conv_4, engine_size: 0
Before the create-context line below executes, GPU memory usage is 211 MB; afterwards it is 228 MB, so creating the context uses about 17 MB.
mContext = mEngine->createExecutionContext();
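For reference, the same delta can be measured directly around the call with cudaMemGetInfo instead of nvidia-smi (a small sketch, not part of the project code):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <iostream>

// Create a context and report how much device memory the call itself consumed.
nvinfer1::IExecutionContext* createContextAndReport(nvinfer1::ICudaEngine* engine)
{
    size_t freeBefore = 0, freeAfter = 0, total = 0;
    cudaMemGetInfo(&freeBefore, &total);

    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();

    cudaMemGetInfo(&freeAfter, &total);
    std::cout << "createExecutionContext consumed "
              << (freeBefore - freeAfter) / (1024.0 * 1024.0) << " MiB" << std::endl;
    return ctx;
}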