TensorRT execution context memory usage

Description

I split an ONNX model into multiple sub-model files using the onnx.utils.extract_model API. For each sub-model I used trtexec to generate a TensorRT engine, so I end up with one engine per sub-model.
I then load and deserialize all of the engines in my project and call createExecutionContext on each one to create multiple contexts. I found that every createExecutionContext call uses 16 MB of memory, so after all N contexts are created, N × 16 MB of memory is used.
I am confused about:
1. What does TensorRT do when createExecutionContext is called on an engine?
2. Why does every context hold 16 MB of memory?
3. Is there a way to share a context between all engines?

Environment

TensorRT Version: 7.2.2.3
GPU Type: V100, 2080 Ti
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below:

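check_model.py

import sys
import onnx

# Path to your ONNX model, passed as the first command-line argument
filename = sys.argv[1]
model = onnx.load(filename)
# Raises an exception if the model is not a valid ONNX graph
onnx.checker.check_model(model)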
  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Thank you for your reply. I have used TensorRT for many years and am familiar with it. I now want to integrate TensorRT into our deep learning framework (similar to onnxruntime). Our plan is to split the ONNX model into separate operators: operators that TensorRT supports will run on TensorRT, and operators that TensorRT does not support will run on our native kernels.
In this plan, every operator that TensorRT supports will create its own INetworkDefinition, ICudaEngine, and IExecutionContext.
So when the model runs, there will be many ICudaEngines and IExecutionContexts, which use more GPU memory.
I wonder why creating an IExecutionContext uses 16 MB of memory, and whether all engines can share a context to reduce memory use.

Hi

Each execution context may hold some activation (scratch) memory as well as persistent memory. You can find more information here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
During engine building, the log also tells you how much activation memory the context will use, for example:

[06/20/2022-03:15:10] [I] [TRT] Total Activation Memory: 8388608
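To turn that log line into a rough estimate, something like the following sketch could parse the reported byte count and multiply it by the number of contexts. The function name, the regular expression, and the context count of 10 are illustrative assumptions; only the log line itself comes from the build output above. Note that this covers activation memory only, not persistent memory.

```python
import re

def activation_bytes(log_text):
    """Return the last 'Total Activation Memory' value (bytes) in a build log, or None."""
    matches = re.findall(r"Total Activation Memory:\s*(\d+)", log_text)
    return int(matches[-1]) if matches else None

log = "[06/20/2022-03:15:10] [I] [TRT] Total Activation Memory: 8388608"
per_context = activation_bytes(log)

num_contexts = 10  # hypothetical number of engines/contexts in the application
print(per_context / 2**20, "MiB of activation memory per context")
print(num_contexts * per_context / 2**20, "MiB if every context owns its own buffer")
```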

It depends on whether it is activation memory or persistent memory. First, please go through the user guide section on memory and diagnose which of these is the case. If it is activation memory, you can share it between contexts using createExecutionContextWithoutDeviceMemory(). If it is persistent memory, the problem is most likely caused by edge mask tactics, and you can turn these off using setTacticSources().
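For the activation-memory case, a minimal sketch of the sharing idea with the Python bindings might look like the following. It assumes the TensorRT 7/8-era Python names (create_execution_context_without_device_memory, device_memory_size, and the settable device_memory property), and it assumes that the contexts never execute at the same time, since they all use one scratch buffer. The function name and the allocate callback (for example, a thin wrapper around cudaMalloc that returns an integer device pointer) are hypothetical.

```python
def create_contexts_sharing_scratch(engines, allocate):
    """Create one context per engine, all backed by a single scratch buffer.

    engines:  already-deserialized ICudaEngine objects (assumption)
    allocate: hypothetical callable(nbytes) -> device pointer as an int
    """
    # Contexts run one at a time, so one buffer sized for the
    # largest requirement among the engines is enough for all of them.
    shared_size = max(engine.device_memory_size for engine in engines)
    shared_ptr = allocate(shared_size) if shared_size > 0 else 0

    contexts = []
    for engine in engines:
        # The context is created without its own activation memory ...
        context = engine.create_execution_context_without_device_memory()
        # ... and is pointed at the shared buffer before it is used.
        if engine.device_memory_size > 0:
            context.device_memory = shared_ptr
        contexts.append(context)
    return contexts, shared_ptr, shared_size
```

With N engines this allocates max(device_memory_size) bytes once instead of the sum over all contexts. The caller remains responsible for freeing the buffer and for serializing execution across the contexts.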

Thank you.