I have split an ONNX model into multiple sub-model files (using the onnx.utils.extract_model API). For each sub-model, I use trtexec to generate a TensorRT engine, so I end up with one engine per sub-model.
I then load and deserialize all the engines in my project and call the createExecutionContext API to create one context per engine. I have found that each createExecutionContext call uses about 16 MB of memory, so once all N contexts are created, N × 16 MB of memory is used.
I am confused about:
1. What does TensorRT do when createExecutionContext is called on an engine?
2. Why does every context hold 16 MB of memory?
3. Is there a way to share a context between all engines?
GPU Type: V100, 2080 Ti
Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:
1) Validate your model with the snippet below:
import onnx
filename = "yourONNXmodel.onnx"  # placeholder: path to your ONNX file
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
2) Try running your model with the trtexec command.
If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thank you for your reply. I have used TensorRT for many years and am familiar with it. Now I want to integrate TensorRT into our deep learning framework (similar to onnxruntime). Our plan is to split the ONNX model into separate operators: operators that TensorRT supports will run on TensorRT, and operators that TensorRT doesn't support will run on our native kernels.
In this plan, every operator that TensorRT supports will create its own INetworkDefinition, ICudaEngine, and IExecutionContext.
So when the model runs, there will be many ICudaEngines and IExecutionContexts, which use more GPU memory.
I am wondering why creating an IExecutionContext uses 16 MB of memory, and whether all engines can share a context to reduce memory use.
Each execution context may hold some activation (persistent/scratch) memory. You can find more information here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
During engine building, the log will also tell you how much activation memory the context will use, for example:
[06/20/2022-03:15:10] [I] [TRT] Total Activation Memory: 8388608
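When there are many engines, it can help to collect this number from each build log rather than reading it by eye. The following is a minimal sketch that assumes the log line has exactly the format shown above; the helper names activation_bytes and to_mib are made up for illustration and are not part of TensorRT.

```python
import re

# Matches build-log lines such as:
# [06/20/2022-03:15:10] [I] [TRT] Total Activation Memory: 8388608
ACTIVATION_RE = re.compile(r"Total Activation Memory:\s*(\d+)")


def activation_bytes(log_text):
    """Return every 'Total Activation Memory' value (in bytes) found in the log text."""
    return [int(match.group(1)) for match in ACTIVATION_RE.finditer(log_text)]


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


sample = "[06/20/2022-03:15:10] [I] [TRT] Total Activation Memory: 8388608"
for size in activation_bytes(sample):
    print(f"{size} bytes = {to_mib(size):.1f} MiB")  # 8388608 bytes = 8.0 MiB
```

Comparing this value with the roughly 16 MB observed per context gives a first hint about how much of that memory is activation memory and how much is something else.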
It depends on whether it's activation memory or persistent memory. First, please go through the user guide section on memory and diagnose which of these is the case. If it's activation memory, then you can share it between contexts using createExecutionContextWithoutDeviceMemory(). If it's persistent memory, the problem is most likely edge mask tactics, and you can turn this off using
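For the activation memory case, sharing could look roughly like the sketch below. It is written against the Python bindings as I understand them from TensorRT 8.x (ICudaEngine.device_memory_size, create_execution_context_without_device_memory(), and the IExecutionContext.device_memory setter); please check these names against the version you use. The device_alloc callback is a hypothetical helper, for example a thin wrapper around cudaMalloc, and the whole scheme assumes the contexts never execute concurrently, because they all write into the same scratch buffer.

```python
def create_contexts_with_shared_scratch(engines, device_alloc):
    """Create one context per engine, all backed by a single scratch buffer.

    engines:      deserialized ICudaEngine objects.
    device_alloc: hypothetical callback taking a size in bytes and returning a
                  device pointer (for example a wrapper around cudaMalloc).
    """
    engines = list(engines)
    if not engines:
        raise ValueError("at least one engine is required")

    # One buffer sized for the most demanding engine is enough for all of them,
    # as long as only one context runs at a time.
    scratch_size = max(engine.device_memory_size for engine in engines)
    scratch_ptr = device_alloc(scratch_size)

    contexts = []
    for engine in engines:
        # The context is created without its own activation memory ...
        context = engine.create_execution_context_without_device_memory()
        # ... and is pointed at the shared buffer before any inference call.
        context.device_memory = scratch_ptr
        contexts.append(context)

    return contexts, scratch_ptr, scratch_size
```

With this arrangement the activation cost is the maximum over the engines instead of the sum. The caller owns scratch_ptr and has to keep it alive until every context is destroyed. Persistent memory is not affected by this and is still allocated per context.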