LSTM model cannot have more than one optimization profile

Description

We are trying to convert an LSTM model that runs on a 4D tensor to TensorRT. We chose to initialize h0 and c0 with the mean of the tensor, which is why the code is quite long; for more details, the PyTorch script is also provided. We want to use the multithreading functionality, so we need more than one optimization profile, but the code gives

[TensorRT] ERROR: 2: [standardEngineBuilder.cpp::makeEngineFromGraph::1288] Error Code 2: Internal Error (Assertion engineRegions.count(it->name) == 0 failed.)

if I add a for loop that adds multiple optimization profiles to the config.
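A minimal sketch of the kind of loop I mean (the input name and the min/opt/max shapes are placeholders for the real dynamic 4D input; the actual script is in the linked folder):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, num_profiles=2):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None, None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30

    # This is the part that triggers the error: registering more than one
    # optimization profile. "input" and the shapes below are placeholders.
    for _ in range(num_profiles):
        profile = builder.create_optimization_profile()
        profile.set_shape("input", (1, 3, 64, 64), (4, 3, 128, 128), (8, 3, 256, 256))
        config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config=config)
    context = engine.create_execution_context()  # fails here because engine is None
    return engine, context
```

With num_profiles=1 the engine builds fine; with two or more profiles build_engine returns None after the internal-error assertion.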

Environment

Docker Env: nvcr.io/nvidia/tensorrt:21.02-py3
GPU Type: tried both an RTX 2080 Ti and an RTX 3090
Nvidia Driver Version: 470.57.02
PyTorch Version (if applicable): tried PyTorch 1.4, 1.6, and 1.7; with 1.7 the export has a no-output bug

Relevant Files

https://drive.google.com/drive/folders/19OD2SapPcVkfLbqZh4j2YnqcWGfzHX1G?usp=sharing

Steps To Reproduce

run

python export2onnx.py

You can skip this step, since I also provide the ONNX file.
Then just run

python onnx2tensorrt.py

Full error messages

For the export-to-ONNX script there are some warnings:

Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  con_h = torch.unsqueeze(con_h, 2).repeat(1, 1, int(feature_h), 1)
/home/agent_m/temp/minimal_case/lstm.py:54: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  con_c = torch.unsqueeze(con_c, 2).repeat(1, 1, int(feature_h), 1)
/home/agent_m/miniconda3/envs/pipeline_env_3/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py:1668: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model. 
  "or define the initial states (h0/c0) as inputs of the model. ")

For the TensorRT script:

[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
Completed parsing of ONNX file
onnx2tensorrt.py:28: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, config=trt_config)
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] ERROR: 2: [standardEngineBuilder.cpp::makeEngineFromGraph::1288] Error Code 2: Internal Error (Assertion engineRegions.count(it->name) == 0 failed.)
Traceback (most recent call last):
  File "onnx2tensorrt.py", line 39, in <module>
    engine, context = build_engine('onnx_out.onnx')
  File "onnx2tensorrt.py", line 29, in build_engine
    context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

I added int(feature_h) only because I was trying to debug this myself and wanted to figure out whether those dynamic sizes were the problem; it turns out they are not.

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest you use DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.

Thanks!

So you mean I should give up debugging and switch to new tools?
Or do you believe this bug will disappear on a different device?
It has nothing to do with multithreading, since there are other uses for adding multiple optimization profiles.
As for the links: this is not about multi-streaming. It should be okay to add multiple optimization profiles to the config, and I have successfully done that with another model. For the model I am working on, if I tweak the code a little to remove some modules, it also works. What I mean is that, to me, this is unexpected behavior on the TensorRT side, and I wonder what is so special about the code I provided that triggers the bug (repeat to a dynamic size? repeating too many times? the combination of view, permute, and repeat?). I provided a somewhat minimal example. I already spent a whole day locating the problematic part of the PyTorch code and thinking about what makes it special, but I may still need help from experts.
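To make the question concrete, the suspicious pattern looks roughly like the sketch below: initial states built from the mean of a dynamic 4D tensor, expanded with unsqueeze/repeat over a dynamic spatial size, then combined with permute/view before the LSTM. Apart from the two repeat lines shown in the trace warnings, the shapes, sizes, and structure are illustrative, not the real lstm.py:

```python
import torch
import torch.nn as nn

class MeanInitLSTM(nn.Module):
    # Illustrative sketch only -- not the real lstm.py.
    def __init__(self, channels=32, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)

    def forward(self, x):                          # x: (N, C, H, W)
        n, c, feature_h, feature_w = x.shape

        # initial states from the mean over the height, repeated back to the
        # dynamic height -- the two lines flagged by the TracerWarnings
        con_h = x.mean(dim=2)                      # (N, C, W)
        con_c = x.mean(dim=2)
        con_h = torch.unsqueeze(con_h, 2).repeat(1, 1, int(feature_h), 1)  # (N, C, H, W)
        con_c = torch.unsqueeze(con_c, 2).repeat(1, 1, int(feature_h), 1)

        # permute/view so every image row becomes one sequence for the LSTM
        seq = x.permute(0, 2, 3, 1).contiguous().view(n * feature_h, feature_w, c)
        h0 = con_h.permute(0, 2, 3, 1).contiguous().view(n * feature_h, feature_w, c)
        h0 = h0.mean(dim=1, keepdim=True).permute(1, 0, 2).contiguous()    # (1, N*H, C)
        c0 = con_c.permute(0, 2, 3, 1).contiguous().view(n * feature_h, feature_w, c)
        c0 = c0.mean(dim=1, keepdim=True).permute(1, 0, 2).contiguous()

        out, _ = self.lstm(seq, (h0, c0))
        return out
```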

The bug occurs in the engine-building stage when I add more than one optimization profile to the config, as stated above. And there are no Google results when I search for the assertion in the error message.

Thanks!