Builder.build_cuda_engine returns None with multi-outputs model

Description

Hi, I am trying to print the outputs of hidden layers to verify their correctness. I use an internal library to convert an MXNet model to a TensorRT model directly. However, TensorRT cannot build the engine when I mark certain layers as outputs. Here are the logs:

[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 830 (MiB)                                                           
[TensorRT] ERROR: 2: [optimizer.cpp::match::2076] Error Code 2: Internal Error (Assertion !n23->cost.empty() failed.)                                      
Engine:  <class 'NoneType'>                                                                                                                                 
Trt net:  <tensorrt.tensorrt.INetworkDefinition object at 0x7fa133fcd7f0>                                                                                   
Traceback (most recent call last):                                                                                                                            
File "trt8_explicit_int8.py", line 153, in <module>                                                                                                           
trt_int8_explicit()                                                                                                                                       
File "trt8_explicit_int8.py", line 117, in trt_int8_explicit                                                                                                  
mod.build_engine(                                                                                                                                         
File "/root/.tspkg/lib/python3.8/site-packages/trtplus/v2/inference.py", line 523, in 
build_engine                                                            
ctx = engine.create_execution_context()                                                                                                                 
AttributeError: 'NoneType' object has no attribute 'create_execution_context' 

The log shows an internal assertion error at optimizer.cpp::match::2076; the builder then returns None instead of an engine, and the subsequent create_execution_context() call fails.
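The AttributeError at the end of the traceback is a secondary symptom: TensorRT's build APIs signal failure by returning None rather than raising, so the real error is the optimizer assertion logged earlier. Whatever converter wraps the build, it helps to guard the build step explicitly. A minimal, hedged sketch (the lambdas below are stand-ins for the actual trtplus/TensorRT build call, not real API):

```python
def build_engine_or_raise(build_fn, description="engine"):
    """Call a builder function and fail loudly if it returns None.

    TensorRT's build methods return None on failure instead of raising,
    so an unchecked result surfaces later as a confusing AttributeError
    on NoneType, as in the traceback above.
    """
    engine = build_fn()
    if engine is None:
        raise RuntimeError(
            f"Building the {description} failed; check the TensorRT "
            "logger output for ERROR lines (e.g. internal assertion "
            "failures in the optimizer)."
        )
    return engine


# Stand-in builders for illustration only:
engine = build_engine_or_raise(lambda: object())  # successful build
try:
    build_engine_or_raise(lambda: None)  # simulates a failed build
except RuntimeError as err:
    print("caught:", err)
```

With a guard like this, the failure points at the build step itself instead of at a later create_execution_context() call.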

Environment

TensorRT Version: 8.1
GPU Type: 2080Ti
Nvidia Driver Version: 460.84
CUDA Version: 11.2
CUDNN Version: 11.4
Operating System + Version: ubuntu 20.04 LTS
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
MXNet Version (if applicable): 1.6.0
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forums.

Thanks!