Description
I am trying to convert a Faster R-CNN model into a TensorRT engine (torch → onnx → trtexec) for a Jetson TX2 NX. I cross-compiled trtexec and a custom plugin on a GTX 1080 machine, targeting the TX2 NX. After copying these components to the TX2 NX, trtexec built an engine but crashed during inference with an Internal Error (Assertion status == kSTATUS_SUCCESS failed.). Following a similar procedure, I could deploy the model successfully on a GTX 1080 and an RTX 2080, but not on the Jetson TX2 NX.
Environment
TensorRT Version: 8.2.1 (JetPack 4.6.3)
GPU Type: Jetson TX2 NX
CUDA Version: 10.2
CUDNN Version: 8.0.0
Operating System + Version: Ubuntu 18.04
Python Version: 3.10.8
PyTorch Version: 1.13.1
Torchvision Version: 0.14.1
Relevant Files
I am happy to provide relevant files (e.g., onnx model and TensorRT OSS package, etc.) via DM.
Steps To Reproduce
Context:
My model is a torchvision Faster R-CNN in which I replaced the backbone with ResNet10 and configured the detection head to predict boxes of a single category (plus background). The trained model was first converted to ONNX format via torch.onnx.export(). I mainly tested with opset_version=11 (I also tried other versions, but all led to the same result).
The Jetson TX2 is officially supported only up to TensorRT 8.2 (JetPack 4.6.3), which does not natively support the RoiAlign operator used by Faster R-CNN. I therefore manually added roiAlignPlugin from the official TensorRT OSS release/8.5 and recompiled the relevant .so files and trtexec.
Step 1 - Test on GTX 1080
I ported the roiAlignPlugin and ONNX parser code from TensorRT OSS 8.5 into my TensorRT OSS 8.2 project (i.e., TRT_OSSPATH). I ran the following commands to build new versions of libnvinfer_plugin.so.8, libnvonnxparser.so.8, trtexec, etc.:
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCUDA_VERSION=11.8 -DGPU_ARCHS="61"
make -j$(nproc)
where TRT_LIBPATH is the path to the downloaded TensorRT-8.2.1.8.Linux.x86_64-gnu package. I confirmed that the recompiled trtexec and plugin from Step 1 produced correct detections on my GTX 1080 (and also on a separate RTX 2080 machine).
Step 2 – Cross-compilation targeting Jetson TX2 NX
To deploy my model on the Jetson TX2 NX, I cross-compiled on the GTX 1080 machine, following the example "Ubuntu 18.04 Cross-Compile for Jetson (aarch64) with cuda-10.2 (JetPack)" from the NVIDIA/TensorRT GitHub repository.
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64_jetson.toolchain \
  -DTRT_LIB_DIR=$TRT_LIBPATH/lib -DTRT_OUT_DIR=`pwd`/out -DCUDA_VERSION=10.2 \
  -DCUDNN_LIB=$TX2_CUDA_PATH/lib/libcudnn.so \
  -DCUBLAS_LIB=$TX2_CUDA_PATH/lib/libcublas.so.10 \
  -DCUBLASLT_LIB=$TX2_CUDA_PATH/lib/libcublasLt.so.10 \
  -DCUDA_TOOLKIT_ROOT_DIR=$TX2_CUDA_PATH -DCUDNN_ROOT_DIR=$TX2_CUDNN_PATH \
  -DCUDART_LIB=$TX2_CUDA_PATH/lib/libcudart.so \
  -DCMAKE_CUDA_COMPILER=$TX2_CUDA_PATH/bin/nvcc \
  -DCUDA_INCLUDE_DIRS=$TX2_CUDA_PATH/include \
  -DGPU_ARCHS="62" -DTRT_PLATFORM_ID=aarch64
make -j$(nproc)
I had to pass many more options than in Step 1 to get a successful build. Eventually I was able to build new versions of libnvinfer_plugin.so.8, libnvonnxparser.so.8, and trtexec targeting the Jetson TX2.
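Before copying the cross-compiled files to the device, it may be worth confirming that each one really targets aarch64 and was not picked up from an earlier x86_64 build directory. The sketch below is a minimal helper, not part of the original procedure: it reads the e_machine field of an ELF header using only the Python standard library. The function names are hypothetical, and the demo uses a synthetic header rather than a real library.

```python
import struct

# e_machine values defined by the ELF specification.
ELF_MACHINES = {62: "x86_64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def file_machine(path: str) -> str:
    """Read the ELF header of the file at `path` and report its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))


# Demo on a synthetic 64-bit little-endian header with e_machine = 183.
demo_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 183)
print(elf_machine(demo_header))
```

On the build machine, file_machine could be called on each file under the out directory (for example the rebuilt libnvinfer_plugin.so.8 and trtexec). Anything that does not report aarch64 would point to a stale artifact.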
Step 3 – Test on Jetson TX2 NX.
I copied the above components onto the Jetson device and ran trtexec --onnx=model.onnx --saveEngine=model.trt. The engine was built successfully, but trtexec crashed at the inference stage with the following output:
[09/06/2022-13:00:11] [I] === Model Options ===
[09/06/2022-13:00:11] [I] Format: ONNX
[09/06/2022-13:00:11] [I] Model: model.onnx
[09/06/2022-13:00:11] [I] Output:
[09/06/2022-13:00:11] [I] === Build Options ===
[09/06/2022-13:00:11] [I] Max batch: explicit batch
[09/06/2022-13:00:11] [I] Workspace: 16 MiB
[09/06/2022-13:00:11] [I] minTiming: 1
[09/06/2022-13:00:11] [I] avgTiming: 8
[09/06/2022-13:00:11] [I] Precision: FP32
[09/06/2022-13:00:11] [I] Calibration:
[09/06/2022-13:00:11] [I] Refit: Disabled
[09/06/2022-13:00:11] [I] Sparsity: Disabled
[09/06/2022-13:00:11] [I] Safe mode: Disabled
[09/06/2022-13:00:11] [I] DirectIO mode: Disabled
[09/06/2022-13:00:11] [I] Restricted mode: Disabled
[09/06/2022-13:00:11] [I] Save engine:
[09/06/2022-13:00:11] [I] Load engine:
[09/06/2022-13:00:11] [I] Profiling verbosity: 0
[09/06/2022-13:00:11] [I] Tactic sources: Using default tactic sources
[09/06/2022-13:00:11] [I] timingCacheMode: local
[09/06/2022-13:00:11] [I] timingCacheFile:
[09/06/2022-13:00:11] [I] Input(s)s format: fp32:CHW
[09/06/2022-13:00:11] [I] Output(s)s format: fp32:CHW
[09/06/2022-13:00:11] [I] Input build shapes: model
[09/06/2022-13:00:11] [I] Input calibration shapes: model
[09/06/2022-13:00:11] [I] === System Options ===
[09/06/2022-13:00:11] [I] Device: 0
[09/06/2022-13:00:11] [I] DLACore:
[09/06/2022-13:00:11] [I] Plugins:
[09/06/2022-13:00:11] [I] === Inference Options ===
[09/06/2022-13:00:11] [I] Batch: Explicit
[09/06/2022-13:00:11] [I] Input inference shapes: model
[09/06/2022-13:00:11] [I] Iterations: 10
[09/06/2022-13:00:11] [I] Duration: 3s (+ 200ms warm up)
[09/06/2022-13:00:11] [I] Sleep time: 0ms
[09/06/2022-13:00:11] [I] Idle time: 0ms
[09/06/2022-13:00:11] [I] Streams: 1
[09/06/2022-13:00:11] [I] ExposeDMA: Disabled
[09/06/2022-13:00:11] [I] Data transfers: Enabled
[09/06/2022-13:00:11] [I] Spin-wait: Disabled
[09/06/2022-13:00:11] [I] Multithreading: Disabled
[09/06/2022-13:00:11] [I] CUDA Graph: Disabled
[09/06/2022-13:00:11] [I] Separate profiling: Disabled
[09/06/2022-13:00:11] [I] Time Deserialize: Disabled
[09/06/2022-13:00:11] [I] Time Refit: Disabled
[09/06/2022-13:00:11] [I] Skip inference: Disabled
[09/06/2022-13:00:11] [I] Inputs:
[09/06/2022-13:00:11] [I] === Reporting Options ===
[09/06/2022-13:00:11] [I] Verbose: Disabled
[09/06/2022-13:00:11] [I] Averages: 10 inferences
[09/06/2022-13:00:11] [I] Percentile: 99
[09/06/2022-13:00:11] [I] Dump refittable layers:Disabled
[09/06/2022-13:00:11] [I] Dump output: Disabled
[09/06/2022-13:00:11] [I] Profile: Disabled
[09/06/2022-13:00:11] [I] Export timing to JSON file:
[09/06/2022-13:00:11] [I] Export output to JSON file:
[09/06/2022-13:00:11] [I] Export profile to JSON file:
[09/06/2022-13:00:11] [I]
[09/06/2022-13:00:11] [I] === Device Information ===
[09/06/2022-13:00:11] [I] Selected Device: NVIDIA Tegra X2
[09/06/2022-13:00:11] [I] Compute Capability: 6.2
[09/06/2022-13:00:11] [I] SMs: 2
[09/06/2022-13:00:11] [I] Compute Clock Rate: 1.3 GHz
[09/06/2022-13:00:11] [I] Device Global Memory: 3825 MiB
[09/06/2022-13:00:11] [I] Shared Memory per SM: 64 KiB
[09/06/2022-13:00:11] [I] Memory Bus Width: 128 bits (ECC disabled)
[09/06/2022-13:00:11] [I] Memory Clock Rate: 1.3 GHz
[09/06/2022-13:00:11] [I]
[09/06/2022-13:00:11] [I] TensorRT version: 8.2.5
[09/06/2022-13:00:13] [I] [TRT] [MemUsageChange] Init CUDA: CPU +267, GPU +0, now: CPU 285, GPU 1692 (MiB)
[09/06/2022-13:00:13] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 285 MiB, GPU 1720 MiB
[09/06/2022-13:00:13] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 314 MiB, GPU 1749 MiB
[09/06/2022-13:00:13] [I] Start parsing network model
[09/06/2022-13:00:14] [I] [TRT] ----------------------------------------------------------------
[09/06/2022-13:00:14] [I] [TRT] Input filename: model.onnx
[09/06/2022-13:00:14] [I] [TRT] ONNX IR version: 0.0.8
[09/06/2022-13:00:14] [I] [TRT] Opset version: 11
[09/06/2022-13:00:14] [I] [TRT] Producer name: pytorch
[09/06/2022-13:00:14] [I] [TRT] Producer version: 1.13.1
[09/06/2022-13:00:14] [I] [TRT] Domain:
[09/06/2022-13:00:14] [I] [TRT] Model version: 0
[09/06/2022-13:00:14] [I] [TRT] Doc string:
[09/06/2022-13:00:14] [I] [TRT] ----------------------------------------------------------------
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:370: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[09/06/2022-13:00:14] [I] Finish parsing network model
[09/06/2022-13:00:14] [I] [TRT] ---------- Layers Running on DLA ----------
[09/06/2022-13:00:14] [I] [TRT] ---------- Layers Running on GPU ----------
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Constant_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Constant_1_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Constant_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_17_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Constant_24_output_0 + (Unnamed Layer* 68) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Constant_29_output_0 + (Unnamed Layer* 71) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Constant_34_output_0 + (Unnamed Layer* 74) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Constant_39_output_0 + (Unnamed Layer* 77) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 80) [Constant] + (Unnamed Layer* 82) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Constant_40_output_0 + (Unnamed Layer* 83) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 86) [Constant] + (Unnamed Layer* 88) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Constant_41_output_0 + (Unnamed Layer* 89) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_11_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_13_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_12_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_14_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Constant_output_0_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_11_output_0_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_13_output_0_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_21_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Constant_42_output_0 + (Unnamed Layer* 113) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Constant_43_output_0 + (Unnamed Layer* 116) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_17_output_0_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] onnx::Max_553 + (Unnamed Layer* 149) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] onnx::Max_553_4 + (Unnamed Layer* 152) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Cast_6_output_0 + (Unnamed Layer* 155) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Cast_7_output_0 + (Unnamed Layer* 158) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] onnx::Add_575
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] onnx::Gather_586
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 186) [Constant] + (Unnamed Layer* 187) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Constant_2_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Constant_output_0_6
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Constant_6_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Constant_1_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_3_output_0 + (Unnamed Layer* 216) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_4_output_0 + (Unnamed Layer* 219) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/Constant_4_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/Constant_5_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] roi_heads.box_head.fc6.weight
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] roi_heads.box_head.fc6.bias + (Unnamed Layer* 238) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] roi_heads.box_head.fc7.weight
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] roi_heads.box_head.fc7.bias + (Unnamed Layer* 244) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] roi_heads.box_predictor.cls_score.weight
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] roi_heads.box_predictor.cls_score.bias + (Unnamed Layer* 250) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] roi_heads.box_predictor.bbox_pred.weight
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] roi_heads.box_predictor.bbox_pred.bias + (Unnamed Layer* 255) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_9_output_0 + (Unnamed Layer* 280) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_14_output_0 + (Unnamed Layer* 283) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_19_output_0 + (Unnamed Layer* 286) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_24_output_0 + (Unnamed Layer* 289) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 293) [Constant] + (Unnamed Layer* 295) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_25_output_0 + (Unnamed Layer* 296) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 299) [Constant] + (Unnamed Layer* 301) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_26_output_0 + (Unnamed Layer* 302) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Reshape_3_output_0
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_27_output_0 + (Unnamed Layer* 327) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Constant_28_output_0 + (Unnamed Layer* 330) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] onnx::Max_553_7 + (Unnamed Layer* 362) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] onnx::Max_553_8 + (Unnamed Layer* 365) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Cast_6_output_0_9 + (Unnamed Layer* 370) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Cast_7_output_0_10 + (Unnamed Layer* 373) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Squeeze
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Sub
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Div
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Unsqueeze
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Resize
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Gather
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Pad
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Unsqueeze_12
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /transform/Unsqueeze_12_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.0/conv/conv/Conv + /backbone/backbone.0/conv/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.0/pool/MaxPool
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.1/unit1/body/conv1/conv/Conv + /backbone/backbone.1/unit1/body/conv1/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.1/unit1/body/conv2/conv/Conv + /backbone/backbone.1/unit1/Add + /backbone/backbone.1/unit1/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.2/unit1/body/conv1/conv/Conv + /backbone/backbone.2/unit1/body/conv1/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.2/unit1/body/conv2/conv/Conv
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.2/unit1/identity_conv/conv/Conv + /backbone/backbone.2/unit1/Add + /backbone/backbone.2/unit1/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.3/unit1/body/conv1/conv/Conv + /backbone/backbone.3/unit1/body/conv1/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.3/unit1/body/conv2/conv/Conv
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.3/unit1/identity_conv/conv/Conv + /backbone/backbone.3/unit1/Add + /backbone/backbone.3/unit1/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.4/unit1/body/conv1/conv/Conv + /backbone/backbone.4/unit1/body/conv1/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.4/unit1/body/conv2/conv/Conv
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /backbone/backbone.4/unit1/identity_conv/conv/Conv + /backbone/backbone.4/unit1/Add + /backbone/backbone.4/unit1/activ/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/head/conv/conv.0/conv.0.0/Conv + /rpn/head/conv/conv.0/conv.0.1/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/head/bbox_pred/Conv || /rpn/head/cls_logits/Conv
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Reshape + /rpn/Transpose
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Reshape_2 + /rpn/Transpose_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Reshape_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Reshape_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Reshape_1_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Reshape_3_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Reshape_4 + /rpn/Reshape_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Flatten + /rpn/Reshape_8
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_19
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Slice
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Slice_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Slice_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Slice_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/TopK
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Div_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Div_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Div_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Div_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Mul_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Mul_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 84) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 90) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_18
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Add_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Add_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 85) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 91) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_20
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_22
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Exp
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Exp_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_25
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Mul_6
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Mul_7
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] PWN(/rpn/Sigmoid)
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Mul_9
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Mul_8
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Sub_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Add_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Sub_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Add_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Cast_464
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_15
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_17
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_16
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_18
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_15_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_16_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_17_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_18_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Squeeze_1 + Unsqueeze_471 + Unsqueeze_472 + NonMaxSuppression_475
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Flatten_1 + /rpn/Reshape_6 + /rpn/Reshape_7
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_23
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_24
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Slice_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Slice_6
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Max
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Max_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Min
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Min_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_28
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_29
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_28_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Unsqueeze_29_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Reshape_11
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] ReduceMax_463
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Add_466
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 167) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Mul_467
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Unsqueeze_468
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Add_469
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Unsqueeze_470
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] NonMaxSuppression_475_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Gather_477
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Squeeze_478
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /rpn/Gather_27
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Cast
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Gather_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Gather_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Gather_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Gather_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/ConstantOfShape
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Sub
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Sub_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Mul
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Mul_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/Concat_1_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/Concat_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Add
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Add_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/Gather
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/Gather_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/Squeeze
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/Cast
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_roi_pool/RoiAlign
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_head/Flatten
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_head/fc6/Gemm
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 239) [ElementWise] + /roi_heads/box_head/Relu
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_head/fc7/Gemm
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 245) [ElementWise] + /roi_heads/box_head/Relu_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_predictor/cls_score/Gemm
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/box_predictor/bbox_pred/Gemm
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 251) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 256) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Softmax
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Slice
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Slice_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Slice_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Slice_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Div
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Div_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Div_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Div_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Mul_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Mul_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 297) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 303) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Add_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Add_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 298) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 304) [ElementWise]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Reshape_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Exp
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Exp_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Mul_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Mul_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Mul_7
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Mul_6
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Expand
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Sub_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Add_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Sub_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Add_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_5
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_7
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_6
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_8
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Reshape_6
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_5_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_6_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_7_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_8_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Cast_670
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Slice_6
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Slice_7
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Max
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Max_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Min
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Min_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_12
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_13
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_12_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Unsqueeze_13_output_0 copy
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Reshape_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Slice_8
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Reshape_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] ReduceMax_669
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] onnx::Add_785
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Add_672
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] (Unnamed Layer* 388) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Mul_673
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Unsqueeze_674
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Add_675
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Unsqueeze_676
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Unsqueeze_677 + Unsqueeze_678 + NonMaxSuppression_681
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] NonMaxSuppression_681_11
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] onnx::Gather_796
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Gather_683
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] Squeeze_684
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Gather_9
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Gather_10
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /roi_heads/Gather_11
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Squeeze_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Squeeze_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Squeeze_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Squeeze_4
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Div_1_output_0 + (Unnamed Layer* 411) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Mul
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Div_1_output_0_17 + (Unnamed Layer* 414) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Mul_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Div_output_0 + (Unnamed Layer* 417) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Mul_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Div_output_0_18 + (Unnamed Layer* 420) [Shuffle]
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Mul_3
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Unsqueeze
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Unsqueeze_1
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Unsqueeze_2
[09/06/2022-13:00:14] [I] [TRT] [GpuLayer] /Unsqueeze_3
[09/06/2022-13:00:15] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +162, now: CPU 613, GPU 2188 (MiB)
[09/06/2022-13:00:17] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +250, GPU +286, now: CPU 863, GPU 2474 (MiB)
[09/06/2022-13:00:17] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/06/2022-13:00:57] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[09/06/2022-13:01:52] [I] [TRT] Detected 1 inputs and 7 output network tensors.
[09/06/2022-13:01:52] [I] [TRT] Total Host Persistent Memory: 26752
[09/06/2022-13:01:52] [I] [TRT] Total Device Persistent Memory: 36447744
[09/06/2022-13:01:52] [I] [TRT] Total Scratch Memory: 512000
[09/06/2022-13:01:52] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 384 MiB
[09/06/2022-13:01:52] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 205.304ms to assign 15 blocks to 199 nodes requiring 83747843 bytes.
[09/06/2022-13:01:52] [I] [TRT] Total Activation Memory: 83747843
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1115, GPU 2955 (MiB)
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 1116, GPU 2955 (MiB)
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +256, now: CPU 0, GPU 256 (MiB)
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1251, GPU 3092 (MiB)
[09/06/2022-13:01:52] [I] [TRT] Loaded engine size: 137 MiB
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1252, GPU 3094 (MiB)
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1252, GPU 3094 (MiB)
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +136, now: CPU 0, GPU 136 (MiB)
[09/06/2022-13:01:52] [I] Engine built in 101.427 sec.
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 954, GPU 2841 (MiB)
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 954, GPU 2841 (MiB)
[09/06/2022-13:01:52] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +115, now: CPU 0, GPU 251 (MiB)
[09/06/2022-13:01:52] [I] Using random values for input input0
[09/06/2022-13:01:52] [I] Created input binding for input0 with dimensions 1x3x480x640
[09/06/2022-13:01:52] [I] Using random values for output scores
[09/06/2022-13:01:52] [I] Created output binding for scores with dimensions 100
[09/06/2022-13:01:52] [I] Using random values for output labels
[09/06/2022-13:01:52] [I] Created output binding for labels with dimensions 100
[09/06/2022-13:01:52] [I] Using random values for output boxes
[09/06/2022-13:01:52] [I] Created output binding for boxes with dimensions 100x4
[09/06/2022-13:01:52] [I] Starting inference
[09/06/2022-13:01:52] [E] Error[2]: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed. )
[09/06/2022-13:01:52] [E] Error occurred during inference
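The log above shows the default 16 MiB workspace and the message "Some tactics do not have sufficient workspace memory to run ... please check verbose output". One way to gather more detail is to rerun with a larger workspace and verbose logging. The sketch below only assembles such a command. The --workspace (in MiB) and --verbose flags are standard trtexec options in TensorRT 8.2, but the helper name and the 1024 MiB value are arbitrary assumptions, and nothing here is known to fix the crash.

```python
import shlex


def trtexec_debug_command(onnx_path, engine_path, workspace_mib=1024):
    """Build a trtexec command line with a larger workspace and verbose logging.

    workspace_mib is an arbitrary illustrative value, not a recommendation.
    """
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--workspace={workspace_mib}",  # workspace size in MiB (default is 16)
        "--verbose",                     # print per-layer and per-tactic details
    ]


cmd = trtexec_debug_command("model.onnx", "model.trt")
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run on the Jetson device. The verbose output should show which layer is executing when the pluginV2DynamicExtRunner assertion fires.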
I also noticed that even though Faster R-CNN has three outputs (scores, labels, and boxes, as shown toward the end of the above log), the build on the TX2 NX reported 7 output network tensors, which looks wrong. When I ran the same command on my GTX 1080, the log correctly reported 3 outputs.
[09/06/2022-13:01:52] [I] [TRT] Detected 1 inputs and 7 output network tensors.
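To compare the GTX 1080 and TX2 NX runs side by side, the relevant numbers can be pulled out of each trtexec log automatically. The following is a minimal sketch under the assumption that both logs use the message formats shown above. The function name and the returned keys are hypothetical, and the sample text is a shortened excerpt of the log in this post.

```python
import re

DETECTED = re.compile(r"Detected (\d+) inputs and (\d+) output network tensors")
BINDING = re.compile(r"Created (input|output) binding for (\S+)")
CLAMP_MESSAGE = "outside the range of INT32 was clamped"


def summarize(log_text):
    """Extract tensor counts, binding names, and clamp warnings from a trtexec log."""
    detected = DETECTED.search(log_text)
    bindings = {"input": [], "output": []}
    for kind, name in BINDING.findall(log_text):
        bindings[kind].append(name)
    return {
        "detected_inputs": int(detected.group(1)) if detected else None,
        "detected_outputs": int(detected.group(2)) if detected else None,
        "input_bindings": bindings["input"],
        "output_bindings": bindings["output"],
        "clamp_warnings": log_text.count(CLAMP_MESSAGE),
    }


sample = """\
[W] [TRT] parsers/onnx/onnx2trt_utils.cpp:396: One or more weights outside the range of INT32 was clamped
[I] [TRT] Detected 1 inputs and 7 output network tensors.
[I] Created input binding for input0 with dimensions 1x3x480x640
[I] Created output binding for scores with dimensions 100
[I] Created output binding for labels with dimensions 100
[I] Created output binding for boxes with dimensions 100x4
"""
print(summarize(sample))
```

Running summarize on the full log from each machine and diffing the two results would show whether the mismatch is limited to the detected output count or also affects the bindings and the INT64 clamp warnings.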
I repeated Step 1 with multiple versions of TensorRT OSS, and all of them worked on the GTX 1080 and RTX 2080. Could this be a bug in JetPack 4.6.3? Could you look into the issue and let us know? I am happy to provide all relevant files (ONNX model, TensorRT OSS package, etc.) via DM. Many thanks in advance!