Some PyTorch model with slicing operation fails on inference

ji5489 · January 7, 2022, 4:55am

Description

I needed to convert a ReXNet PyTorch model to TensorRT to use it with DeepStream 6.0 on a Jetson Nano, but inference failed with the errors below:

# deepstream-app -t -c config.txt

... (after object appears and secondary-gie starts to infer)

ERROR: [TRT]: [shapeMachine.cpp::execute::565] Error Code 7: Internal Error (Slice_28: ISliceLayer has out of bounds access on axis 0
out of bounds access for slice
Instruction: CHECK_SLICE 2 0 16 1
)
ERROR: [TRT]: [executionContext.cpp::enqueueInternal::360] Error Code 2: Internal Error (Could not resolve slots: )
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:07:31.852496379   144   0x55a00d40f0 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<secondary_gie_0> error: Failed to queue input batch for inferencing
ERROR from secondary_gie_0: Failed to queue input batch for inferencing
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1324): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:secondary_gie_bin/GstNvInfer:secondary_gie_0
Quitting
[NvMultiObjectTracker] De-initialized

(deepstream-app:144): GLib-GObject-CRITICAL **: 03:22:14.558: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
App run failed

Full log below:

# deepstream-app -t -c config.txt

Opening in BLOCKING MODE
ERROR: Deserialize engine failed because file path: /workspace/rexnet.engine open error
0:00:01.416849455   144   0x55a054d550 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1889> [UID = 2]: deserialize engine from file :/workspace/rexnet.engine failed
0:00:01.416999302   144   0x55a054d550 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1996> [UID = 2]: deserialize backend context from engine from file :/workspace/rexnet.engine failed, try rebuild
0:00:01.417051178   144   0x55a054d550 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 2]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
WARNING: [TRT]: Detected invalid timing cache, setup a local cache instead
WARNING: [TRT]: Min value of this profile is not valid
0:07:13.644605812   144   0x55a054d550 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1947> [UID = 2]: serialize cuda engine to file: /workspace/rexnet.onnx_b16_gpu0_fp16.engine successfully
INFO: [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT images          3x224x224       min: 1x3x224x224     opt: 16x3x224x224    Max: 16x3x224x224
1   OUTPUT kFLOAT output          3               min: 0               opt: 0               Max: 0

0:07:13.804415333   144   0x55a054d550 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<secondary_gie_0> [UID 2]: Load new model:/workspace/secondary/config_infer_secondary.txt sucessfully
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Initialized
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:07:14.378722694   144   0x55a054d550 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/workspace/primary.engine
INFO: [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT images          3x768x768
1   OUTPUT kFLOAT output          12096x7

0:07:14.378848166   144   0x55a054d550 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /workspace/primary.engine
0:07:14.390004525   144   0x55a054d550 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/workspace/primary/config_infer_primary.txt sucessfully

Runtime commands:
        h: Print this help
        q: Quit

        p: Pause
        r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.


**PERF:  FPS 0 (Avg)
**PERF:  0.00 (0.00)
** INFO: <bus_callback:194>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66, Level = 0
NVMEDIA_ENC: bBlitMode is set to TRUE
**PERF:  30.59 (28.88)
**PERF:  29.16 (29.50)
**PERF:  29.11 (29.09)
**PERF:  29.09 (29.32)
**PERF:  29.12 (29.12)
ERROR: [TRT]: [shapeMachine.cpp::execute::565] Error Code 7: Internal Error (Slice_28: ISliceLayer has out of bounds access on axis 0
out of bounds access for slice
Instruction: CHECK_SLICE 2 0 16 1
)
ERROR: [TRT]: [executionContext.cpp::enqueueInternal::360] Error Code 2: Internal Error (Could not resolve slots: )
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:07:31.852496379   144   0x55a00d40f0 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<secondary_gie_0> error: Failed to queue input batch for inferencing
ERROR from secondary_gie_0: Failed to queue input batch for inferencing
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1324): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:secondary_gie_bin/GstNvInfer:secondary_gie_0
Quitting
[NvMultiObjectTracker] De-initialized

(deepstream-app:144): GLib-GObject-CRITICAL **: 03:22:14.558: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
App run failed

However, I could mitigate this problem by changing the model's forward() from:

def forward(self, x):
    out = self.out(x)
    if self.use_shortcut:
        out[:, 0:self.in_channels] += x  # self.in_channels won't change during inference

    return out

to:

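def forward(self, x):
    feature = self.out(x)
    if self.use_shortcut:
        fB, fC, fH, fW = list(feature.shape)
        # Zero-pad x along the channel axis up to fC channels, then add without slicing.
        # dtype/device follow x so the padding also works for FP16 or CUDA inputs.
        zeros = torch.zeros(fB, fC - self.in_channels, fH, fW, dtype=x.dtype, device=x.device)
        x_ext = torch.cat([x, zeros], dim=1)
        feature = feature + x_ext

    return feature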
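
To illustrate why the two forward() variants should produce the same values, here is a toy sketch on plain nested lists rather than tensors. It is an assumption-laden simplification: the spatial dimensions are dropped, and the helper names `add_shortcut_sliced` and `add_shortcut_padded` are hypothetical, not part of the ReXNet code.

```python
def add_shortcut_sliced(out, x):
    """Slice style: add x onto the first len(x[n]) channels of out."""
    result = [row[:] for row in out]  # copy so the input is not mutated
    for n, x_row in enumerate(x):
        for c, value in enumerate(x_row):
            result[n][c] += value
    return result


def add_shortcut_padded(out, x):
    """Workaround style: zero-pad x to out's channel count, then add everything."""
    result = []
    for out_row, x_row in zip(out, x):
        padded = x_row + [0.0] * (len(out_row) - len(x_row))
        result.append([o + p for o, p in zip(out_row, padded)])
    return result


# Toy data: batch of 2, 4 output channels, 2 input channels (H and W omitted).
out = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
x = [[10.0, 20.0], [30.0, 40.0]]

sliced = add_shortcut_sliced(out, x)
padded = add_shortcut_padded(out, x)
print(sliced == padded)  # True
```

The padded form only uses concatenation and a full elementwise add, so the exported graph does not need a slice on the output tensor.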

I’ve created a separate git repository for the rexnet model with commits #1 and #2; you can also look around there.

Although the problem is gone, I wonder whether this is a TensorRT-related bug or a bug in the model code. Is anyone else facing the same issue?
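
One possible reading, which is an assumption and not confirmed anywhere in this thread: the model was exported with a fixed batch of 16 and dynamic_axes=None, while the engine info in the log lists a profile from min 1x3x224x224 to max 16x3x224x224. If the exported graph carries slice bounds tied to the export batch, a smaller runtime batch could trip a bounds check on axis 0. The sketch below only illustrates that kind of check; `check_slice` is a hypothetical helper and does not reproduce TensorRT's internal logic.

```python
def check_slice(dim_size, start, end, step=1):
    """Strict bounds check: True when [start:end:step] fits in an axis of length dim_size.

    Unlike Python slicing, nothing is clamped here.
    """
    return step > 0 and 0 <= start <= end <= dim_size


EXPORT_BATCH = 16  # batch size used for the ONNX export in this post

for runtime_batch in (16, 4, 1):
    ok = check_slice(runtime_batch, 0, EXPORT_BATCH)
    print(f"runtime batch {runtime_batch:2d}: slice [0:{EXPORT_BATCH}] in bounds -> {ok}")
```

Under this assumption, only a runtime batch equal to the export batch passes, which would match a failure that shows up once the secondary GIE starts inferring on a few objects.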

Thank you.

Environment (PC - for converting PyTorch model to ONNX)

TensorRT Version: v8.0.3
GPU Type: NVIDIA RTX 3060 12GB
Nvidia Driver Version: 495.44
CUDA Version: 11.5.50
CUDNN Version: 8.3.0.96
Operating System + Version: Ubuntu Linux 20.04.3 LTS
Python Version (if applicable): 3.8.12
PyTorch Version (if applicable): 1.11.0a0+b6df043
Baremetal or Container (if container which image + tag): Containerized (nvcr.io/nvidia/pytorch:21.11-py3)

Environment (Jetson Nano - for converting ONNX to TensorRT)

TensorRT Version: v8.0.1.6
GPU Type: Jetson Nano (128-core Maxwell)
Nvidia Driver Version: NVIDIA Jetson Jetpack 4.6 (L4T 32.6.1)
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System + Version: NVIDIA L4T 32.6.1 (Ubuntu Linux 18.04.6 LTS)
Baremetal or Container (if container which image + tag): Containerized (nvcr.io/nvidia/deepstream-l4t:6.0-triton)

Relevant Files

I’ve uploaded the ONNX files to Google Drive for convenience (the models have uninitialized weights): Google Drive Link
- Original exported ONNX file (after Step 2 - onnxsim): rexnetv1_1.0_before_forward_hack_b16_noweight.onnx
- Mitigation-applied exported ONNX file: rexnetv1_1.0_after_forward_hack_b16_noweight.onnx

Deepstream 6.0 configuration file below:

config.txt

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=3

[tiled-display]
enable=1
rows=1
columns=1
width=1920
height=1080
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file:///opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_1080p_h264.mp4
num-sources=1
drop-frame-interval=0
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=0
gpu-id=0
qos=0
nvbuf-memory-type=0
overlay-id=1

[sink1]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=4000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=1
nvbuf-memory-type=0
## If set to TRUE, system timestamp will be attached as ntp timestamp
## If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached
# attach-sys-ts-as-ntp=1

[primary-gie]
enable=1
gpu-id=0
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=4
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[tracker]
enable=1
# For NvDCF and DeepSORT tracker, tracker-width and tracker-height must be a multiple of 32, respectively
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_nvmultiobjecttracker.so
# ll-config-file required to set different tracker types
ll-config-file=tracker_configs/config_tracker_IOU.yml
# ll-config-file=tracker_configs/config_tracker_NvDCF_max_perf.yml
# ll-config-file=tracker_configs/config_tracker_NvDCF_perf.yml
# ll-config-file=tracker_configs/config_tracker_NvDCF_accuracy.yml
# ll-config-file=tracker_configs/config_tracker_DeepSORT.yml
gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1

[secondary-gie0]
enable=1
gpu-id=0
gie-unique-id=2
operate-on-gie-id=1
operate-on-class-ids=0;
config-file=config_infer_secondary.txt

[tests]
file-loop=0

config_infer_secondary.txt

[property]
gpu-id=0
net-scale-factor=0.00390625
onnx-file=rexnet.onnx
model-engine-file=rexnet.engine
batch-size=16
output-blob-names=output
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
labelfile-path=labels.txt
force-implicit-batch-dim=0
model-color-format=0
# 1 - primary, 2 - secondary
process-mode=2
is-classifier=1
classifier-async-mode=0
classifier-threshold=0.5
input-object-min-width=32
input-object-min-height=32
operate-on-gie-id=1
operate-on-class-ids=0;
# NvBufSurfTransformInter_Bilinear
scaling-filter=1
scaling-compute-hw=0
parse-classifier-func-name=NvDsInferClassiferParseCustomClassifier
custom-lib-path=nvdsinfer_custom_impl_classifier/libnvdsinfer_custom_impl_classifier.so

Steps To Reproduce

I created the TensorRT model with the following process:

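Convert the ReXNet model into an ONNX model as below:

import torch
from rexnetv1 import ReXNetV1

# Assumed output path; it matches onnx-file in config_infer_secondary.txt.
ONNX_OUTPUT_PATH = "rexnet.onnx"

model = ReXNetV1().eval()
batch_size = 16
# Export with a fixed batch of 16 (no dynamic axes), using the public export API.
torch.onnx.export(
    model,
    torch.randn(batch_size, 3, 224, 224),
    ONNX_OUTPUT_PATH,
    input_names=['images'],
    output_names=['output'],
    dynamic_axes=None,
    opset_version=11
)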

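Simplify the ONNX model using daquexian/onnx-simplifier:

import onnx
from onnxsim import simplify

onnx_model = onnx.load(ONNX_OUTPUT_PATH)
model_simp, check = simplify(onnx_model,
                             dynamic_input_shape=False,
                             input_shapes=None)
assert check, "Simplified ONNX model could not be validated"
# Assumed: the simplified model is saved back to the same path.
onnx.save(model_simp, ONNX_OUTPUT_PATH)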

Set ONNX_OUTPUT_PATH as the onnx-file in the Deepstream 6.0 deepstream-app configuration (config_infer_secondary.txt, shown above).

NVES · January 7, 2022, 5:08am

Hi,
Can you try running your model with the trtexec command, and share the "--verbose" log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
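
As a minimal sketch of that request, the snippet below assembles a trtexec command line for the ONNX file from this thread. The file names come from config_infer_secondary.txt; the helper `build_trtexec_command` is hypothetical, and the command is only printed, not executed, since trtexec must be available on the Jetson itself.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path, fp16=True):
    """Return the trtexec argument list for building an engine with verbose logging."""
    args = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--verbose",
    ]
    if fp16:
        # network-mode=2 in the nvinfer config corresponds to FP16.
        args.append("--fp16")
    return args


command = build_trtexec_command("rexnet.onnx", "rexnet.engine")
print(shlex.join(command))
```

Redirecting the output of the printed command to a file gives the verbose log that can be attached to the thread.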

You can refer to the link below for the list of supported operators. If any operator is not supported, you need to create a custom plugin to support that operation.

https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md

<!--- SPDX-License-Identifier: Apache-2.0 -->

# Supported ONNX Operators

TensorRT 8.4 supports operators up to Opset 17. Latest information of ONNX operators can be found [here](https://github.com/onnx/onnx/blob/master/docs/Operators.md)

TensorRT supports the following ONNX data types: DOUBLE, FLOAT32, FLOAT16, INT8, and BOOL

> Note: There is limited support for INT32, INT64, and DOUBLE types. TensorRT will attempt to cast down INT64 to INT32 and DOUBLE down to FLOAT, clamping values to `+-INT_MAX` or `+-FLT_MAX` if necessary.

See below for the support matrix of ONNX operators in ONNX-TensorRT.

## Operator Support Matrix

| Operator                  | Supported  | Supported Types | Restrictions                                                                                                           |
|---------------------------|------------|-----------------|------------------------------------------------------------------------------------------------------------------------|
| Abs                       | Y          | FP32, FP16, INT32 |
| Acos                      | Y          | FP32, FP16 |
| Acosh                     | Y          | FP32, FP16 |
| Add                       | Y          | FP32, FP16, INT32 |


Also, please share your model and script, if not shared already, so that we can help you better.

Meanwhile, for some common errors and queries please refer to below link:

Thanks!

system · August 21, 2023, 1:22am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
