Failed to use TensorRT engine file in DeepStream

giangblackk · February 24, 2020, 7:59am

I tried to use a custom TensorRT engine file, converted from ONNX using the onnx2trt tool from https://github.com/onnx/onnx-tensorrt, for detection.

I faced the error:

Creating LL OSD context new
0:00:04.254747581 27928 0x5637940944f0 WARN                 nvinfer gstnvinfer.cpp:515:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:log(): TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
0:00:04.262776487 27928 0x5637940944f0 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:initialize(): RGB/BGR input format specified but network input channels is not 3
0:00:04.263354770 27928 0x5637940944f0 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<primary_gie_classifier> error: Failed to create NvDsInferContext instance
0:00:04.263365019 27928 0x5637940944f0 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<primary_gie_classifier> error: Config file path: /deepstream-4.0/samples/configs/deepstream-app/config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: <main:651>: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie_classifier: Failed to create NvDsInferContext instance
Debug info: gstnvinfer.cpp(692): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie_classifier:
Config file path: /deepstream-4.0/samples/configs/deepstream-app/config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed

I checked my ONNX model: the input size is (1, 3, 240, 320), just like a usual detection model. I don’t know what is wrong with this model that causes the error “RGB/BGR input format specified but network input channels is not 3”.

Does anybody know how to fix this problem?

AastaLLL · February 25, 2020, 2:31am

Hi,

TensorRT might think you are using a caffemodel, which should have dimension=4 based on your definition.

Did you pass the model with the onnx-file parameter?
Could you share the model's [property] config with us so we can check it?

Thanks.

giangblackk · February 25, 2020, 3:22am

Hi,

Here is my config file:

[property]
gpu-id=0
net-scale-factor=0.0078125
offsets=127.0;127.0;127.0
model-engine-file=/model/Mb_640.engine
labelfile-path=/models/labels.txt
onnx-file=/models/Mb_640.onnx
batch-size=1
network-mode=0
num-detected-classes=2
interval=0
gie-unique-id=1
# is-classifier=0
# maintain-aspect-ratio=1
output-blob-names=boxes;scores

parse-bbox-func-name=NvDsInferParseCustomSSD
custom-lib-path=nvparsebbox.so

[class-attrs-all]
threshold=0.8
eps=0.2
group-threshold=1

Because I have converted the ONNX model to TensorRT engine using onnx2trt tool, I added “model-engine-file” to config.

giangblackk · February 25, 2020, 10:10am

I found out what happened.

The onnx2trt tool converts an ONNX model with a 4-dimensional input into a TensorRT engine file with a 4-dimensional input.

But the nvinfer plugin in DeepStream 4.0.2 only accepts engine files with a 3-dimensional input.

I can’t find any line in the nvinfer plugin source code that handles a 4-dimensional input.

I think this should be fixed in the next version of DeepStream.

AastaLLL · February 26, 2020, 2:18am

Hi,

Please note that there is also an ONNX converter inside the DeepStream SDK, which is integrated into TensorRT.
Would you mind removing the TensorRT engine and its path from the config, and trying the DeepStream ONNX parser instead?

If it still does not work, would you mind sharing your ONNX file so we can check it?

Thanks.

AastaLLL · March 2, 2020, 8:54am

Hi,

Have you tried feeding the ONNX model into DeepStream directly?
Please let us know if there is anything we can help with.

Thanks.

giangblackk · March 2, 2020, 9:44am

Hi,

Here is the error when I tried to feed ONNX model directly into DeepStream:

ERROR: ModelImporter.cpp:462 In function importModel:
[4] Assertion failed: !_importer_ctx.network()->hasImplicitBatchDimension() && "This version of the ONNX parser only supports networks with an explicit batch dimension"
0:00:01.291475621 25155 0x5576f61cf580 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:generateTRTModel(): Failed to parse onnx file
0:00:01.293925269 25155 0x5576f61cf580 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:initialize(): Failed to create engine from model files
0:00:01.294068405 25155 0x5576f61cf580 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<primary_gie_classifier> error: Failed to create NvDsInferContext instance
0:00:01.294081858 25155 0x5576f61cf580 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<primary_gie_classifier> error: Config file path: config.txt, NvDsInfer Error: NVDSINFER_TENSORRT_ERROR

AastaLLL · March 4, 2020, 2:24am

Hi,

Could you share your onnx file with us for checking?

Thanks.

giangblackk · March 5, 2020, 7:23pm

Hi,
Here is the onnx file that I used:
https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB/blob/master/models/onnx/version-RFB-320.onnx

AastaLLL · March 6, 2020, 6:38am

Hi,

We tried your model with TensorRT directly and got this error:

WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
While parsing node number 76 [Gather]:
ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims
[02/06/2020-14:25:12] [E] Failed to parse onnx file
[02/06/2020-14:25:12] [E] Parsing model failed
[02/06/2020-14:25:12] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=./version-RFB-320.onnx

So this issue comes from TensorRT support and, more precisely, from the Gather layer operation.

There are some limitations on the gather layer:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-developer-guide/index.html#gather-layer
Could you check whether any gather operation is applied to axis=0 (the batch-size axis)? That is not supported.
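To illustrate why a gather on axis 0 can trip the "axis >= 0 && axis < nbDims" assertion, here is a minimal sketch. It assumes that, in implicit-batch mode, the tensor dimensions exclude the batch axis, so a non-negative ONNX axis is shifted down by one. The function name `toImplicitBatchAxis` is hypothetical and this is a simplified model, not the parser's actual source.

```cpp
#include <cassert>

// Simplified model of implicit-batch axis handling (an assumption for
// illustration, not the parser's real code). nbDims is the tensor rank
// WITHOUT the batch dimension, e.g. 3 for an ONNX tensor of shape N,C,H,W.
bool toImplicitBatchAxis(int onnxAxis, int nbDims, int& trtAxis) {
    assert(nbDims > 0);
    // Negative ONNX axes count from the end; non-negative ones lose the
    // batch slot, so they move down by one.
    const int axis = onnxAxis < 0 ? onnxAxis + nbDims : onnxAxis - 1;
    if (axis < 0 || axis >= nbDims) {
        // Same condition as "Assertion failed: axis >= 0 && axis < nbDims".
        return false;
    }
    trtAxis = axis;
    return true;
}
```

Under this model, ONNX axis 1 (channels) maps to axis 0 of the batch-less tensor, while ONNX axis 0 maps to -1 and is rejected.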

Thanks.

Blard.Theophile · March 13, 2020, 2:23pm

Hi @AastaLLL and @giangblackk.
I also get the same error when loading a custom ONNX file (exported from a PyTorch model):

0:00:15.636872089 13071   0x559fbc0cd0 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]:initialize(): RGB/BGR input format specified but network input channels is not 3
0:00:15.643056850 13071   0x559fbc0cd0 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<nvinfer0> error: Failed to create NvDsInferContext instance
0:00:15.643130486 13071   0x559fbc0cd0 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<nvinfer0> error: Config file path: ./config_infer_custom_yolact.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED

This is not a TensorRT compatibility error.
@AastaLLL, you should try to build the engine with the 6.0-full-dims branch of onnx/onnx-tensorrt on GitHub.

As for @giangblackk, you should modify the nvdsinfer_context_impl.cpp file to make it work with this branch.
In NvDsInferContextImpl::generateTRTModel, replace the network instantiation with the following code:

const auto explicitBatch =
    1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
NvDsInferUniquePtr<IBuilder> builder = nvinfer1::createInferBuilder(m_Logger);
NvDsInferUniquePtr<INetworkDefinition> network = builder->createNetworkV2(explicitBatch);

Nvinfer is using “createNetwork”, which (I think) is deprecated. It doesn’t specify the “explicitBatch” value, needed by the onnx parser, hence your error.
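For readers unfamiliar with the `1U << ...` idiom above, here is a minimal sketch of how an enum value is turned into a creation-flag bitmask. `CreationFlag`, `flagMask`, and `hasFlag` are hypothetical stand-ins, and the value 0 for `kEXPLICIT_BATCH` is an assumption to check against your NvInfer.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for nvinfer1::NetworkDefinitionCreationFlag.
// The numeric value 0 is an assumption; verify it in NvInfer.h.
enum class CreationFlag : int { kEXPLICIT_BATCH = 0 };

// Each enum value is a bit position; shifting 1 by that position gives the
// mask that would be passed to createNetworkV2().
inline std::uint32_t flagMask(CreationFlag flag) {
    const int bit = static_cast<int>(flag);
    assert(bit >= 0 && bit < 32);
    return 1U << static_cast<std::uint32_t>(bit);
}

// True if the given bitmask has the flag's bit set.
inline bool hasFlag(std::uint32_t flags, CreationFlag flag) {
    return (flags & flagMask(flag)) != 0;
}
```

A mask of 0 in this model carries no explicit-batch bit, so `hasFlag` returns false for it.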

After doing that, we should all be at the same point:

We can successfully build a TRT Engine on onnx2trt AND Deepstream
This engine runs ok on trtexec
But we have the "RGB/BGR input format specified but network input channels is not 3" error

Blard.Theophile · March 13, 2020, 2:23pm

To investigate this problem, I added some print statements to nvdsinfer_context_impl.cpp, where this error occurs:

/* Get the network input dimensions. */
    DimsCHW inputDims =
        static_cast<DimsCHW&&>(m_CudaEngine->getBindingDimensions(INPUT_LAYER_INDEX));
    m_NetworkInfo.width = inputDims.w();
    m_NetworkInfo.height = inputDims.h();
    m_NetworkInfo.channels = inputDims.c();

    std::cout << "width: " << m_NetworkInfo.width << '\n'
              << "height: " << m_NetworkInfo.height << '\n'
              << "channels: " << m_NetworkInfo.channels << '\n';

    switch (m_NetworkInputFormat)
    {
        case NvDsInferFormat_RGB:
        case NvDsInferFormat_BGR:
            if (m_NetworkInfo.channels != 3)
            {
                printError("RGB/BGR input format specified but network input"
                    " channels is not 3");
                return NVDSINFER_CONFIG_FAILED;
            }
            break;
        case NvDsInferFormat_GRAY:
            if (m_NetworkInfo.channels != 1)
            {
                printError("GRAY input format specified but network input "
                    "channels is not 1.");
                return NVDSINFER_CONFIG_FAILED;
            }
            break;
        default:
            printError("Unknown input format");
            return NVDSINFER_CONFIG_FAILED;
    }

In my case, the output is:

width: 550
height: 3
channels: 1

It should be:

width: 550
height: 550
channels: 3

In other words, inputDims is not initialized correctly. Unfortunately, I don’t have much time to fix this DeepStream bug.
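The printed values are consistent with a 4-dimensional binding (N, C, H, W) being read through a 3-dimensional CHW view, so that every field is taken one slot too early. A minimal sketch, assuming a 1x3x550x550 input; `DimsLike`, `ChwView`, and `readAsChw` are hypothetical stand-ins that only mimic accessors reading slots 0, 1, and 2, not the real TensorRT types:

```cpp
#include <cassert>

// Hypothetical stand-in for a TensorRT-style dimension record.
struct DimsLike {
    int nbDims;
    int d[8];
};

// Hypothetical CHW view: channels, height, width come from slots 0, 1, 2,
// whatever nbDims says.
struct ChwView {
    int c;
    int h;
    int w;
};

// Reinterpret the dimensions as CHW without checking the rank, which is
// roughly what a cast of a 4-D binding to a CHW type amounts to.
ChwView readAsChw(const DimsLike& dims) {
    assert(dims.nbDims >= 3 && dims.nbDims <= 8);
    return ChwView{dims.d[0], dims.d[1], dims.d[2]};
}
```

With dimensions {1, 3, 550, 550}, this view reports channels 1, height 3, and width 550, which matches the output printed above.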

AastaLLL · March 18, 2020, 9:45am

Hi, Blard.Theophile

We’ve tested giangblackk’s model with the onnx-tensorrt 6.0-full-dims branch and still hit an error at the same Gather node:

Parsing model
While parsing node number 76 [Gather -> "321"]:
ERROR: /home/nvidia/topic_112526/onnx-tensorrt/builtin_op_importers.cpp:703 In function importGather:
[8] Assertion failed: !(data->getType() == nvinfer1::DataType::kINT32 && nbDims == 1) && "Cannot perform gather on a shape tensor!"

Based on our experience, this error usually occurs when a model applies gather to axis 0, which TensorRT does not support yet.

As for your error, DeepStream expects a 3-dimensional input for the BGR format.
So the input of the ONNX model should be [3, 240, 320] rather than [1, 3, 240, 320].

Thanks.

Blard.Theophile · March 18, 2020, 1:08pm

Hi @AastaLLL,

Thanks for your answer.

I also noticed afterwards that switching branches does not change the Gather error.
Maybe giangblackk modified the network before exporting it as an ONNX file.
On my side, I initially had these Gather errors too, but I solved them by changing some PyTorch .view operations (it seems to be a common problem).

Because I have access to the network code, I might also be able to change the input shape, by removing the first dimension (which is the batch size).

However, what should we do if we can’t change the network (e.g. if only the ONNX file is available)?
The ONNX parser seems to handle (batch_size, n_channels, width, height) inputs (hence the “hasImplicitBatchDimension” error), so couldn’t DeepStream do the same? By any chance, can we override getBindingDimensions?
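A minimal sketch of what such handling could look like, assuming the only difference between the two cases is a leading batch dimension. `BindingDims`, `NetworkShape`, and `shapeFromBinding` are hypothetical names, not DeepStream or TensorRT API, and this is not how any DeepStream release is known to implement it.

```cpp
#include <cassert>

// Hypothetical stand-ins for the engine's binding dimensions and the
// channels/height/width that nvinfer stores for the network.
struct BindingDims {
    int nbDims;
    int d[8];
};

struct NetworkShape {
    int channels;
    int height;
    int width;
};

// Accept both C,H,W (implicit batch) and N,C,H,W (explicit batch) bindings
// by skipping the leading batch slot when there are four dimensions.
bool shapeFromBinding(const BindingDims& dims, NetworkShape& out) {
    assert(dims.nbDims >= 0 && dims.nbDims <= 8);
    int first;
    if (dims.nbDims == 4) {
        first = 1;  // skip N
    } else if (dims.nbDims == 3) {
        first = 0;
    } else {
        return false;  // any other rank is not handled in this sketch
    }
    out = NetworkShape{dims.d[first], dims.d[first + 1], dims.d[first + 2]};
    return true;
}
```

With this kind of normalization, both {3, 240, 320} and {1, 3, 240, 320} would yield 3 channels, so the RGB/BGR channel check would pass in either case.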

AastaLLL · March 25, 2020, 8:17am

Hi,

Implicit/explicit batch support will be added in our next DeepStream release.
Please wait for our release announcement.

Thanks.

pinktree3 · July 14, 2020, 4:06am

Hello @AastaLLL @Blard.Theophile @giangblackk,
Can you please help me with my error

any tips/suggestions please
thanks