In DeepStream, I use an ONNX model.

When the batch size is 1, everything is OK.
When the batch size is 2, the log is:

INFO: [TRT]: Detected 1 inputs and 12 output network tensors.
0:06:32.594657194 2361 0x55866b2af0 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 10001]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1752> [UID = 10001]: serialize cuda engine to file: /home/nvidia/model/gelanbo/gelanbo.onnx_b2_gpu0_fp16.engine successfully
ERROR: [TRT]: pred_lbbox/reshape: reshaping failed for tensor: conv_lbbox/BiasAdd__144:0
ERROR: [TRT]: shapeMachine.cpp (160) - Shape Error in executeReshape: reshape would change volume
ERROR: [TRT]: Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{2 16 16 30} {1 16 16 3 10}
ERROR: [TRT]: pred_lbbox/reshape: reshaping failed for tensor: conv_lbbox/BiasAdd__144:0
ERROR: [TRT]: shapeMachine.cpp (160) - Shape Error in executeReshape: reshape would change volume
ERROR: [TRT]: Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{2 16 16 30} {1 16 16 3 10}
ERROR: [TRT]: pred_lbbox/reshape: reshaping failed for tensor: conv_lbbox/BiasAdd__144:0
ERROR: [TRT]: shapeMachine.cpp (160) - Shape Error in executeReshape: reshape would change volume
ERROR: [TRT]: Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{2 16 16 30} {1 16 16 3 10}
INFO: [FullDims Engine Info]: layers num: 4
0 INPUT  kFLOAT Placeholder/inputs_x:0  3x512x512   min: 1x3x512x512  opt: 2x3x512x512  Max: 2x3x512x512
1 OUTPUT kFLOAT pred_lbbox/decode:0     16x16x3x10  min: 0  opt: 0  Max: 0
2 OUTPUT kFLOAT pred_mbbox/decode:0     32x32x3x10  min: 0  opt: 0  Max: 0
3 OUTPUT kFLOAT pred_sbbox/decode:0     64x64x3x10  min: 0  opt: 0  Max: 0

0:06:32.679772443 2361 0x55866b2af0 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger: NvDsInferContext[UID 10001]: Error in NvDsInferContextImpl::allocateBuffers() <nvdsinfer_context_impl.cpp:1323> [UID = 10001]: Failed to allocate cuda output buffer during context initialization
0:06:32.679981321 2361 0x55866b2af0 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger: NvDsInferContext[UID 10001]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1173> [UID = 10001]: Failed to allocate buffers
0:06:32.757130607 2361 0x55866b2af0 WARN nvinfer gstnvinfer.cpp:809:gst_nvinfer_start: error: Failed to create NvDsInferContext instance
0:06:32.757224821 2361 0x55866b2af0 WARN nvinfer gstnvinfer.cpp:809:gst_nvinfer_start: error: Config file path: /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/galb/pgie_config.txt, NvDsInfer Error: NVDSINFER_CUDA_ERROR

How can I fix it?

• Hardware Platform (Jetson / GPU): Xavier
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)

Hi,

How did you generate the ONNX model?
Does your model support batch size 2?

Thanks.

Input filename: /home/nvidia/model/xxxx.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: tf2onnx
Producer version: 1.6.1
Domain:
Model version: 0
Doc string:

Hi,

Please check your model through Netron:
https://lutzroeder.github.io/netron/

If the input batch size is fixed at 1 (e.g. gpu_0/data_0: 1x3x224x224), please generate an ONNX model that supports batch size 2 or dynamic input first. The reshape error above ({2 16 16 30} -> {1 16 16 3 10}) shows the model hard-codes batch size 1 in its reshape target, so the engine cannot run with batch size 2. A quick programmatic check is sketched below.
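
If you prefer to check from a script instead of Netron, here is a minimal sketch using the onnx Python package (the model path is a placeholder; only the input-shape check is shown):

```python
import onnx

# Placeholder path; point this at your actual model file.
model = onnx.load("model.onnx")

# Print each graph input with its shape; a fixed leading dimension
# of 1 means the model only supports batch size 1.
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)

# Note: Reshape nodes can also hard-code the batch size in their target
# shape (as the "{2 16 16 30} -> {1 16 16 3 10}" error above shows), so
# re-exporting the model with the desired batch size (or a dynamic batch
# dimension) is more reliable than patching the input dimension alone.
```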

Thanks.

Hi, I generated an ONNX model that supports batch size 2, and the problem is solved.

Another question: TensorRT's batch size is 1. When the ONNX batch size is 1 or 2, is there any difference in memory usage, video memory, or inference efficiency?

Hi,

Memory usage should be decided by the TensorRT batch size only.

TensorRT chooses a different algorithm for each operation based on performance.
If the chosen algorithm needs some pre-allocated memory, it's expected that batch size 2 will use more memory.

For performance, running batch-size-1 inference on a max-batch=1 or max-batch=2 engine should be similar.
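
The max batch is fixed when the engine is built: the min/opt/max shapes in your engine log come from a TensorRT optimization profile. For reference only (DeepStream builds the engine internally from your nvinfer config), here is a minimal sketch of that build step with the TensorRT 7 Python API; the file names are placeholders, and the input name and shapes are taken from your log:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)

# Explicit-batch network, required for ONNX models with a batch dimension.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GB scratch space (TensorRT 7 API)

# The profile's max shape is what fixes the engine's batch capacity
# (and hence its memory footprint) at build time.
profile = builder.create_optimization_profile()
profile.set_shape("Placeholder/inputs_x:0",
                  min=(1, 3, 512, 512),
                  opt=(2, 3, 512, 512),
                  max=(2, 3, 512, 512))
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)
with open("model_b2.engine", "wb") as f:  # placeholder path
    f.write(engine.serialize())
```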

Thanks.