Nvinferserver: questions about programmatically generating TensorRT engines from ONNX

nvinfer automatically converts the original model’s format to a TensorRT engine.
To achieve the same with nvinferserver (Triton server), our application needs to take care of building the TensorRT engine itself, using tao-converter, trtexec, or some other external tool, depending on the original format.
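For illustration, this is roughly how I picture wrapping the external conversion step in our application (a sketch only, assuming trtexec is on PATH; the paths and the 'input' layer name are placeholders):

import subprocess

def build_engine(onnx_path: str, engine_path: str, shape: str = "8x3x512x896") -> None:
    # Shell out to trtexec with the same flags as the command shown below.
    cmd = [
        "trtexec",
        "--buildOnly",
        f"--onnx={onnx_path}",
        f"--optShapes=input:{shape}",   # 'input' is a placeholder layer name
        "--fp16",
        f"--saveEngine={engine_path}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"trtexec failed:\n{result.stderr}")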

I have a couple of questions about implementing the TensorRT conversion for ONNX models:

  • The nvinfer config for ONNX models doesn't require us to pass the input layer name, but that information seems to be required when using trtexec. Can the input layer name be omitted in the trtexec call, or is there some way to obtain it programmatically? (See the sketch at the end of this post for what I have in mind.)

  • If the conversion to a TensorRT engine is performed with a fixed batch size:

trtexec --buildOnly --optShapes=input:8x3x512x896 --onnx=/var/lib/models/triton_model_repo/a82e9df2-4eb1-454a-93c2-8b8fa113b840/1/a82e9df2-4eb1-454a-93c2-8b8fa113b840.onnx --fp16 --saveEngine=/var/lib/models/triton_model_repo/a82e9df2-4eb1-454a-93c2-8b8fa113b840/1/a82e9df2-4eb1-454a-93c2-8b8fa113b840.engine

and I deploy a single pipeline with one input stream, I get this error:

2023-10-11T03:58:48.396812Z  INFO triton_server: E1011 03:58:48.396782 176 tensorrt.cc:2130] error setting the binding dimension
ERROR: infer_grpc_client.cpp:427 inference failed with error: request specifies invalid shape for input 'input' for a82e9df2-4eb1-454a-93c2-8b8fa113b840_0. Error details: model expected the shape of dimension 0 to be between 8 and 8 but received 1

ERROR: infer_trtis_backend.cpp:372 failed to specify dims after running inference failed on model:a82e9df2-4eb1-454a-93c2-8b8fa113b840, nvinfer error:NVDSINFER_TRITON_ERROR
2023-10-11T03:58:48.396902Z  INFO run{deployment_id=40a75648-1573-4b15-aff3-ef1250686ad6}:run_pipeline_inner: gst_runner::gstreamer_log: nvinferserver[UID 1]: Error in specifyBackendDims() <infer_grpc_context.cpp:164> [UID = 1]: failed to specify input dims triton backend for model:a82e9df2-4eb1-454a-93c2-8b8fa113b840, nvinfer error:NVDSINFER_TRITON_ERROR gst_level=ERROR   category=nvinferserver object=model_inference1

But when the TensorRT engine is created with a dynamic batch size by specifying different min, opt and max values, it works both for a single pipeline and for multiple pipelines running in parallel:

--minShapes=input:1x{input_shape}
--optShapes=input:4x{input_shape}
--maxShapes=input:8x{input_shape}
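For completeness, this is the equivalent dynamic-batch build I'm considering doing directly with the TensorRT Python API instead of shelling out to trtexec (a sketch only, assuming the TensorRT 8.x Python API; the 'input' name and the shapes simply mirror the flags above):

import tensorrt as trt

def build_dynamic_engine(onnx_path: str, engine_path: str) -> None:
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)

    # Optimization profile equivalent to --minShapes / --optShapes / --maxShapes
    profile = builder.create_optimization_profile()
    profile.set_shape("input",             # placeholder input layer name
                      (1, 3, 512, 896),    # min
                      (4, 3, 512, 896),    # opt
                      (8, 3, 512, 896))    # max
    config.add_optimization_profile(profile)

    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)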

Is it correct to assume that ONNX models with an original fixed batch size of 1 will suffer inference bottlenecks, compared to models that support dynamic batching, when Triton is serving multiple pipelines in parallel?
I'm referring to the case where we have to use this shape combination:

--minShapes=input:1x{input_shape}
--optShapes=input:1x{input_shape}
--maxShapes=input:1x{input_shape}
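Regarding the first bullet above, this is roughly how I imagine obtaining the input layer names programmatically, in case they cannot be omitted from the trtexec call (a sketch using the onnx Python package; the model path is a placeholder):

import onnx

model = onnx.load("model.onnx")  # placeholder path

# Some exporters also list weights (initializers) under graph.input, so filter them out.
initializer_names = {init.name for init in model.graph.initializer}
for inp in model.graph.input:
    if inp.name in initializer_names:
        continue
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # might print something like: input ['batch', 3, 512, 896]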

bump!

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

Could you please provide your ONNX model? Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.