Huge speed difference between engines built from scratch and engines built from onnx

frederikschoeller · June 26, 2021, 8:02am

Description

I have a YOLOv5 model that I would like to deploy.
If I convert the model from ONNX to TensorRT, trtexec reports an inference speed of 25 fps.
But if I build the model layer by layer using INetworkDefinition, the inference speed triples.
Why is the TensorRT engine so much faster when the network is built explicitly instead of being converted from ONNX?
Both cases use INT8 quantization.

Thanks!

Environment

TensorRT Version: 7.1.3
GPU Type: Jetson Xavier AGX
CUDA Version: 10.2.89
CUDNN Version: 8.0
Operating System + Version: Jetpack 4.5.1

spolisetty · June 28, 2021, 12:10pm

Hi @frederikschoeller,

It depends. Sometimes the ONNX parser introduces additional ops, which may affect the inference speed.

Thank you.

spolisetty · September 20, 2021, 3:01pm

Hi @frederikschoeller,

We are working on this issue. Could you please share a repro script for the manually defined network?

Thank you.

spolisetty · October 22, 2021, 9:00am

Hi @frederikschoeller,

When you get a chance, could you please share the issue repro requested above so that we can work on this issue?

Thank you.

frederikschoeller · October 22, 2021, 9:24am

Hi!

Does this suffice?

https://github.com/wang-xinyu/tensorrtx/blob/master/yolov5/yolov5.cpp#L127

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }

    return engine;
}

ICudaEngine* build_engine_p6(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);
    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto conv0 = convBlock(network, weightMap, *data, get_width(64, gw), 6, 2, 1, "model.0");
    auto conv1 = convBlock(network, weightMap, *conv0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");

spolisetty · October 27, 2021, 4:22am

Hi @frederikschoeller,

We checked both logs. For the first conv, the profiler shows that both engines use trt_volta_int8x4_icudnn_int8x4_128x32_relu_small_c32_nn_v1, but their times differ: 729.47 us vs. 1.4192 ms. This suggests they are the same model run with different problem sizes. We then checked the attached ONNX model, yolov5s6.onnx: its input size is [1, 3, 1280, 1280], but according to yololayer.h in wang-xinyu/tensorrtx (master) on GitHub, the code that builds the network from scratch uses input size [1, 3, 640, 640].

Could you please check whether you are comparing the two engines at the same problem size?
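
For anyone who wants to redo this sanity check, here is a minimal C++ sketch of the arithmetic. It assumes, as a simplification, that the first conv's work scales roughly with the number of input elements when the kernel and stride are fixed. Measured kernel time need not scale linearly with that. InputShape, elementCount, workRatio and timeRatio are hypothetical helper names for illustration, not part of TensorRT or tensorrtx.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type (not from TensorRT): an NCHW input shape.
struct InputShape {
    std::int64_t n, c, h, w;
};

// Total number of elements in the input tensor.
std::int64_t elementCount(const InputShape& s) {
    return s.n * s.c * s.h * s.w;
}

// How many times more input elements shape `a` has than shape `b`.
// Assumption: for a fixed conv kernel and stride, work grows roughly
// in proportion to this ratio.
double workRatio(const InputShape& a, const InputShape& b) {
    return static_cast<double>(elementCount(a)) /
           static_cast<double>(elementCount(b));
}

// Ratio of two kernel times, both given in microseconds.
double timeRatio(double slowerUs, double fasterUs) {
    return slowerUs / fasterUs;
}
```

With the numbers quoted above, workRatio({1, 3, 1280, 1280}, {1, 3, 640, 640}) is 4, while timeRatio(1419.2, 729.47) is roughly 1.95. The two ratios do not match exactly, so the timings only hint at a problem-size mismatch. The input shapes of both engines still need to be confirmed directly.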

Thank you.

frederikschoeller · October 27, 2021, 9:27pm

I checked, and the model built from scratch does indeed use input size [1, 3, 1280, 1280].

mfoglio · October 27, 2021, 10:16pm

Hi @frederikschoeller, when you say you built it from scratch using INetworkDefinition, do you mean that you built it using the C++ code?

spolisetty · October 28, 2021, 9:07am

Hi @frederikschoeller,

Could you please provide the verbose log from building the engine? You can get it by setting Severity::kVERBOSE in the code.
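
As a rough illustration of what the severity setting does, here is a minimal C++ sketch of a threshold logger. It deliberately does not include the TensorRT headers: the Severity enum below is a stand-in that mirrors the ordering of nvinfer1::ILogger::Severity (lower value means more severe), and ThresholdLogger is a hypothetical class. In real code the logger would derive from nvinfer1::ILogger and be passed to the builder.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Stand-in for nvinfer1::ILogger::Severity, defined here only so the sketch
// compiles without TensorRT. Lower value = more severe.
enum class Severity : int {
    kINTERNAL_ERROR = 0,
    kERROR = 1,
    kWARNING = 2,
    kINFO = 3,
    kVERBOSE = 4
};

// Hypothetical logger: keeps and prints messages at or above the chosen
// severity threshold and drops everything less severe.
class ThresholdLogger {
public:
    explicit ThresholdLogger(Severity reportable) : reportable_(reportable) {}

    void log(Severity severity, const char* msg) {
        // A larger enum value means a less severe message, so skip it.
        if (static_cast<int>(severity) > static_cast<int>(reportable_)) {
            return;
        }
        kept_.emplace_back(msg);
        std::cerr << msg << '\n';
    }

    const std::vector<std::string>& kept() const { return kept_; }

private:
    Severity reportable_;
    std::vector<std::string> kept_;
};
```

A logger constructed with Severity::kWARNING drops kINFO and kVERBOSE messages, while one constructed with Severity::kVERBOSE keeps everything. That is why the threshold has to be lowered to kVERBOSE before the build if the full log is needed.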

Thank you.

spolisetty · January 7, 2022, 5:42am

Hi @frederikschoeller,

Could you please share these details (the issue repro)? They will help us fix this issue.

Thank you.
