How does gst-nvinfer generate the engine file?

Hi,

I wrote a simple script to generate an engine file from a caffemodel.

import tensorrt as trt  # TensorRT 5.x/6.x Python API

def build_engine(deploy_file, model_file, outputs):
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    datatype = trt.float16
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=datatype)
        # FP16
        builder.fp16_mode = True
        builder.strict_type_constraints = True
        # mark the output blobs
        for output in outputs:
            network.mark_output(model_tensors.find(output))
        builder.max_batch_size = 1
        builder.max_workspace_size = 1 << 20
        # build and serialize with the configured builder
        # (note: do not open a second trt.Builder here; that would shadow
        # the builder above and discard the FP16/batch settings)
        with builder.build_cuda_engine(network) as engine:
            with open('{}_{}.engine'.format(model_file, 'fp16'), 'wb') as f:
                f.write(engine.serialize())

Here’s a size comparison with the engines from the Primary_Detector sample model.

-rw-r--r-- 1 nvidia nvidia 6244865 Dec 12 08:14 resnet10.caffemodel
-rw-r--r-- 1 nvidia nvidia 8057761 Apr  9 03:01 resnet10.caffemodel_b1_fp16.engine
-rw-r--r-- 1 nvidia nvidia 8063168 Dec 12 08:14 resnet10.caffemodel_b30_fp16.engine
-rw-r--r-- 1 nvidia nvidia 6878089 Apr 10 14:08 resnet10.caffemodel_fp16.engine

resnet10.caffemodel_fp16.engine is the one generated by the script above.

The one with b30 is on a TX2 box, and another with b1 is on a Nano box.

So, with which configuration does gst-nvinfer generate an engine file? I can see the output blob names and network mode in the config file, which are covered in the script above.

Note that all the engines work fine, with no visible differences.
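For context, here is a sketch of the engine-related properties in a gst-nvinfer config file (the key names follow the standard gst-nvinfer config format; the file names and output blob names are taken from the stock Primary_Detector sample, so verify them against your own setup):

```ini
[property]
model-file=resnet10.caffemodel
proto-file=resnet10.prototxt
model-engine-file=resnet10.caffemodel_b1_fp16.engine
# maximum batch size baked into the engine
batch-size=1
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
```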

Hi,

The backend framework is TensorRT.
Here is an overview of TensorRT for your reference:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#overview

When converting a model into a TensorRT engine, TensorRT checks the model architecture layer by layer and chooses an optimal implementation based on the device architecture.

1. The engines generated on TX2 and Nano are different.
An engine file is not portable; the selected kernels and stored information vary with the GPU architecture.

2. Batch size indicates the maximum batch that will be used.
This may affect some pre-allocated parameters, such as buffer sizes.
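As an illustration of that pre-allocation: the engine must reserve input/output buffers sized for the maximum batch, which is one reason a b30 engine is larger than a b1 engine. A rough sketch of the input-binding size, assuming a 3x368x640 input resolution for the resnet10 detector (an assumption; check your prototxt) and FP16 (2 bytes per element):

```python
def input_buffer_bytes(max_batch, channels, height, width, bytes_per_elem=2):
    """Bytes pre-allocated for the input binding at the maximum batch size.

    bytes_per_elem=2 corresponds to FP16.
    """
    return max_batch * channels * height * width * bytes_per_elem

# Hypothetical dims for the resnet10 detector input (verify against the prototxt).
b1 = input_buffer_bytes(1, 3, 368, 640)    # max_batch_size=1  (Nano engine)
b30 = input_buffer_bytes(30, 3, 368, 640)  # batch-size=30 (TX2 engine)
print(b1, b30)
```

The per-layer kernel selection, not this buffer arithmetic, accounts for most of the engine file size, but the maximum batch still influences which kernels are chosen.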

Thanks.