Generate engine file for a different input dimension (YOLOv3)

I have a .etlt file of a YOLOv3 model trained on images of size (HxW) 704x960. When I try to create an .engine file from it through the DeepStream model config file and change the inference input dimension to something else (e.g., 1376x1920), I get a dimension mismatch error. The relevant DeepStream property is infer-dims. I want to double-check whether infer-dims needs to match the training dimensions, because neither the TLT nor the DeepStream documentation makes this clear.

My understanding from reading the TLT documentation is that the inference dimension must match the training dimension, but maybe I'm wrong?

From the tlt-convert documentation:

-d <input_dimensions>
Comma-separated list of input dimensions that should match the dimensions used for tlt-export.
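The -d value is simply the exported input shape written as C,H,W. A minimal sketch of building and reading it, assuming a 3-channel model in CHW order; `format_d_flag` and `parse_d_flag` are hypothetical helpers for illustration, not part of TLT:

```python
from typing import Tuple


def format_d_flag(channels: int, height: int, width: int) -> str:
    """Build the comma-separated C,H,W value for tlt-convert's -d option."""
    return f"{channels},{height},{width}"


def parse_d_flag(value: str) -> Tuple[int, int, int]:
    """Split a -d value such as '3,704,960' back into (C, H, W)."""
    channels, height, width = (int(part) for part in value.split(","))
    return channels, height, width


# The model in this thread was trained, and therefore exported, at 704x960 (HxW).
d_value = format_d_flag(3, 704, 960)
print(d_value)  # 3,704,960
assert parse_d_flag(d_value) == (3, 704, 960)
```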

From the tlt-export documentation:

tlt-export [-h] {classification, detectnet_v2, ssd, dssd, faster_rcnn, yolo, retinanet}
                -m <path to the .tlt model file generated by tlt train>
                -k <key>
                [-o <path to output file>]
                [--cal_data_file <path to tensor file>]
                [--cal_image_dir <path to the directory of images to calibrate the model>]
                [--cal_cache_file <path to output calibration file>]
                [--data_type <Data type for the TensorRT backend during export>]
                [--batches <Number of batches to calibrate over>]
                [--max_batch_size <maximum trt batch size>]
                [--max_workspace_size <maximum workspace size>]
                [--batch_size <batch size to TensorRT engine>]
                [--experiment_spec <path to experiment spec file>]
                [--engine_file <path to the TensorRT engine file>]
                [--verbose Verbosity of the logger]    
                [--force_ptq Flag to force PTQ]

I see no explicit option to specify the input dimension, which suggests it is taken from the training config or from the model's input layer?

Since the input dims for the engine conversion step (.etlt → .engine) must match the input dims of the export step (.tlt → .etlt), and the export step uses the same input dims as training, I think we can't change the input dims for the engine conversion. Is that right?
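The chain of constraints can be written down as a pre-flight check. This is a hypothetical helper for illustration only: the real mismatch is detected by TensorRT / DeepStream when the engine is built, and its error text differs from the message produced here.

```python
from typing import Tuple


def check_convert_dims(train_hw: Tuple[int, int], convert_chw: Tuple[int, int, int]) -> None:
    """Raise ValueError if the engine-build H, W differ from the training H, W.

    Encodes the reasoning: tlt-export reuses the training input size, and
    tlt-convert's -d must match what tlt-export used, so -d must equal the
    training size.
    """
    _, height, width = convert_chw
    if (height, width) != tuple(train_hw):
        raise ValueError(
            f"engine input {height}x{width} does not match training input "
            f"{train_hw[0]}x{train_hw[1]}; retrain at the new size instead"
        )


check_convert_dims((704, 960), (3, 704, 960))  # same size: no error

try:
    check_convert_dims((704, 960), (3, 1376, 1920))
except ValueError as err:
    print(err)
```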

In the DeepStream config file, there is

input-dims=c;h;w;0 # where c = number of channels, h = height of the model input, w = width of the model input, and 0 means CHW format

or

uff-input-dims=c;h;w;0 # where c = number of channels, h = height of the model input, w = width of the model input, and 0 means CHW format

The width and height should match the training dimensions.
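To catch a mismatch before DeepStream tries to build the engine, the configured dims can be compared with the training size. A minimal sketch with hypothetical helper names; it parses the 4-field c;h;w;order form shown above and, as an assumption, also the 3-field infer-dims=c;h;w form mentioned in the question (check the config reference for your DeepStream version):

```python
from typing import Tuple


def parse_ds_dims(line: str) -> Tuple[int, int, int]:
    """Return (C, H, W) from a DeepStream dims entry.

    Accepts the 4-field form 'uff-input-dims=3;704;960;0' (last field is the
    input order, 0 = CHW) and, by assumption, a 3-field 'infer-dims=3;704;960'.
    """
    value = line.split("=", 1)[-1].strip()
    fields = [int(part) for part in value.split(";")]
    if len(fields) not in (3, 4):
        raise ValueError(f"expected c;h;w or c;h;w;order, got {value!r}")
    channels, height, width = fields[:3]
    return channels, height, width


def matches_training(line: str, train_hw: Tuple[int, int]) -> bool:
    """True when the configured height and width equal the training size."""
    _, height, width = parse_ds_dims(line)
    return (height, width) == tuple(train_hw)


print(matches_training("uff-input-dims=3;704;960;0", (704, 960)))  # True
print(matches_training("infer-dims=3;1376;1920", (704, 960)))      # False
```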


Hi @Morganh,

Thank you for the clarification. I have 2 follow-up questions:

I'm aware that DeepStream has plugins that scale the input, so the pipeline itself can take inputs of any size. If I want to change the input dims of the model itself when creating the engine (say, to smaller dims for more speed if pruning + INT8 are still not fast enough), that is not allowed, and the only way is to go back and retrain the model on images of the new size. Is that correct? If so, does this constraint apply whether I use DeepStream or tlt-convert to generate the engine file?
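Since a smaller input means retraining, it can help to vet candidate sizes first. A rough sketch with a hypothetical function name, assuming that both sides should be multiples of 32 (the deepest YOLOv3 feature map is the input downsampled 32 times; 704, 960, 1376 and 1920 all satisfy this) and using pixel count only as a crude proxy for convolution cost. The real speedup has to be measured on the target GPU.

```python
from typing import Tuple

STRIDE = 32  # assumption: YOLOv3's coarsest feature map is input size / 32


def relative_pixels(candidate_hw: Tuple[int, int], current_hw: Tuple[int, int]) -> float:
    """Check a retraining size and return its pixel count relative to current_hw."""
    height, width = candidate_hw
    if height % STRIDE or width % STRIDE:
        raise ValueError(f"{height}x{width}: both sides should be multiples of {STRIDE}")
    return (height * width) / (current_hw[0] * current_hw[1])


current = (704, 960)
for candidate in [(544, 960), (384, 512)]:
    ratio = relative_pixels(candidate, current)
    print(f"{candidate[0]}x{candidate[1]}: {ratio:.2f} of the pixels of 704x960")
```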

Yes. If you train a 960x544 (WxH) model, you need to set the same dimensions in tlt-convert and in the DeepStream config file.
But when you run inference with DeepStream, the 960x544 model can run on H.264 files of any resolution.
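This works because each frame is resized to the network size before inference, and detections are mapped back to frame coordinates afterwards. The sketch below only illustrates that coordinate arithmetic, not DeepStream's internal code; the function name is hypothetical, and it assumes plain resizing without preserving aspect ratio. If aspect-ratio-preserving scaling is enabled (nvinfer's `maintain-aspect-ratio` property), the two scale factors would be equal and any padding would have to be accounted for.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def net_to_frame(box: Box, net_hw: Tuple[int, int], frame_hw: Tuple[int, int]) -> Box:
    """Map a box from network-input coordinates back to the original frame.

    Assumes the frame was resized to the network size without preserving
    aspect ratio (independent x and y scale factors, no padding).
    """
    sx = frame_hw[1] / net_hw[1]  # width scale
    sy = frame_hw[0] / net_hw[0]  # height scale
    left, top, right, bottom = box
    return (left * sx, top * sy, right * sx, bottom * sy)


# A 1080x1920 (HxW) frame fed to a model trained at 544x960 (HxW).
print(net_to_frame((100.0, 50.0, 200.0, 150.0), (544, 960), (1080, 1920)))
```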