I’m doing inference with a yolov3 tensorrt engine converted by tlt-converter; however, I found that the inference results from the tensorrt engine and those from tlt-infer are different. I think that might be due to differences in the pre-processing stage. Since I cannot get access to the pre-processing part of tlt-infer, I’ve attached below that part for my tensorrt engine:
For example, here are two results for an input image randomly chosen from the VOC dataset. The upper one is the result from my trt engine, the lower one is from tlt-infer.
OK, so to be clear, we only need those two transformations (without other processing like padding, mean extraction, etc.) to do inference with the trt engine?
Also, I suggest you run deepstream inference first to check the result.
Make sure deepstream can run your trt engine correctly compared to tlt-infer.
By the way, to be precise in implementation, the term “do not change aspect-ratio” you mentioned in step 1 implies a padding operation; could you specify which function or padding pattern is needed, please?
# normalization scale is 1
net-scale-factor=1.0
# mean values to subtract, in BGR order
offsets=103.939;116.779;123.68
# 1 refers to channels in BGR order
model-color-format=1
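In plain numpy terms, these three settings amount to roughly the following (a sketch only; it assumes nvinfer applies them as y = net-scale-factor * (x - offsets) on a BGR image, and that the engine expects NCHW input):

import cv2
import numpy as np

def normalize_like_nvinfer(img_bgr):
    # img_bgr: HWC image already resized/padded to the network input size,
    # in BGR channel order (cv2.imread already returns BGR)
    offsets = np.array([103.939, 116.779, 123.68], dtype=np.float32)  # B, G, R means
    net_scale_factor = 1.0
    out = net_scale_factor * (img_bgr.astype(np.float32) - offsets)
    # HWC -> CHW, assuming the engine takes NCHW input
    return out.transpose(2, 0, 1)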
Here is the code I used for padding; however, applying this padding makes the result even worse. Is that the right way to do it? Or could you perhaps provide an example of padding code, please?
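For reference, this is roughly the kind of padding I mean (a simplified sketch, not my exact code; the top-left anchoring and the zero fill value are guesses on my part):

import cv2
import numpy as np

def resize_and_pad(img, input_w, input_h, pad_value=0):
    # Resize so the image fits inside (input_h, input_w) without changing
    # the aspect ratio, then pad the remaining area with a constant value.
    h, w = img.shape[:2]
    scale = min(input_w / w, input_h / h)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(img, (new_w, new_h))
    padded = np.full((input_h, input_w, img.shape[2]), pad_value, dtype=img.dtype)
    # Anchor the resized image at the top-left corner; the rest stays at pad_value
    padded[:new_h, :new_w] = resized
    return padded, scale

The detected boxes then have to be divided by scale to map them back to the original image. Whether the image should be anchored top-left or centered, and which fill value tlt-infer uses, is exactly what I am unsure about.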
Please refer to the steps I mentioned and commented on above. I think I have already described them clearly.
Actually, you can use Deepstream to run inference. It is the default example for deploying etlt models or trt engines.
Right, if you agree with my padding process, then I think I’ve already implemented all the steps you mentioned; still, I cannot get results consistent with tlt-infer.
As for the deepstream validation, I’m afraid it’s not an option for me, as I’m validating on images rather than a video stream.
./deepstream-custom -c pgie_config_file -i <H264 or JPEG filename> [-b BATCH] [-d]
-h: print help info
-c: pgie config file, e.g. pgie_frcnn_tlt_config.txt
-i: H264 or JPEG input file
-b: batch size, this will override the value of "batch-size" in pgie config file
-d: enable display, otherwise dump to output H264 or JPEG file
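So JPEG image input is supported as well, e.g. (the config and image file names below are just placeholders):

./deepstream-custom -c pgie_yolov3_tlt_config.txt -i test_image.jpg -d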
Hi Morganh,
After implementing the preprocessing steps you mentioned before, the bounding boxes and scores from the trt engine are getting closer to the results of tlt-infer, with only minor differences.
However, I get a lot of wrong label assignments, and I observe that each wrong label is exactly the one preceding the correct label in my class list. Here is my list of VOC categories and some visualizations:

classes_voc = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair",
               "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant",
               "sheep", "sofa", "train", "tvmonitor"]
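To make the shift concrete, this is how I map a predicted class index back to a name (schematic; my real post-processing is longer):

def class_name(class_id, classes=classes_voc):
    # classes_voc is the list defined just above
    return classes[int(class_id)]

# The pattern I see: e.g. if tlt-infer labels an object "cat" (index 7), my trt
# pipeline shows "car" (index 6) -- always the entry just before the correct
# one, as if the class id were off by one somewhere.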
Hi Morganh,
Now I’m trying to run a similar experiment with RetinaNet, to see whether I can get consistent results. Could you please tell me the preprocessing steps for RetinaNet?